Since the advent of C++11 writing more functional code has become easier. Functional programming patterns and ideas are powerful additions to the C++ developer’s huge toolbox. (I recently attended a great introductory talk on them by Phil Nash at the first London C++ Meetup - you can find an older recording here on YouTube.)
In this blog post I’ll briefly cover some techniques that can be used to pass functions to other functions and show their impact on the generated assembly at the end.
(If you are familiar with lambdas and higher-order functions you can skip the following paragraphs.)
First-class and higher-order functions are one of the staples of functional programming. Wikipedia concisely describes them as follows:
Higher-order functions are functions that can either take other functions as arguments or return them as results. […]
Higher-order functions are closely related to first-class functions in that higher-order functions and first-class functions both allow functions as arguments and results of other functions. The distinction between the two is subtle: “higher-order” describes a mathematical concept of functions that operate on other functions, while “first-class” is a computer science term that describes programming language entities that have no restriction on their use (thus first-class functions can appear anywhere in the program that other first-class entities like numbers can, including as arguments to other functions and as their return values).
The introduction of auto
, lambda functions, std::function
, and many other features in C++11/14/17 effectively make C++ a language where functions are treated as first-class citizens.
// Put a function in a variable.
// This function takes another function as an argument.
auto benchmark = [](auto f)
{
start_timer();
f();
cout << end_timer();
};
// Pass a function to another function.
// This function captures the surrounding environment (it's a closure).
data some_data;
benchmark([&some_data]{ some_computation(); });
When passing functions around to other functions things are however not as straightforward as they might sound. Should you use a template parameter? Should you use std::function
? A function pointer?
This article aims to compare the aforementioned function-passing techniques and to provide some guidelines. Every technique will benchmarked.
What your options are
Without introducing any additional dependency, your options for passing callable objects around are:
function_view
- a lightweight class that we’re going to implement in this article.
(If you are familiar with everything except function_view
, you can skip to its related paragraph.)
Function pointers
Function pointers are not just a relic from the past in modern C++ - they can be used to pass free functions and captureless lambdas. Member functions can be passed to other functions with member function pointers.
Compilers can aggressively optimize function pointers (at least in the same TU), resulting in a small run-time overhead. Unfortunately, not being able to pass stateful lambdas and generic callable objects is a huge limitation which prevents many functional patterns.
Template parameters
Using a function template that takes another function via a template parameter allows any callable object to be passed without any run-time overhead. That’s right: the compiler is able to completely optimize away the “indirection”.
void f(int) { }
template <typename TF>
void g(TF&& x) { }
// ...
g(f);
g([](int){ });
g([i = 0](int) mutable { });
The drawbacks of this technique are:
In very large codebases, the use of templates for callbacks may be problematic, as they require to be included in every TU they’re used in. This can often lead to long re-compilation times and binary size bloat (which could be worse than some run-time overhead!)
It’s non-trivial to constrain the callable object to a particular signature and to make the type of the expected callable obvious to the user of the function. This could become much easier in the future if concepts lite are introduced in the standard.
std::function
I see std::function
being suggested for the purpose of passing callbacks/functions way too often. std::function
is a heavyweight general-purpose polymorphic function wrapper that is meant to store and “own” a callable object. It also has an empty state that causes an exception to be thrown upon invocation.
void f(int) { }
void g(std::function<void(int)> x) { }
// ...
g(f);
g([](int){ });
g([i = 0](int) mutable { });
Again, this is a very general-purpose class that models ownership of a callable object. It introduces a significant run-time overhead and can potentially dynamically allocate. Use it sparingly!
I strongly recommend not using std::function
unless you need its general-purpose polymorphic semantics.
function_view
This is where things start to get interesting. It is possible to easily implement a lightweight non-owning generic callable object view with an overhead comparable to raw function pointers quite easily (and zero overhead when inlined). I am going to benchmark and show a C++17 implementation. (Note that the I’ve seen already the idea of a “function view” multiple times online - I don’t claim to have invented this. An example is this StackOverflow answer by Yakk. Another is “The Impossibly Fast C++ Delegates” by Sergey Ryazanov.)
void f(int) { }
void g(function_view<void(int)> x) { }
// ...
g(f);
g([](int){ });
g([i = 0](int) mutable { });
As you can see from the example above, it looks reasonably similar to std::function
- however, its semantics are way different: function_view
does not own a callable object. It merely refers to an existing one. That means that any instance of function_view
assumes the pointee callable object to be alive.
This is very often what you want when you pass a function to another one.
(The implementation of function_view
will be shown and explained at the end of the article.)
Benchmark - generated assembly
I created a small horrible Python script that compiles a small code snippet in multiple ways and counts the lines of generated x86-64 assembly (after stripping all the cruft). This is not an accurate benchmark that resembles the real world but should give you an idea of how easy/hard it is for a compiler to optimize the techniques described above.
Stateless callable objects
The first benchmarked code snippet deals with stateless callable objects. The baseline is as follows:
Here’s a table of the lines of generated assembly.
Baseline (lines of generated assembly)
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 6 | 3 | 3 | 3 | 3 |
clang++ 3.9.0 | 7 | 3 | 3 | 3 | 3 |
The benchmark script tests this piece of code…
…by alternating between the following definitions of f
:
void f(void (*x)(volatile int&)) { x(state); }
template <typename TF>
void f(TF&& x) { x(state); }
void f(std::function<void(volatile int&)> x) { x(state); }
void f(function_view<void(volatile int&)> x) { x(state); }
The results are not surprising.
Function pointer
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 42 (+600.%) | 12 (+300.%) | 6 (+100.%) | 6 (+100.%) | 6 (+100.%) |
clang++ 3.9.0 | 42 (+500.%) | 8 (+166.%) | 5 (+66.6%) | 5 (+66.6%) | 5 (+66.6%) |
Template parameter
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 27 (+350.%) | 3 (+0.0%) | 3 (+0.0%) | 3 (+0.0%) | 3 (+0.0%) |
clang++ 3.9.0 | 26 (+271.%) | 6 (+100.%) | 3 (+0.0%) | 3 (+0.0%) | 3 (+0.0%) |
function_view
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 164 (+2633%) | 13 (+333.%) | 6 (+100.%) | 6 (+100.%) | 6 (+100.%) |
clang++ 3.9.0 | 136 (+1842%) | 34 (+1033%) | 5 (+66.6%) | 5 (+66.6%) | 5 (+66.6%) |
std::function
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 349 (+5716%) | 34 (+1033%) | 22 (+633.%) | 22 (+633.%) | 22 (+633.%) |
clang++ 3.9.0 | 267 (+3714%) | 98 (+3166%) | 22 (+633.%) | 22 (+633.%) | 22 (+633.%) |
Pay attention to the following things:
Template parameter is effectively a zero-cost abstraction for both compilers starting from
-O2
.function_view
has the same overhead as a raw function pointer starting from-O2
, but supports any generic callable object!- The generated assembly instructions are identical for this particular code snippet.
Don’t use
std::function
unless you need its features and semantics!
Adding the inline
keyword in front of the functions changes things dramatically: only std::function
has additional overhead - +466% more assembly lines are produced compared to the baseline. The rest of the techniques become “zero-cost”.
You can play around with the code snippet on gcc.godbolt.org, in order to closely analyze the generated assembly.
Stateful callable objects
The second benchmark snippet deals with simple stateful callable objects. The baseline is as follows:
Here’s a table of the lines of generated assembly.
Baseline
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 10 | 7 | 7 | 7 | 7 |
clang++ 3.9.0 | 11 | 5 | 5 | 5 | 5 |
The benchmark script tests this piece of code…
…by alternating between the following definitions of f
:
template <typename TF>
void f(TF&& x) { x(state); }
void f(function_view<void(volatile int&)> x) { x(state); }
void f(std::function<void(volatile int&)> x) { x(state); }
The results are similar to the previous benchmark:
Template parameter
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 36 (+260.%) | 7 (+0.0%) | 7 (+0.0%) | 7 (+0.0%) | 7 (+0.0%) |
clang++ 3.9.0 | 34 (+209.%) | 12 (+140.%) | 5 (+0.0%) | 5 (+0.0%) | 5 (+0.0%) |
function_view
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 173 (+1630%) | 20 (+185.%) | 10 (+42.8%) | 10 (+42.8%) | 10 (+42.8%) |
clang++ 3.9.0 | 144 (+1209%) | 64 (+1180%) | 7 (+40.0%) | 7 (+40.0%) | 7 (+40.0%) |
std::function
O0 | O1 | O2 | O3 | Ofast | |
---|---|---|---|---|---|
g++ 6.2.1 | 372 (+3620%) | 45 (+542.%) | 37 (+428.%) | 37 (+428.%) | 37 (+428.%) |
clang++ 3.9.0 | 281 (+2454%) | 141 (+2720%) | 33 (+560.%) | 33 (+560.%) | 33 (+560.%) |
All the previously made observations apply here. Using inline
also makes the assembly generated by function_view
identical to the baseline/template one.
You can play around with the code snippet on gcc.godbolt.org, in order to closely analyze the generated assembly.
In order to make sure that the overhead introduced by std::function
wasn’t mostly “fixed” and actually “scaled” by adding more functions, I manually created some snippets that invoke multiple callable objects (up to 5) and plotted the results. As an example, this is the baseline code for 5 functions…
int main()
{
volatile int k = 1;
state += k;
state += k;
state += k;
state += k;
state += k;
return state;
}
…this is the code being measured…
int main()
{
volatile int k = 1;
f([&k](volatile auto& y){ y += k; });
g([&k](volatile auto& y){ y += k; });
j([&k](volatile auto& y){ y += k; });
h([&k](volatile auto& y){ y += k; });
i([&k](volatile auto& y){ y += k; });
return state;
}
…and this is the resulting plot:
Unfortunately, the overhead from std::function
seems to scale linearly with the number of function invocations - my recommendation of avoiding it unless you need all of its “power” therefore persists.
Hopefully these benchmarks on the generated assembly should clarify how easy/hard it is for a compiler to optimize the analyzed techniques. If you’re interested in invocation run-time overhead benchmarks, Dietmar Kühl created a very interesting interactive graph (which doesn’t unfortunately include function_view
).
You can find all the code and scripts for the benchmarks on GitHub.
Let’s end the article by looking at the implementation of function_view
.
Implementation of function_view
I will now provide and explain an implementation of a simplified function_view
which supports generic callable objects (but doesn’t have any syntactic sugar for member functions). We’ll also only implement the constructor and operator()
- deducing the implementation of the remaining operators from them is straightforward.
Let’s recap what we need:
A
function_view
class template that takes a signature as its only template parameter.Its constructor will take a generic callable object, whose
operator()
invocation will be type erased.An
operator()
implementation that calls the type erased pointee and retuns its result.
Here is function_view
in all its glory:
template <typename TSignature>
class function_view;
template <typename TReturn, typename... TArgs>
class function_view<TReturn(TArgs...)> final
{
private:
using signature_type = TReturn(void*, TArgs...);
void* _ptr;
TReturn (*_erased_fn)(void*, TArgs...);
public:
template <typename T, typename = std::enable_if_t<
std::is_callable<T&(TArgs...)>{} &&
!std::is_same<std::decay_t<T>, function_view>{}>>
function_view(T&& x) noexcept : _ptr{(void*)std::addressof(x)}
{
_erased_fn = [](void* ptr, TArgs... xs) -> TReturn {
return (*reinterpret_cast<std::add_pointer_t<T>>(ptr))(
std::forward<TArgs>(xs)...);
};
}
decltype(auto) operator()(TArgs... xs) const
noexcept(noexcept(_erased_fn(_ptr, std::forward<TArgs>(xs)...)))
{
return _erased_fn(_ptr, std::forward<TArgs>(xs)...);
}
};
Let’s analyze the implementation step-by-step.
Matching the signature
When we instantiate function_view<void(int)>
, the void(int)
syntax evaluates to a single function type. We however want to separately “match” the return type and the argument types, therefore partial template specialization is used:
template <typename TSignature>
class function_view;
template <typename TReturn, typename... TArgs>
class function_view<TReturn(TArgs...)> final
{
/* ... */
};
Note that TReturn
and TArgs...
match the signature’s return and argument types exactly as they are (without stripping const
or references).
Type erasure
In order to store any callable object with the signature TReturn(TArgs...)
we need to perform type erasure in the only moment where we have its type information: function_view
’s constructor. We’ll need the following fields:
A
void*
that points to the memory location containing the referenced callable object.A
TReturn(*)(void*, TArgs...)
function pointer that, given the aforementionedvoid*
pointer, calls the referenced callable object.
template <typename TReturn, typename... TArgs>
class function_view<TReturn(TArgs...)> final
{
private:
using signature_type = TReturn(void*, TArgs...);
void* _ptr;
TReturn (*_erased_fn)(void*, TArgs...);
// ...
Let’s now define a constructor template which takes a generic callable object by forwarding-reference (as we also want to be able to “view” temporary callables, such as lambda expressions).
std::enable_if
will be used to constrain the constructor.std::is_callable
will make sure that the callable object passed to the constructor matches the desired signature. (T&
is being used instead ofT
because the pointee will always be called as an lvalue.)std::is_same
andstd::decay
will be used to prevent hijacking of the copy constructor.
A captureless lambda (which is implicitly convertible to a function pointer) will be used to initialize
_erased_fn
. Its body will maintain the type information of the passed callable object.
(Many thanks to /u/tcanens on reddit for pointing out some issues with the previous version of this article.)
// ...
public:
template <typename T, typename = std::enable_if_t<
std::is_callable<T&(TArgs...)>{} &&
!std::is_same<std::decay_t<T>, function_view>{}>>
function_view(T&& x) noexcept : _ptr{(void*)std::addressof(x)}
{
_erased_fn = [](void* ptr, TArgs... xs) -> TReturn {
return (*reinterpret_cast<std::add_pointer_t<T>>(ptr))(
std::forward<TArgs>(xs)...);
};
}
// ...
Pay attention:
std::forward
is being used in an unusual context here, asTArgs...
is not a deduced argument pack, but it is required to maintain the correct value categories (here’s a motivating example on wandbox).A trailing return type is used for the lambda that initialized
_erased_fn
, as lambdas implicitly deduce their return type by value. It is not guaranteed thatTReturn
is a value though!- An alternative is using a
-> decltype(auto)
return type, which keeps cv-qualifiers and references.
- An alternative is using a
_ptr
is initialized to the address ofx
, usingstd::addressof
to prevent unexpected errors in case of overloadedoperator&
.An explicit
(void*)
cast is being used to dropconst
qualifiers and accept function pointers.reinterpret_cast
andstd::add_pointer_t
are being used to “rebuild” the original type-erased pointer. Usingstatic_cast
would not support function pointers. UsingT*
would be ill-formed whenT
is an lvalue reference. (Thanks Yakk and T.C.!)
Call operator
The last missing piece is the operator()
, which is quite trivial:
// ...
decltype(auto) operator()(TArgs... xs) const
noexcept(noexcept(_erased_fn(_ptr, std::forward<TArgs>(xs)...)))
{
return _erased_fn(_ptr, std::forward<TArgs>(xs)...);
}
};
It’s sufficient to invoke _erased_fn
with the _ptr
pointing to the (assumed alive) callable object and with the expanded std::forward<TArgs>(xs)...
argument pack.
That’s it! You can find the complete implementation on GitHub.