Frame Allocators
This section explains how coroutine frames are allocated and how to customize allocation for performance.
Prerequisites
- Completed Concurrent Composition
- Understanding of coroutine frame allocation from C++20 Coroutines Tutorial
The Timing Constraint
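Coroutine frame allocation has a unique constraint: memory must be allocated before the coroutine body begins executing. The standard C++ mechanism, the promise type’s operator new, is called before the promise is constructed, so the promise cannot hold state for operator new to consult.
This creates a challenge: how can a coroutine use a custom allocator supplied by its caller? An allocator passed as an ordinary parameter is copied into the frame, and the frame does not exist yet when operator new runs. An operator new overload can see the coroutine's arguments, but only if every coroutine adds the allocator to its signature.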
Thread-Local Propagation
Capy solves this with thread-local propagation:
- Before evaluating the task argument, run_async sets a thread-local allocator
- The task’s operator new reads this thread-local allocator
- The task stores the allocator in its promise for child propagation
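This is why run_async uses two-call syntax. Since C++17, the postfix expression of a function call is sequenced before its arguments, so run_async(executor) completes, and the thread-local storage (TLS) allocator is set, before my_task() is called:
run_async(executor)(my_task());
// 1. run_async(executor) sets the TLS allocator
// 2. my_task() allocates its frame using the TLS allocator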
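The evaluation order this syntax depends on can be checked without Capy. In this standalone sketch, `make_launcher` and `make_argument` are hypothetical stand-ins for `run_async(executor)` and `my_task()`; each records when it is evaluated.

```cpp
#include <cassert>
#include <string>

std::string order;  // records evaluation order

// Hypothetical stand-in for the callable that run_async(executor) returns.
struct launcher {
    void operator()(int) const { order += "call;"; }
};

launcher make_launcher() { order += "launcher;"; return {}; }  // like run_async(executor)
int make_argument() { order += "argument;"; return 0; }         // like my_task()

std::string evaluation_order() {
    order.clear();
    make_launcher()(make_argument());  // postfix expression is evaluated first
    return order;
}
```

In C++17 and later, `evaluation_order()` returns `"launcher;argument;call;"`. By contrast, the arguments of a single call such as `f(make_launcher(), make_argument())` are indeterminately sequenced, so a hypothetical one-call `run_async(executor, my_task())` could not guarantee that the allocator is installed before the frame is allocated.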
The Window
The "window" is the interval between setting the thread-local allocator and the coroutine’s first suspension point. During this window:
- The task is allocated using the TLS allocator
- The task captures the TLS allocator in its promise
- Child tasks inherit the allocator
After the window closes (at the first suspension), the TLS allocator may be restored to a previous value. The task retains its captured allocator regardless.
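One way to implement "restored to a previous value" is a scope guard that saves the old thread-local value and puts it back on exit, so nested launches compose. This is a hypothetical sketch; `frame_resource_scope` and `tls_frame_resource` are illustrative names, not Capy's API.

```cpp
#include <cassert>
#include <memory_resource>

// Hypothetical thread-local slot holding the resource for new frames.
thread_local std::pmr::memory_resource* tls_frame_resource = nullptr;

// Installs a resource for the duration of a scope and restores whatever
// was there before, so nested launches compose.
class frame_resource_scope {
public:
    explicit frame_resource_scope(std::pmr::memory_resource* mr) noexcept
        : saved_(tls_frame_resource) {
        tls_frame_resource = mr;
    }
    ~frame_resource_scope() { tls_frame_resource = saved_; }

    frame_resource_scope(frame_resource_scope const&) = delete;
    frame_resource_scope& operator=(frame_resource_scope const&) = delete;

private:
    std::pmr::memory_resource* saved_;
};

bool scopes_restore_previous() {
    std::pmr::monotonic_buffer_resource outer;
    std::pmr::monotonic_buffer_resource inner;
    frame_resource_scope a(&outer);
    {
        frame_resource_scope b(&inner);
        if (tls_frame_resource != &inner) return false;
    }
    return tls_frame_resource == &outer;  // inner scope restored `outer`
}
```

In a launcher, such a guard would cover the window; any frame created meanwhile keeps the resource it captured even after the guard restores the old value.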
The FrameAllocator Concept
Custom allocators must satisfy the FrameAllocator concept, which is compatible with C++ allocator requirements:
#include <concepts>
#include <cstddef>
#include <utility>

template<typename A>
concept FrameAllocator = requires {
    typename A::value_type;
} && requires(A& a, std::size_t n) {
    { a.allocate(n) } -> std::same_as<typename A::value_type*>;
    { a.deallocate(std::declval<typename A::value_type*>(), n) };
};
In practice, any standard allocator works, for example std::allocator<std::byte> or std::pmr::polymorphic_allocator<std::byte>.
Using Custom Allocators
With run_async
Pass an allocator to run_async:
std::pmr::monotonic_buffer_resource resource;
std::pmr::polymorphic_allocator<std::byte> alloc(&resource);
run_async(executor, alloc)(my_task());
Or pass a std::pmr::memory_resource* directly:
std::pmr::monotonic_buffer_resource resource;
run_async(executor, &resource)(my_task());
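Any `std::pmr::memory_resource` can be passed the same way. As a sketch, here is a hypothetical `counting_resource` that forwards to an upstream resource and counts allocations, which can help measure how many frames a workload allocates. It is not thread-safe, so it suits tasks that all run on one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical resource that forwards to an upstream and counts calls.
class counting_resource : public std::pmr::memory_resource {
public:
    explicit counting_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_(upstream) {}

    std::size_t allocations() const noexcept { return allocations_; }
    std::size_t live_bytes() const noexcept { return live_bytes_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        void* p = upstream_->allocate(bytes, align);
        ++allocations_;
        live_bytes_ += bytes;
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        upstream_->deallocate(p, bytes, align);
        live_bytes_ -= bytes;
    }
    bool do_is_equal(memory_resource const& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
    std::size_t live_bytes_ = 0;
};
```

It would be used like any other resource, e.g. `counting_resource counter; run_async(executor, &counter)(my_task());`, reading the counts only after the task has finished.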
Recycling Allocator
Capy provides recycling_memory_resource, a memory resource optimized for coroutine frames:
- Maintains freelists by size class
- Reuses recently freed blocks (cache-friendly)
- Falls back to the upstream allocator for new sizes
This allocator is used by default for thread_pool and other execution contexts.
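The freelist idea can be sketched as a `std::pmr::memory_resource`. This is a hypothetical illustration of the technique, not Capy's `recycling_memory_resource`: it keeps one LIFO freelist per 64-byte size class up to 1 KiB, forwards larger or over-aligned requests to the upstream resource, and is not thread-safe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

// Hypothetical sketch of size-class recycling (not Capy's implementation).
class freelist_resource : public std::pmr::memory_resource {
public:
    explicit freelist_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_(upstream) {}

    freelist_resource(freelist_resource const&) = delete;
    freelist_resource& operator=(freelist_resource const&) = delete;

    ~freelist_resource() override {
        // Hand every cached block back to the upstream resource.
        for (std::size_t c = 0; c < classes; ++c) {
            while (node* n = heads_[c]) {
                heads_[c] = n->next;
                upstream_->deallocate(n, class_size(c), alignment);
            }
        }
    }

private:
    struct node { node* next; };

    static constexpr std::size_t granularity = 64;  // bytes per size class
    static constexpr std::size_t classes = 16;      // caches blocks up to 1 KiB
    static constexpr std::size_t alignment = alignof(std::max_align_t);

    static std::size_t class_size(std::size_t c) { return (c + 1) * granularity; }
    static std::size_t class_of(std::size_t bytes) {
        return bytes == 0 ? 0 : (bytes - 1) / granularity;
    }
    static bool cacheable(std::size_t bytes, std::size_t align) {
        return class_of(bytes) < classes && align <= alignment;
    }

    void* do_allocate(std::size_t bytes, std::size_t align) override {
        if (!cacheable(bytes, align))
            return upstream_->allocate(bytes, align);  // too large: no caching
        std::size_t c = class_of(bytes);
        if (node* n = heads_[c]) {                     // reuse a freed block
            heads_[c] = n->next;
            return n;
        }
        return upstream_->allocate(class_size(c), alignment);  // new block
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        if (!cacheable(bytes, align)) {
            upstream_->deallocate(p, bytes, align);
            return;
        }
        std::size_t c = class_of(bytes);
        heads_[c] = ::new (p) node{heads_[c]};  // push: most recent block first
    }

    bool do_is_equal(memory_resource const& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::array<node*, classes> heads_{};
};
```

Because the lists are LIFO, a new frame of the same size class receives the block freed most recently, which is the one most likely to still be in cache.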
HALO Optimization
Heap Allocation eLision Optimization (HALO) allows the compiler to allocate coroutine frames on the stack instead of the heap when:
- The coroutine’s lifetime is provably contained in the caller’s
- The frame size is known at compile time
- Optimization is enabled
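Capy marks its task type with BOOST_CAPY_CORO_AWAIT_ELIDABLE, which flags it as a candidate for this optimization on compilers that support it:
template<typename T = void>
struct [[nodiscard]] BOOST_CAPY_CORO_AWAIT_ELIDABLE
    task
{
    // ...
};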
Best Practices
Use Default Allocators
For most applications, the default recycling allocator provides good performance without configuration.
Consider Memory Resources for Batched Work
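When launching many short-lived tasks together, a monotonic buffer resource can be efficient: each allocation is a pointer bump, and memory is released all at once when the resource is destroyed. The resource is not thread-safe, and it must outlive every task that allocates from it: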
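#include <array>
#include <cstddef>
#include <memory_resource>
#include <vector>

template<typename Executor>
void process_batch(Executor executor, std::vector<item> const& items)
{
    // 64 KiB of stack storage; once it is used up, the resource
    // falls back to its upstream (the default memory resource)
    std::array<std::byte, 64 * 1024> buffer;
    std::pmr::monotonic_buffer_resource resource(
        buffer.data(), buffer.size());
    for (auto const& item : items)
    {
        run_async(executor, &resource)(process(item));
    }
    // ... wait here until every launched task has completed ...
    // deallocate() is a no-op; the resource releases all of its
    // memory at once when it goes out of scope
}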
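If a batch should never spill onto the heap, one option is to give the monotonic resource `std::pmr::null_memory_resource()` as its upstream; that resource throws `std::bad_alloc` instead of allocating. The standalone sketch below, with a hypothetical `blocks_that_fit` helper and a smaller 4 KiB buffer, shows the buffer acting as a hard budget.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

// Counts how many 256-byte allocations fit in a 4 KiB buffer when the
// upstream refuses to allocate (hypothetical helper for illustration).
std::size_t blocks_that_fit() {
    alignas(std::max_align_t) std::array<std::byte, 4 * 1024> buffer;
    std::pmr::monotonic_buffer_resource resource(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());
    std::size_t count = 0;
    try {
        for (;;) {
            static_cast<void>(resource.allocate(256));  // bump allocation
            ++count;
        }
    } catch (std::bad_alloc const&) {
        // Buffer exhausted: null_memory_resource() threw instead of
        // falling back to the heap.
    }
    return count;
}
```

Every successful allocation is served from the stack buffer, so the count can never exceed 4096 / 256 = 16.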
Reference
| Header | Description |
|---|---|
| | Frame allocator concept and utilities |
| | Default recycling allocator implementation |
You have now learned how coroutine frame allocation works and how to customize it. Continue to Lambda Coroutine Captures to learn about a critical pitfall with lambda coroutines.