Frame Allocators

This section explains how coroutine frames are allocated and how to customize allocation for performance.

The Timing Constraint

Coroutine frame allocation has an unusual ordering constraint: the frame's memory must be allocated before the coroutine body begins executing, and therefore before the promise object exists. The standard customization point, the promise type's operator new, is invoked before the promise is constructed.

This creates a chicken-and-egg problem: how can a coroutine use a custom allocator when that allocator is passed as a parameter, and parameters are themselves stored in the not-yet-allocated frame?

Thread-Local Propagation

Capy solves this with thread-local propagation:

  1. Before evaluating the task argument, run_async sets a thread-local allocator

  2. The task’s operator new reads this thread-local allocator

  3. The task stores the allocator in its promise for child propagation

This is why run_async uses two-call syntax:

run_async(executor)(my_task());
//        ↑         ↑
//        1. Sets    2. Task allocated
//        TLS        using TLS allocator

The Window

The "window" is the interval between setting the thread-local allocator and the coroutine’s first suspension point. During this window:

  • The task is allocated using the TLS allocator

  • The task captures the TLS allocator in its promise

  • Child tasks inherit the allocator

After the window closes (at the first suspension), the TLS allocator may be restored to a previous value. The task retains its captured allocator regardless.

The FrameAllocator Concept

Custom allocators must satisfy the FrameAllocator concept, which is compatible with C++ allocator requirements:

template<typename A>
concept FrameAllocator = requires {
    typename A::value_type;
} && requires(A& a, std::size_t n) {
    { a.allocate(n) } -> std::same_as<typename A::value_type*>;
    { a.deallocate(std::declval<typename A::value_type*>(), n) };
};

In practice, any standard-conforming allocator, including std::pmr::polymorphic_allocator, satisfies this concept.

Using Custom Allocators

With run_async

Pass an allocator to run_async:

std::pmr::monotonic_buffer_resource resource;
std::pmr::polymorphic_allocator<std::byte> alloc(&resource);

run_async(executor, alloc)(my_task());

Or pass a memory_resource* directly:

std::pmr::monotonic_buffer_resource resource;
run_async(executor, &resource)(my_task());

Default Allocator

When no allocator is specified, run_async uses the execution context’s default frame allocator, typically a recycling allocator optimized for coroutine frame sizes.

Recycling Allocator

Capy provides recycling_memory_resource, a memory resource optimized for coroutine frames:

  • Maintains freelists by size class

  • Reuses recently freed blocks (cache-friendly)

  • Falls back to upstream allocator for new sizes

This allocator is used by default for thread_pool and other execution contexts.

HALO Optimization

Heap Allocation eLision Optimization (HALO) allows the compiler to allocate coroutine frames on the stack instead of the heap when:

  • The coroutine’s lifetime is provably contained in the caller’s

  • The frame size is known at compile time

  • Compiler optimizations are enabled

Capy’s task<T> applies a compiler-specific elision attribute through the BOOST_CAPY_CORO_AWAIT_ELIDABLE macro (which expands to nothing on compilers without support) to make immediately awaited tasks eligible for HALO:

template<typename T = void>
struct [[nodiscard]] BOOST_CAPY_CORO_AWAIT_ELIDABLE
    task
{
    // ...
};

When HALO Applies

HALO is most effective for immediately-awaited tasks:

// HALO can apply: task is awaited immediately
int result = co_await compute();

// HALO cannot apply: task escapes to storage
auto t = compute();
tasks.push_back(std::move(t));

Measuring HALO Effectiveness

Profile your application to see if HALO is taking effect. Look for:

  • Reduced heap allocations

  • Improved cache locality

  • Lower allocation latency

Best Practices

Use Default Allocators

For most applications, the default recycling allocator provides good performance without configuration.

Consider Memory Resources for Batched Work

When launching many short-lived tasks together, a monotonic buffer resource can be efficient:

void process_batch(std::vector<item> const& items)
{
    std::array<std::byte, 64 * 1024> buffer;
    std::pmr::monotonic_buffer_resource resource(
        buffer.data(), buffer.size());

    for (auto const& item : items)
    {
        run_async(executor, &resource)(process(item));
    }
    // All frames are deallocated together when the resource is
    // destroyed; the resource must outlive every task allocated from it
}

Profile Before Optimizing

Coroutine frame allocation is rarely the bottleneck. Profile your application before investing in custom allocators.

Reference

  • <boost/capy/ex/frame_allocator.hpp>: Frame allocator concept and utilities

  • <boost/capy/ex/recycling_memory_resource.hpp>: Default recycling allocator implementation

You have now learned how coroutine frame allocation works and how to customize it. Continue to Lambda Coroutine Captures to learn about a critical pitfall with lambda coroutines.