Parallel Processing

Freyr provides two main patterns for processing entities in parallel: Queries for flexible filtering with complex operations, and ForEach for direct iteration. Choosing the right pattern and configuring chunk size correctly are the most impactful performance decisions you can make.


Queries — flexible filtering

fr::Query provides a fluent API for filtering and querying entities by their component composition. Queries are ideal when you need complex filters or aggregation operations.

Creating a Query

auto query = mScene->CreateQuery();

Configuring filters

query->Including<Position, Velocity>()    // entities with Position AND Velocity
    ->Excluding<DisabledTag, EditorOnly>(); // excluding entities with these components

Terminal operations

Method                          Description
Count<Ts...>()                  Returns the total number of matching entities
EntitiesWith<Ts...>()           Returns a vector of entity IDs
FindUnique<Ts...>()             Returns exactly one entity (or std::nullopt)
First<Ts...>()                  Returns the first entity, or std::nullopt
Iterate<Ts...>()                Collects all entities and components into a vector of tuples
Transform<Ts...>(callback)      Maps each entity to a transformed value
Map<Ts...>(callback)            Applies the transform and returns a vector ordered by entity
Reduce<Ts...>(callback, seed)   Accumulates values across matching entities

Example: counting and aggregation

auto query = mScene->CreateQuery();

// Count alive entities
auto aliveCount = query->Including<Health>()
    ->Excluding<DeadTag>()
    ->Count<Health>();

// Reduce to total health
auto totalHealth = query->Including<Health>()
    ->Reduce<Health>([](float acc, Health& h) {
        return acc + h.current;
    }, 0.f);

Synchronous iteration with Each

query->Including<Position, Velocity>()
    ->WithLabel("Physics::Integrate")
    ->Each<Position, Velocity>([](fr::Entity e, Position& pos, Velocity& vel) {
        pos.x += vel.dx * dt;
        pos.y += vel.dy * dt;
    });

Processes entities one at a time, in chunk order. Safe for any operation, including read/write to other entities.


Asynchronous iteration with EachAsync

// schedule async iteration
query->Including<Position, Velocity>()
    ->EachAsync<Position, Velocity>([](fr::Entity e, Position& pos, Velocity& vel) {
        pos.x += vel.dx * dt;
        pos.y += vel.dy * dt;
    });

// start task execution
mScene->ExecuteTasks();

// do sequential work while tasks run...
mScene->CreateQuery()->Each<AIState>([](fr::Entity, AIState& ai) { ... });

// sync before reading results
mScene->Sync();

Use EachAsync to overlap CPU work: start a long parallel computation, do unrelated sequential work, then sync.


Direct iteration with ForEach

For direct iteration without storing a query object first, build the query inline on the Scene and call Each or EachAsync directly:

// Synchronous — ordered logic, cross-entity writes, debugging
mScene->CreateQuery()->Each<Position, Velocity>([dt](fr::Entity e, Position& pos, Velocity& vel) {
    pos.x += vel.dx * dt;
});

// Asynchronous — fire-and-forget; start with ExecuteTasks(), wait with Sync()
mScene->CreateQuery()->EachAsync<Velocity>([](fr::Entity e, Velocity& vel) {
    vel.dx *= 0.99f;
});

Method      Blocking   Thread pool   Use when
Each        Yes        No            Query with complex filters, synchronous
EachAsync   No         Yes           Query with complex filters, asynchronous

Labels for profiling

All iteration methods accept an optional label that appears in Perfetto traces:

query->WithLabel("Physics::Integrate")->EachAsync<...>(fn);

See the profiling guide for details.


Performance Tuning

Chunk size

ArchetypeChunkCapacity controls the number of entities per chunk, which directly determines task granularity:

Total entities: 1 000 000
Chunk capacity: 512  → ~1 953 tasks
Chunk capacity: 4096 → ~244 tasks
Capacity   Task count    Overhead   Load balance
128        High          High       Excellent
256        Medium-High   Moderate   Good
512        Medium        Low        Good
1024       Low           Very low   Fair
4096       Very low      Minimal    Poor for small N

General guideline: start with 512. If you have many short-running callbacks and high thread counts, try 256. If each callback does substantial work (e.g. physics), try 1024–4096.

When to increase chunk size

  • Heavy callbacks: physics, complex AI, pathfinding — larger chunks reduce scheduling overhead
  • Few total entities: if you have only a few thousand entities, fewer chunks with more entities per chunk improves cache locality

When to decrease chunk size

  • Very fast callbacks: if each entity is processed in microseconds, more chunks allow better distribution across threads
  • High time variance: if some entities take much longer than others (e.g. heavy AI vs. simple movement), smaller chunks enable better work stealing

Cache locality

Smaller chunks (128-256) keep data "hotter" in cache but generate more scheduling overhead. Larger chunks (1024-4096) have less overhead but may not fit entirely in the core's L2 cache.

// Example: physics needs more work per entity
opts.WithArchetypeChunkCapacity(1024); // larger chunks for heavy callbacks

// Example: simple movement with many entities
opts.WithArchetypeChunkCapacity(256); // smaller chunks for light callbacks

Work stealing

The thread pool uses work stealing: idle threads grab tasks from other threads. This means you don't need to balance perfectly — threads that finish early will pick up work from busy threads.

Measurement

Scheduling overhead is typically negligible compared to per-entity work. Use real profiling to verify:

// Enable profiling to see time spent in scheduling vs. work
// Compile with -DFREYR_PROFILING=ON

Example: overlapping parallel work

void Update(float dt) override {
    // Schedule physics integration
    mScene->CreateQuery()->EachAsync<Position, Velocity>("Integrate", [dt](fr::Entity e, Position& pos, Velocity& vel) {
        pos.x += vel.dx * dt;
        pos.y += vel.dy * dt;
    });

    // Start the scheduled tasks on the thread pool
    mScene->ExecuteTasks();

    // Do sequential AI work while integration runs
    auto query = mScene->CreateQuery();
    query
        ->Excluding<StunnedTag>()
        ->Each<AIState>("AI::Think", [dt](fr::Entity e, AIState& ai) {
            ai.thinkTimer -= dt;
            if (ai.thinkTimer <= 0.f)
                ai.nextAction = computeNextAction(ai);
        });

    // Wait for the async tasks to finish
    mScene->Sync();
    // Now Position is updated and consistent
}

Avoiding dependencies

The biggest impact on parallel performance is avoiding dependencies between tasks. If task B needs the result of task A, you lose parallelism.

Data layout in Freyr:

  • Each chunk is processed independently
  • Components are stored by archetype — all components of a type are contiguous in memory
  • Chunk-based iteration ensures related data stays together

Tips:

  1. Don't modify archetype structure during iteration — adding/removing components is deferred until end of Update
  2. Avoid reading data written by another task in the same frame — use ExecuteTasks() to sync
  3. Prefer contiguous data — accessing contiguous arrays is much faster than scattered random access