Delivering One Battle to 10,000 Clients
This post is part of the Black Skies architecture series; the series hub is the thesis post. The commit path post ended on a deliberate asymmetry: a tile performs one Redis write per tick no matter who is watching, and delivering that committed state to every watching client is a different problem, owned by a different layer. This is that layer. The simulation's egress is O(1). The audience is O(n). Everything here is the machinery that turns one write into ten thousand correct screens inside 100 milliseconds.
The budget before the architecture
Delivery lives or dies by one number, so it goes first. The design target is roughly 600 bytes per client per tick, which at 10,000 clients and 2 Hz is about 12 MB/s of aggregate egress. That is a comfortable number. The uncomfortable part is how conditional it is:
| Avg delta per client per tick | Assumption | Total egress at 10K clients, 2 Hz |
|---|---|---|
| 600 B | Protobuf, viewport-filtered, ~5% of entities active | ~12 MB/s |
| 2 KB | 10% active, or a wider viewport | ~40 MB/s |
| 8 KB | FlatBuffers framing, or unfiltered events | ~160 MB/s |
The first row is the design target. The other two rows are what happens when one assumption slips, and the 13x spread between top and bottom is why the 600-byte figure is the most load-bearing open assumption in the entire architecture. The rest of this post is, one way or another, a defense of that first row.
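The table's arithmetic is small enough to check by hand, and a few lines make the sensitivity explicit. This is a minimal sketch: the function name is hypothetical, and it assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes), which is what the table's round figures imply.

```python
TICK_HZ = 2        # ticks per second, from the post
CLIENTS = 10_000   # concurrent viewers, from the post


def aggregate_egress_mb_per_s(avg_delta_bytes: int,
                              clients: int = CLIENTS,
                              tick_hz: int = TICK_HZ) -> float:
    """Aggregate relay egress in decimal MB/s for a given per-client delta."""
    return avg_delta_bytes * clients * tick_hz / 1_000_000


# The three rows of the table, using decimal kilobytes (assumption).
for label, delta in [("600 B", 600), ("2 KB", 2_000), ("8 KB", 8_000)]:
    print(f"{label} per client per tick: "
          f"{aggregate_egress_mb_per_s(delta):.0f} MB/s")
```

Every term is a plain multiplier, so a slip in any one of them (delta size, client count, tick rate) moves the total by the same factor.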
Interest management is the product, not the optimization
The arithmetic only works because no client receives the battle. A client receives its tactical viewport: the tile it occupies plus the immediate neighbors, seven channels, filtered further by visibility, relevance, and event priority. Run the numbers on a 10,000-player battle spread across 7 tiles: around 5% of entities act in any given tick, roughly 500 active entities across the combat region, about 70 per tile if combat spreads evenly, more on a hot tile when it clusters. Viewport-filtered, a client's per-tick delta stays in the hundreds of bytes. Unfiltered, the same battle is the 160 MB/s row. Interest management at the relay is what makes the budget defensible, not the raw entity count.
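A sketch of how a viewport becomes a channel list, under loud assumptions: the post gives the count (one tile plus its immediate neighbors, seven channels) but not the coordinate system, so this assumes hexagonal tiles in axial coordinates, and the `tile:q:r` channel naming is hypothetical.

```python
from typing import List

# The six axial-coordinate neighbor offsets of a hexagonal tile (assumed grid).
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def viewport_channels(q: int, r: int) -> List[str]:
    """Occupied tile first, then its six neighbors: seven channels in total."""
    tiles = [(q, r)] + [(q + dq, r + dr) for dq, dr in HEX_NEIGHBORS]
    # Channel naming scheme is hypothetical, for illustration only.
    return [f"tile:{tq}:{tr}" for tq, tr in tiles]


# The entity arithmetic from the paragraph above.
players = 10_000
active = players * 5 // 100          # ~5% of entities act per tick
per_tile = active / 7                # if combat spreads evenly over 7 tiles
print(len(viewport_channels(0, 0)), active, round(per_tile))
```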
Here is where the thesis earns another installment, because the filtering is not hidden from players behind clever netcode. It is the game's actual UI. Ships beyond tactical range render as fleet icons rather than individual entities. The viewport shows the current tile and its neighbors because that is the information a tactical decision needs, with fidelity at distance deliberately traded away. This is not bandwidth optimization presented as gameplay. The interface a player would want for commanding a ship in a massive battle and the interface the fan-out system can afford to deliver turn out to be the same interface, because both were designed against the same tile structure at the same time.
The relay earns its keep
The plumbing upstream belongs to the commit path post: committed streams, the bridge, sharded pub/sub. The relay is where this post's work happens. Each relay pod holds client WebSockets, subscribes to the tile channels its clients' viewports require, and coalesces up to seven tile channels into one frame per client per tick, applying the interest filters as it composes. The viewport itself is tracked at the gateway, which owns the command path and therefore knows where every player is; the relay consumes that tracking and applies the filters where frames are built. Tracking lives where position is known, filtering lives where bytes are spent. The model has a clean precedent in Signal's group delivery: server-side fan-out that replicates one message into N per-device queues, with tile channels here standing in for device queues.
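The coalescing step can be sketched as a pure function: take the tick's pending events for the tiles in one client's viewport, apply the interest filter, and emit a single ordered frame. This is an illustration rather than the relay's code. The `Event` fields, the `keep` predicate, and the `max_events` cap are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    tile: str       # channel the event arrived on
    seq: int        # per-tile sequence number
    priority: int   # higher is more important (hypothetical field)
    payload: bytes


def compose_frame(viewport: List[str],
                  pending: Dict[str, List[Event]],
                  keep: Callable[[Event], bool],
                  max_events: int) -> List[Event]:
    """Coalesce one tick's events from the viewport tiles into one frame."""
    # Only tiles in the viewport contribute; the filter runs as the frame is built.
    candidates = [e for tile in viewport
                  for e in pending.get(tile, []) if keep(e)]
    # Highest priority first; ties broken by tile and sequence for determinism.
    candidates.sort(key=lambda e: (-e.priority, e.tile, e.seq))
    chosen = candidates[:max_events]
    # Restore per-tile sequence order so the client applies events in order.
    chosen.sort(key=lambda e: (e.tile, e.seq))
    return chosen
```

The point of the shape is that the viewport list comes from elsewhere (the gateway's tracking) while the filter and the byte budget are applied at the moment the frame is composed.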
The pressure points under a concentrated battle are exactly three: CPU on relay pods whose clients disproportionately share the same hot tiles, WebSocket egress per pod, and the hot channel itself upstream. The first two scale horizontally by adding pods and rebalancing clients. The third does not, and it gets its own section.
One small rule rounds out the relay's contract: always send something. An idle tick still carries a 2 to 4 byte heartbeat with the sequence number and ACK, which at 2 Hz is at most about 8 bytes per second per client, and it doubles as the liveness signal and the bookkeeping channel that keeps delta encoding honest. Silence is ambiguous; a tiny frame is not. Honesty requires a timestamp here: the relay running today sends Zstd-compressed JSON frames and a thirty-second heartbeat, not the per-tick binary heartbeat. The always-send-something rule is part of the wire-format work below, and it lands with the same benchmark that decides the encoding.
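One way the per-tick heartbeat could fit its 4-byte ceiling is sketched below. The layout is an assumption, not the planned format: it packs the low 16 bits of the sequence number and of the ACK as two little-endian unsigned shorts, and leaves wraparound handling to the receiver.

```python
import struct
from typing import Tuple

TICK_HZ = 2  # ticks per second, from the post


def encode_heartbeat(seq: int, ack: int) -> bytes:
    """Pack the low 16 bits of seq and ack into 4 bytes (assumed layout)."""
    return struct.pack("<HH", seq & 0xFFFF, ack & 0xFFFF)


def decode_heartbeat(frame: bytes) -> Tuple[int, int]:
    """Return the truncated (seq, ack) pair carried by a heartbeat frame."""
    seq16, ack16 = struct.unpack("<HH", frame)
    return seq16, ack16


frame = encode_heartbeat(seq=70_001, ack=69_998)
idle_bytes_per_second = len(frame) * TICK_HZ  # 4 bytes x 2 Hz
```

At the 4-byte upper bound an idle client costs 8 bytes per second of payload, which is where the figure above comes from. WebSocket framing overhead is not counted in this sketch.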
The wire format is a hypothesis with a benchmark attached
The 600-byte row assumes the encoding cooperates, and the format choice is stated here as the hypothesis it is. What ships today is Zstd-compressed JSON: correct, testable, and deliberately not the final format. The stream side already commits Protobuf, so the conversion point is the relay's frame serializer and nothing upstream. The expectation is that Protobuf or bit-packed custom binary beats FlatBuffers for this workload, on a structural argument: FlatBuffers models each entity update as a table carrying offsets and vtable metadata, overhead that dominates when the typical update is a few bytes with most fields absent, while Protobuf's absent-field-is-free encoding fits sparse ticks naturally. But the argument is not the proof, and the proof has a specific shape: a real combat event carries source ID, target ID, card ID, event type, cell, damage, HP result, status effects, and a sequence number, which is considerably more than the synthetic position deltas that serialization benchmarks love. The ledger's wire format row tests against the real schema with 100 active entities and a hard ceiling of 1 KB per client per tick, and a miss means schema redesign or bit-packing, not a shrug.
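The shape of that test can be sketched as a harness that accepts any encoder and reports bytes per tick against the ceiling. Everything below is illustrative: the field names and values are made up to match the field list above, `check_budget` is a hypothetical helper, and the only encoder shown is uncompressed JSON as a deliberately naive baseline (compression is omitted to keep the sketch dependency-free).

```python
import json
from typing import Callable, Dict, List, Tuple

CEILING_BYTES = 1024    # 1 KB per client per tick, from the ledger row
ACTIVE_ENTITIES = 100   # entity count named in the ledger row

CombatEvent = Dict[str, object]


def sample_event(i: int) -> CombatEvent:
    """One combat event with the fields the post lists; values are invented."""
    return {
        "source_id": 1000 + i, "target_id": 2000 + i, "card_id": 17,
        "event_type": 3, "cell": 42, "damage": 120, "hp_result": 880,
        "status_effects": [1, 4], "seq": 50_000 + i,
    }


def check_budget(encode: Callable[[List[CombatEvent]], bytes],
                 events: List[CombatEvent],
                 ceiling: int = CEILING_BYTES) -> Tuple[int, bool]:
    """Encode one tick's events and report (size in bytes, fits the ceiling)."""
    size = len(encode(events))
    return size, size <= ceiling


def encode_json(events: List[CombatEvent]) -> bytes:
    """Naive baseline: compact JSON, no compression, no delta encoding."""
    return json.dumps(events, separators=(",", ":")).encode("utf-8")


events = [sample_event(i) for i in range(ACTIVE_ENTITIES)]
size, ok = check_budget(encode_json, events)
print(f"uncompressed JSON: {size} bytes, within ceiling: {ok}")
```

A Protobuf or bit-packed candidate would plug in as another `encode` callable against the same event list, which keeps the comparison tied to the real schema rather than to synthetic position deltas.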
When the channel gets hot
Sharded pub/sub has one structural weakness this post has to own: a single tile's channel is pinned to one shard, so a galaxy-famous battle concentrates its delivery load on one Redis node, and that pressure does not scale horizontally. The escalation ladder is pre-committed rather than improvised. First, rebalance relays so no pod over-subscribes the hot channel. Second, insert a dedicated broker tier between pub/sub and the relays for that tile, the same move Discord makes for its largest servers. Third, subdivide the hot tile itself, which the bounded tile model permits because a tile's worst case is printed.
And behind the ladder stands the swappability rule, making its third appearance in the series after the etcd-to-Aurora swap and the Lambda bridge. Every layer here consumes an interface, not a technology: the relay consumes ordered per-tile event channels and does not care what publishes them. If Redis fan-out fails at a scale no ladder rung can hold, the terminal fallback is to mirror Discord's architecture in its entirety, distributed Erlang and Elixir process groups doing the fan-out, with the relay contract unchanged. That is not a plan anyone wants to execute. It is a priced exit that keeps a single vendor's ceiling from being the architecture's ceiling.
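The swappability rule is easiest to see as a type. In this sketch the relay depends only on a structural interface for ordered per-tile events; the names `TileChannelSource`, `InMemorySource`, and `relay_drain` are hypothetical, and the in-memory publisher stands in for whatever actually sits behind the contract.

```python
from typing import Dict, Iterator, List, Protocol, Tuple


class TileChannelSource(Protocol):
    """The contract the relay consumes: ordered events for one tile."""

    def events(self, tile: str, after_seq: int) -> Iterator[Tuple[int, bytes]]:
        """Yield (seq, payload) pairs with seq > after_seq, in order."""
        ...


class InMemorySource:
    """Stand-in publisher; a Redis-backed or process-group one would fit too."""

    def __init__(self) -> None:
        self._log: Dict[str, List[Tuple[int, bytes]]] = {}

    def publish(self, tile: str, seq: int, payload: bytes) -> None:
        # Callers are assumed to publish in increasing seq order per tile.
        self._log.setdefault(tile, []).append((seq, payload))

    def events(self, tile: str, after_seq: int) -> Iterator[Tuple[int, bytes]]:
        for seq, payload in self._log.get(tile, []):
            if seq > after_seq:
                yield seq, payload


def relay_drain(source: TileChannelSource, tile: str, after_seq: int) -> List[bytes]:
    """Relay-side code sees only the interface, never the technology."""
    return [payload for _, payload in source.events(tile, after_seq)]
```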
Reconnect storms
The last problem involves no failed component at all. A transit hiccup drops a few thousand WebSockets and they all come back at once; Netflix plans live events around reconnection spikes of roughly 10x steady-state for exactly this reason. Here, 5,000 reconnecting clients with 7-tile viewports is on the order of 35,000 snapshot reads hitting the coordination cluster's cold path, at the precise moment nothing else is wrong and everything could be.
Two patterns flatten it. Server-guided backoff with jitter, a 1,000ms base plus 0 to 4,000ms of randomness delivered with the disconnect, spreads the wave across a four-second window that closes five seconds after the drop, instead of packing it into half a second. Singleflight snapshot fetching collapses the duplicates that remain: when many clients on one relay need the same tile snapshot, the relay fetches once and shares. With 50 relay pods each fetching the same 7 tiles once, 35,000 reads collapse to roughly 350 origin fetches. The storm becomes a drizzle without any client noticing it was managed.
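Both patterns are short enough to sketch. The delay constants come from the post. The `SingleFlight` class is an illustrative thread-based version, not the relay's implementation, and every name in it is hypothetical.

```python
import random
import threading
from typing import Callable, Dict, Optional

BASE_MS = 1_000     # base delay, from the post
JITTER_MS = 4_000   # upper bound of the random component, from the post


def reconnect_delay_ms(rng: random.Random) -> int:
    """Delay the server hands to a client along with the disconnect."""
    return BASE_MS + rng.randint(0, JITTER_MS)


class _Call:
    """Shared state for one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[Exception] = None


class SingleFlight:
    """Collapse concurrent fetches for the same key into one origin call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fetch()      # the only origin fetch for this key
            except Exception as exc:
                call.error = exc           # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()               # followers wait for the leader
        if call.error is not None:
            raise call.error
        return call.result
```

With one `SingleFlight` per relay pod keyed by tile, the worst case during a storm is one origin fetch per pod per tile, which is the 50 x 7 = 350 figure above. Note that this version only deduplicates calls that overlap in time; it is not a cache.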
What makes the repair exact rather than approximate is bookkeeping the relay was doing all along: each client session carries a per-tile sequence vector, the owner epoch and tile-lifetime sequence it last delivered. On reconnect the vector is the question and the gap is the answer; the relay repairs from the coordination stream plus a bounded replay of the client's own action results, so a returning player learns what their in-flight actions became, applied or fizzled, instead of wondering. Because the sequence never resets across owner promotions, the vector stays meaningful even when the tile failed over while the client was gone.
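The vector-and-gap idea can be sketched as a filter over the coordination stream. This is an illustration under assumptions, not the tested relay code: the `StreamEvent` shape is hypothetical, and the stale-epoch rule shown here (drop anything committed under an epoch older than the one the client last saw) is one plausible reading of the filtering the post names. The bounded replay of the client's own action results is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StreamEvent:
    tile: str
    epoch: int      # owner epoch that committed the event
    seq: int        # tile-lifetime sequence; never resets across promotions
    payload: bytes


# tile -> (owner epoch, tile-lifetime sequence) last delivered to the client
SequenceVector = Dict[str, Tuple[int, int]]


def repair(vector: SequenceVector, stream: List[StreamEvent]) -> List[StreamEvent]:
    """Return the events the client missed, per tile, in sequence order."""
    missed = []
    for ev in stream:
        if ev.tile not in vector:
            continue                  # tile is not in this session's vector
        last_epoch, last_seq = vector[ev.tile]
        if ev.epoch < last_epoch:
            continue                  # stale epoch: a superseded owner's write
        if ev.seq > last_seq:
            missed.append(ev)         # inside the gap, so it must be replayed
    missed.sort(key=lambda e: (e.tile, e.seq))
    return missed
```

Because the comparison is on the tile-lifetime sequence rather than a per-owner counter, an event committed by a newer owner after a failover still sorts after everything the client already has.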
What is proven and what is not
One tier of this layer graduated since it was drafted: the repair machinery is implemented and tested. Sequence vectors, stale-epoch filtering, bounded action-result replay, and reconnect session repair all have coverage in the relay suite, so the reconnect section above describes code, not intent. Everything with a number in it has not graduated. Three ledger rows own this layer. Hot tile fan-out: 10,000 players viewing the same 7 tiles must hold publish-to-client p99 under 100ms, or the broker tier and relay rebalance stop being contingencies. Relay reconnect storm: 5,000 reconnects with 7-tile viewports must hold snapshot p99 under 250ms, or admission control joins jitter and singleflight. Wire format validation: the real card-event schema with 100 active entities must stay under 1 KB per client per tick, or the schema gets redesigned. And the honest headline stands above all three: every egress figure in this post is a design target, not a measurement, resting on the viewport-filtered model that the benchmark exists to validate. If the 600-byte row fails, this is the post that gets the most interesting follow-up.
What comes next
This post delivered committed truth to screens. What provisions the compute underneath all of it (leases instead of scheduling, warm pools, and clusters as failure domains) is the subject of the substrate post. And the series closes with the ledger: the full benchmark table, the two SLIs that separate a working fence from a broken one, and exactly what would falsify the design.