10,000 Players, No Time Dilation
This is the opening post and hub of the Black Skies architecture series. A map of the full series, with links, is at the end.
EVE Online's largest battles are legendary, and so is the asterisk attached to them. When 6,557 players converged on FWST-8 in 2020, CCP kept the servers standing by slowing the game clock to 10% of real time. NetEase's 2025 world record of 9,524 players came with stuttering and dropped skill activations. Every record-setting battle in this genre has shipped with the same fine print: it works, but not at full speed, and not without compromise. The reason is almost always the same. Gameplay wants strong consistency. Players expect their actions to resolve in order, against a coherent shared world, right now. And consistency is exactly what stops scaling when too much of it is funneled through one place.
This post is a technical argument that a true 10,000-player battle can play the same as a 100-player skirmish: no time dilation, no degraded tick rate. The trick is not a faster server. It is recognizing that the balance between consistency and availability is a design choice, not just an infrastructure one. Pure AP architectures scale effortlessly but can't enforce the rules a game depends on; pure CP architectures enforce the rules beautifully and then fall apart under load. The way through is to confine strong consistency to small, bounded, distributed cells, stitch those cells together with an AP design, and shape the game's rules so the seams never show: discrete turns, bounded range, cooldown-gated actions. Each of those is a gameplay decision that also happens to be a scaling decision.
Choosing the tradeoffs before they choose you
Scaling past 10,000 concurrent players in a single contested space forces tradeoffs. Every system that has reached toward that scale, from EVE to NetEase's record attempt, made those tradeoffs reactively: time dilation, degraded tick rates, dropped actions, rendering compromises. The alternative is to choose the tradeoffs upfront and design them into the gameplay so they feel like core mechanics rather than infrastructure limitations.
The pattern is industry-wide, not an EVE quirk. EVE's time dilation slows the clock when a node saturates, and CCP deserves credit for shipping the compromise honestly. Albion Online protects its single shard with hard zone caps and entry queues. PlanetSide 2 proved persistent, non-instanced battle space and still topped out around 1,200 stable players per continent. Surveying the production record while designing this system surfaced a blunt fact: no shipped game publicly demonstrates a surprise mass arrival of thousands of players into one live battle without time dilation, hard caps, instancing, or severe degradation. That is the gap this architecture exists to close, and it is why the posts that follow name each mechanism next to the production failure it answers.
That is what Black Skies does. It is a real-time tactical space game on a hex grid, where players command ships and every combat action is a card play with bounded costs, bounded targets, and a cooldown measured in seconds. Each of those rules earns its place twice. The hex grid bounds spatial partitioning. The card mechanic bounds per-action cost. The cooldown timers bound action rate. Discrete cell movement replaces continuous physics. One entity per cell replaces collision detection. These are the exact tradeoffs that massive scale demands, deliberately chosen because they also make for good tactical gameplay. The card mechanic is not flavor sitting on top of the simulation. It is the simulation boundary.
The constraints are informed by how Discord, Uber, Meta, and Netflix handle their own real-time systems at scale. Their patterns come with known limits, and the design bakes those limits into the rules of the game, so what would read as a compromise in an MMO reads as a mechanic here.
A tile is a CP island
The unit of strong consistency is the tile: one H3 resolution-11 hexagon containing 7^4 = 2,401 tactical cells at resolution 15 (four resolution steps down, seven children per step). Inside a tile there is exactly one authority. A single master plans each tick and farms the work out to pooled task executors, and the whole unit ticks at a fixed 2 Hz. Within those walls the game is fully CP: actions resolve in order, the rules are enforced against one coherent state, and every player inside sees the same battle.
The walls are the point. A tile is small enough that its worst case can be written down, and everything that follows depends on that.
Three ways state changes, all of them bounded
Only three kinds of work can change a tile's state.
Its own simulation. The tick loop resolves queued card plays and movement against the local cell grid. Cooldowns and discrete cells cap how much any one player can ask of a single tick.
Neighbor traffic. Tiles exchange boundary state with their immediate neighbors over coalesced Protobuf RPC with low-watermark propagation, the same coalescing pattern Discord's Manifold uses to keep fan-in from scaling with chattiness. A tile has a fixed number of neighbors and a fixed shared edge, so this channel has a hard ceiling by geometry.
The ingress queue. Everything else arriving from outside, player intents included, lands in one bounded queue, and the ceiling it defends is calculable from the game rules rather than guessed. Occupancy is exclusive, one entity per tactical cell, so a tile tops out at 2,401 occupants. A combat hand holds five cards and the fastest cooldown on any card is a full minute, so the absolute burst is every occupant dumping a full hand at once: 2,401 × 5 = 12,005 plays. The sustained ceiling is five plays per occupant per minute, about 200 intents per second at maximum occupancy. At 2 Hz that works out to one play per 24 ticks per player, more than an order of magnitude under one per tick. Neighbor-forwarded card effects add an edge-band term bounded by the same hand arithmetic on the far side of the seam. The queue bound that ships today is 4,096 pending intents, deliberately below the theoretical burst: a realistic tile never approaches full-occupancy hand-dumping, so a full queue is not load to shed. It is a critical alarm, because the only way to fill it is a bug or an attack.
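Of the three channels, neighbor traffic is the one whose bound depends on coalescing, so a small illustration may help. What follows is a minimal sketch of per-tick coalescing only, under stated assumptions: `BoundaryCoalescer`, the neighbor and cell identifiers, and the string payloads are all hypothetical, and neither the Protobuf encoding nor the low-watermark propagation is modeled.

```python
from collections import defaultdict


class BoundaryCoalescer:
    """Buffers boundary updates and emits one batch per neighbor per tick.

    Hypothetical illustration; not the series' actual RPC layer.
    """

    def __init__(self) -> None:
        # neighbor_id -> {cell_id -> latest state recorded this tick}
        self._pending = defaultdict(dict)

    def record(self, neighbor_id: str, cell_id: str, state: str) -> None:
        # A later update to the same cell overwrites the earlier one, so the
        # batch size depends on distinct edge cells, not on update count.
        self._pending[neighbor_id][cell_id] = state

    def flush(self) -> dict:
        """Return one batch per neighbor with pending updates, then reset."""
        batches = {n: dict(cells) for n, cells in self._pending.items()}
        self._pending.clear()
        return batches


coalescer = BoundaryCoalescer()
for step in range(1000):  # a chatty tick: 1,000 updates to one edge cell
    coalescer.record("north", "edge-cell-3", f"hp={step}")
coalescer.record("north", "edge-cell-4", "hp=50")

batches = coalescer.flush()  # one batch for "north" holding two cells
print(batches)
```

However chatty the tick, the flushed batch for a neighbor can never hold more entries than there are cells on the shared edge, which is the geometric ceiling the paragraph describes.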
The payoff is a single testable claim: a fully saturated tile has a fixed worst-case ingress, a fixed per-tick egress, and a fixed processing budget, so its capacity can be measured in isolation. It has been. A saturated tile holds its tick budget under full local load.
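The worst-case ingress figures can be checked directly from the game rules. The sketch below recomputes them and pairs them with a bounded queue that raises instead of shedding when full. The constants come from this post; `IngressQueue` and `IngressFull` are hypothetical names, and the real queue's interface is not shown here.

```python
from collections import deque

# Figures stated in the post.
CELLS_PER_TILE = 7 ** (15 - 11)  # H3 resolution 11 -> 15, seven children per step
HAND_SIZE = 5                    # cards in a combat hand
MIN_COOLDOWN_S = 60              # fastest cooldown on any card, in seconds
TICK_HZ = 2                      # fixed tile tick rate
QUEUE_BOUND = 4096               # pending-intent bound that ships today

burst_plays = CELLS_PER_TILE * HAND_SIZE        # every occupant dumps a full hand
sustained_per_s = burst_plays / MIN_COOLDOWN_S  # whole tile, steady state
per_player_per_tick = HAND_SIZE / MIN_COOLDOWN_S / TICK_HZ


class IngressFull(Exception):
    """A full queue is treated as an alarm condition, not as load to shed."""


class IngressQueue:
    """Bounded FIFO of pending intents (hypothetical sketch)."""

    def __init__(self, bound: int = QUEUE_BOUND) -> None:
        self._bound = bound
        self._pending = deque()

    def offer(self, intent: object) -> None:
        if len(self._pending) >= self._bound:
            raise IngressFull(f"ingress queue hit its bound of {self._bound}")
        self._pending.append(intent)

    def drain(self) -> list:
        """Hand every pending intent to the tick loop and empty the queue."""
        drained = list(self._pending)
        self._pending.clear()
        return drained

    def __len__(self) -> int:
        return len(self._pending)


print(f"burst: {burst_plays} plays")
print(f"sustained: {sustained_per_s:.1f} intents/s")
print(f"per player: 1 play every {1 / per_player_per_tick:.0f} ticks")
```

Raising on overflow, rather than silently dropping the intent, mirrors the stance above that a full queue signals a bug or an attack.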
Egress is O(1) at the tile
The other half of the boundedness argument is what leaves. A tile performs one Redis write per tick, regardless of whether ten players are watching it or ten thousand. Spectator count never touches the simulation. Delivering that committed state to every watching client is a real problem, but it is a different problem, owned by a different layer with different failure modes, and it gets its own post.
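The separation can be shown with a toy model. This is a sketch under stated assumptions, not the real commit path: `CommitStore` is a stand-in that only counts calls (no Redis client is involved), and `simulate` and `fan_out` are hypothetical names for the two layers.

```python
import json


class CommitStore:
    """Stand-in for the per-tick write; it records calls and nothing more."""

    def __init__(self) -> None:
        self.writes = 0
        self.latest = None

    def commit(self, tick: int, snapshot: str) -> None:
        self.writes += 1
        self.latest = (tick, snapshot)


def simulate(store: CommitStore, ticks: int) -> None:
    """Tile side: one commit per tick. Spectator count is not an input."""
    state = {"ships": 3}
    for tick in range(ticks):
        state["tick"] = tick  # placeholder for real simulation work
        store.commit(tick, json.dumps(state))


def fan_out(store: CommitStore, spectators: int) -> int:
    """Delivery side: copies the committed snapshot once per watcher."""
    _tick, snapshot = store.latest
    return sum(1 for _ in range(spectators) if snapshot)


store = CommitStore()
simulate(store, ticks=10)
small_audience = fan_out(store, spectators=10)
large_audience = fan_out(store, spectators=10_000)
print(store.writes, small_audience, large_audience)
```

The write count depends only on the number of ticks. Audience size shows up solely in the delivery layer's work.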
This separation is what kills the classic death spiral, where a big battle attracts spectators, spectator load slows the simulation, and the slowdown makes the battle bigger news. In this design the simulation cannot feel its audience.
Bounded at the top, elastic at the bottom
Bounding the worst case is only half the capacity problem. A galaxy is mostly empty, and a lone mining ship in a quiet tile should not reserve the full footprint that a saturated battle needs. It doesn't, because a tile holds no footprint to begin with: its master submits each tick's work as tasks against a shared compute pool, so a quiet tile is a planner with a nearly empty plan, and cost tracks the plan with no scaling decision anyone has to make. A tile that suddenly heats up just submits a heavier plan, and the substrate that makes that cheap, and keeps warm capacity ahead of the heat, gets its own post.
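A rough way to picture "cost tracks the plan" is a shared worker pool that tiles submit tasks to. The sketch below uses Python's standard `ThreadPoolExecutor` as a stand-in for the pooled executors. `resolve_entity` and `run_tick` are hypothetical names, and the real substrate (leases, dispatch budget, warm capacity) is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_entity(entity_id: int) -> int:
    """Placeholder for one entity's per-tick work."""
    return entity_id


def run_tick(pool: ThreadPoolExecutor, entity_ids) -> list:
    """The master's plan is one task per entity; an empty tile submits nothing."""
    futures = [pool.submit(resolve_entity, e) for e in entity_ids]
    return [f.result() for f in futures]


# One pool shared by every tile; no tile reserves workers of its own.
with ThreadPoolExecutor(max_workers=4) as pool:
    quiet = run_tick(pool, [7])           # lone mining ship: a one-task plan
    hot = run_tick(pool, range(2401))     # saturated tile: a 2,401-task plan

print(len(quiet), len(hot))
```

Nothing in the sketch decides to scale a tile up or down. The tile that heats up simply submits a longer list of tasks to the same pool.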
Where the consistency runs out
So a tile is a CP island, and a bounded, measured one. But a single tile is not the game. The galaxy is millions of tiles, spawning, dying, and handing entities to each other, and the layer that distributes them is not CP at all. It is AP: membership is gossiped, ownership can be briefly contested, and the system has to keep running through partitions and stale information. That asymmetry is the whole game. Strong consistency where the rules are enforced, availability where the tiles are coordinated. And it is where the genuinely hard problems live. What happens when two tiles both believe they own the same patch of space at the same instant? What happens to an entity walking across a boundary at the exact moment the destination tile dies?
One example of how seriously those problems need to be taken. When two processors disagree about who owns a tile, the system needs a way to settle it without convening a quorum on the hot path. The mechanism is the epoch: a unique, monotonically increasing ID minted whenever a tile spawns or ownership changes hands. Every write carries the epoch that made it, and in any conflict the higher epoch wins while the stale owner's writes are rejected outright. Minting epochs is the textbook job for an etcd cluster, and that was the original design. But etcd is a CP system with a global scope: one Raft leader serializes every mint, for every tile in the galaxy, and writing the test plan surfaced that ceiling under exactly the burst this game produces, a wave of tile spawns arriving as a wave of mint requests. The replacement is Aurora PostgreSQL Limitless, sharded directly by tile ID. Each mint is still a strongly consistent conditional update, but the consistency domain shrinks from the whole galaxy to one tile's row. It is the article's thesis applied to its own infrastructure: keep CP, but keep it small. Nothing failed in production. The act of writing down how the system would be proven exposed where it couldn't be. That discipline, designing the constraints first and then measuring, runs through the whole series.
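Epoch fencing is easier to follow with something runnable. The following is a minimal sketch, assuming a single `tile_owner` table keyed by tile ID; the table, column, and function names are hypothetical, and in-memory SQLite stands in for the sharded database purely so the example is self-contained. Each mint is a conditional update scoped to one tile's row, and a write is accepted only if it carries the tile's current epoch.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tile_owner ("
        " tile_id TEXT PRIMARY KEY,"
        " epoch INTEGER NOT NULL,"
        " owner TEXT NOT NULL)"
    )
    return conn


def mint_epoch(conn, tile_id: str, new_owner: str, seen_epoch: int):
    """Compare-and-set on one row. Returns the new epoch, or None on a lost race.

    seen_epoch == 0 means the caller believes the tile does not exist yet.
    """
    with conn:  # one transaction touching one tile's row
        if seen_epoch == 0:
            cur = conn.execute(
                "INSERT OR IGNORE INTO tile_owner (tile_id, epoch, owner)"
                " VALUES (?, 1, ?)",
                (tile_id, new_owner),
            )
        else:
            cur = conn.execute(
                "UPDATE tile_owner SET epoch = epoch + 1, owner = ?"
                " WHERE tile_id = ? AND epoch = ?",
                (new_owner, tile_id, seen_epoch),
            )
    return seen_epoch + 1 if cur.rowcount == 1 else None


def fenced_write_allowed(conn, tile_id: str, writer_epoch: int) -> bool:
    """Accept a write only if it carries the tile's current epoch."""
    row = conn.execute(
        "SELECT epoch FROM tile_owner WHERE tile_id = ?", (tile_id,)
    ).fetchone()
    return row is not None and writer_epoch == row[0]


conn = open_store()
e1 = mint_epoch(conn, "tile-a", "proc-1", seen_epoch=0)     # spawn
e2 = mint_epoch(conn, "tile-a", "proc-2", seen_epoch=e1)    # ownership change
lost = mint_epoch(conn, "tile-a", "proc-3", seen_epoch=e1)  # stale view loses
print(e1, e2, lost)
print(fenced_write_allowed(conn, "tile-a", e1))  # stale owner is rejected
print(fenced_write_allowed(conn, "tile-a", e2))  # current owner is accepted
```

Every statement filters on a single `tile_id`, which is the property that lets the consistency domain be one row rather than the whole galaxy.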
What is proven and what is not
The honesty matters more than the architecture, so here is the current line. Single-tile load testing at full saturation has been run. The coordination machinery (fencing, promotion, trim safety, transfer, admission) sits behind more than a thousand test methods and a compose-based integration stack, with two named gates still open before any of it is trusted in production: the same paths run against a live Redis Cluster across masters, and a run against live Aurora Limitless with EXPLAIN proof that ownership writes stay single-shard. The full AP mesh at massive scale has not been run, pending cloud credit allocation, and bot-driven browser testing is not a substitute for real users. Until that run happens, the 10,000-player claim is a design argument with tested components and measured foundations, not a benchmark report. There will be a follow-up post when the large-scale run happens, whichever way it goes.
The series
Each post stands alone, and this page is the map. Links land here as the posts publish.
- The thesis. This post: why consistency versus availability is a design choice, and how game rules become scaling decisions.
- The life of a tile. How a tile is born by spiral search and CAS, fails over in one to two ticks, talks to six neighbors over streams that double as liveness, and garbage-collects itself. There is no centralized tile manager.
- The commit path. A durable admission ledger upstream, then one Redis function per tick, a bridge, and a checkpointer keep commit, delivery, and durability from ever coupling.
- Ownership without a global coordinator. Epoch fencing, split-brain resolution, and why the routing read path never touches the database.
- Delivering one battle to 10,000 clients. Fan-out, interest management as game design, the wire format budget, and reconnect storms.
- Motion across boundaries. Admission and settlement in two ticks, the movement rules at the seam, and mass warp as a queue instead of a protocol.
- The substrate. Bounded tiles make capacity computable: leased executors and the dispatch budget, Lambda as the overflow bridge, the three-speed capacity pipeline, and Kubernetes clusters as failure domains.
- The ledger. The full benchmark table, the two SLIs that define correctness, and exactly what would falsify this design.
When the massive-scale run happens, the results post joins this list, whichever way the numbers go.