BLACK SKIES ARCHITECTURE SERIES · PART 6 OF 8 · 20 MIN

Motion Across Boundaries

This post is part of the Black Skies architecture series; the series hub is the thesis post. The lifecycle post covered how tiles are born, fail over, and die; the ownership post settled who runs one when processors disagree. What remains is the hardest thing tiles do while alive: letting things move between them. A boundary crossing is the one moment an entity's authority must travel between two CP islands across the AP layer that separates them, and a thousand ships arriving at once is that same moment at fleet scale. At 2 Hz, the budget for all of it is counted in ticks.

Boundary crossing is the hardest ordinary event

A ship flying across a tile boundary is the most routine thing in the game and the most dangerous moment in the architecture, because it is the one time an entity's authority must move between two CP islands across the AP layer between them. Get it wrong in one direction and the ship exists twice. Get it wrong in the other and it exists nowhere.

The correctness model is a five-phase state machine: prepare, accept, commit_departure, finalize_arrival, reconcile. The invariant it protects: at most one tile is ever authoritative for the ship, and at every instant at least one committed stream can reconstruct it. The pending arrival can be discarded at any point without inventing or destroying anything.

This failure has a famous name. Second Life's region crossings spent nearly two decades as the canonical seam disaster: avatars and vehicles stalling at borders, attachments lost and sessions dropped mid-handoff, and in the worst case no-copy items duplicated by crashing a simulator mid-transfer, because custody moved between simulators without a durable interlock recording which side owned what. Everything below is an answer to that record: the reservation is committed before the grant exists, the departure is committed before anyone is told it happened, and a crash at any instant leaves at least one stream that can reconstruct the ship. The two-beat protocol is Second Life's hardest lesson, applied on purpose.

Operationally, the five phases collapse into two beats. Admission runs inside the destination's current tick: validate both epochs, arbitrate against local movement and other inbound proposals, resolve the ship's remaining path against live destination state, and reserve its final cell or reject. All of that happens in the destination master's resolution pass, in process, against the live grid, where single-threaded authority makes the arbitration atomic for free; Redis never sees a proposal, and no validation logic lives anywhere near a Lua script. The reservation is simply one more event in that tick's batch, riding the normal one-write-per-tick commit through the same function the commit path post described. Settlement is the next tick's ordinary work: the source commits the departure, the destination materializes the ship into its plan, and both ride the normal stream commits. Prepare and accept became one exchange, the commit phases became regular ticks, and reconcile became lazy cleanup. A crossing completes in a deterministic two ticks, and what travels over the wire is two messages on the neighbor stream the tiles already share. Combat gets one sentence about the in-between: damage resolves wherever the ship's authoritative record sits that tick, at the source until the departure commits, at the destination from that commit onward with the reservation standing in as the target until materialization, so no window exists where a ship is unhittable or hittable twice. A ship destroyed before its departure commits simply never departs, and the reservation expires unused.

[Figure: the crossing as a sequence diagram. Lanes: source master, destination master, committed streams, registry (DynamoDB). Tick N, admission: the source proposes (transferId, both epochs, path); the destination validates epochs, arbitrates, resolves the path, and reserves the final cell in this tick's batch; the reservation commits to the destination stream; the grant is sent only after the commit. Tick N+1, settlement: the source commits the departure (replica-acked via WAIT); the destination materializes the ship into its plan; the registry receives a conditional, versioned location write that trails custody and never leads it.]
The crossing end to end: admission inside the destination's tick, settlement the next tick, and the registry trailing custody. Two beats, two messages, two committed streams.

Committing something so short-lived deserves a why: for the tick or two between the source committing the departure and the destination materializing the ship, the reservation is the ship, the only committed record anywhere that it exists. A memory-only reservation plus a destination crash in that window is a vanished ship; a committed one is replayed by the follower, and the ship lands late instead of never. Like a write-ahead-log entry, it is written once, replayed at most once, and then inert: compacted away at the next snapshot, expired by its TTL if settlement never comes.

Committed here means what it means everywhere in this design: acknowledged into the asynchronously replicated stream, surviving master death through follower replay, surviving a Redis failover except for the few milliseconds of replication lag, and surviving total loss only back to the checkpoint. The reservation gets exactly the durability of the events around it, no more, and the floor under every compound failure is the registry: a ship missing where its record says it lives gets re-materialized, so the deepest a failure can cut is a rolled-back crossing, never a missing ship. The sliver itself is closable for the one event class whose loss could mint a duplicate: departure-bearing commits require replica acknowledgment, Redis WAIT, before the departure is announced, one round trip on exactly those ticks and no others, the same selective trade Postgres ships as per-transaction synchronous commit and Meta runs at fleet scale as lossless semi-sync MySQL. And the degraded mode is designed rather than discovered: when a shard is under-replicated, the seam hardens, transfers pause, movement truncates at the boundary, in-tile play continues untouched, and the player reads border interference, the thesis absorbing an infrastructure state again.
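A minimal sketch of that selective acknowledgment, assuming a client whose `xadd` and `wait` methods are shaped like redis-py's. The function name, the stream key, the returned labels, and the 100 ms timeout are hypothetical, chosen only for illustration.

```python
from typing import Protocol


class StreamClient(Protocol):
    """The two calls this sketch needs; redis-py's client exposes both."""

    def xadd(self, name: str, fields: dict) -> object: ...
    def wait(self, num_replicas: int, timeout: int) -> int: ...


def commit_tick(client: StreamClient, stream_key: str, batch: dict,
                bears_departure: bool, timeout_ms: int = 100) -> str:
    """Commit one tick's batch, paying the replica round trip only when needed.

    Returns a hypothetical status label:
      "committed"        ordinary tick, asynchronous replication only
      "departure_acked"  a replica acknowledged, the departure may be announced
      "seam_hardened"    no replica acknowledged in time, so hold the announcement
    """
    client.xadd(stream_key, batch)  # the normal one-write-per-tick commit
    if not bears_departure:
        return "committed"
    # WAIT blocks until prior writes reach num_replicas replicas or the timeout
    # elapses, and returns how many replicas acknowledged.
    acked = client.wait(1, timeout_ms)
    return "departure_acked" if acked >= 1 else "seam_hardened"
```

The point of the shape is that the extra round trip is conditional: ticks without a departure never call `wait` at all.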

The grant reply is sent after the tick commits, so a grant in the source's hand means the reservation is already in the destination's committed stream, and a rejection is known the same tick, which keeps client prediction clean. That ordering is one instance of the single rule every cross-tile announcement obeys: nothing leaves a tile claiming a thing happened until the write recording it is replicated. The grant obeys it, whatever message tells the destination that the departure committed obeys it, and the registry write obeys it by construction. The recovery literature calls this the output commit rule, and Kafka's high watermark is the same rule built into a broker, where consumers structurally cannot read past what replicas hold.
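The ordering can be sketched in a few lines. This is an illustration under assumptions, not the project's tick loop: `run_tick`, `commit`, and `send` are hypothetical names, and `commit` stands in for whatever call returns only once the write is durable to the required degree.

```python
from typing import Callable


def run_tick(events: list, outbound: list,
             commit: Callable[[list], None],
             send: Callable[[object], None]) -> int:
    """Commit first, announce second.

    If commit raises, the loop below never runs, so no neighbor ever holds a
    message claiming something the stream cannot back up.
    """
    commit(events)  # must return before anything leaves the tile
    for message in outbound:
        send(message)
    return len(outbound)
```

A grant, a departure notice, and a registry emission would all be entries in `outbound` under this sketch; none of them can overtake the commit that justifies it.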

[Figure: failure matrix. Who dies × when they die → what happens to the ship. Windows, in order: before admission, reservation committed, grant sent, departure committed, settled.]

Source master dies. Before admission: never proposed, replays at source. Reservation committed: departure never committed, replays at source, TTL frees the cell. Grant sent: same, grant unused. Departure committed: departed, the destination owns it via the reservation. Settled: already gone.

Destination master dies. Before admission: proposal unanswered, retry next tick. Reservation committed: follower replays and inherits the reservation. Grant sent: same, the replay has it. Departure committed: replay inherits and materializes (lands late). Settled: the replay has the ship.

Either side promotes (epoch change). All windows: transfers are epoch-scoped, so a promotion aborts its side's in-flights instead of adopting them, replay makes the abort free, and the ship resolves by the source and destination rows above.

Registry write lost. All windows: settlement re-emits its write on replay, idempotent under the version condition.

Every cell converges to exactly one ship. The deepest any compound failure can cut is a rolled-back crossing, never a missing ship: the registry re-materializes strays.
Every actor that can die, against every window it can die in. Each cell names its recovery, and every recovery converges to exactly one ship.

Every proposal carries a transferId, the source's commit sequence, and both epochs, and the destination validates both before reserving, which ties admission directly to the ownership post's fencing: a transfer initiated by a stale owner, or aimed at one, dies at validation instead of smearing a ship across a split brain. Reservations are epoch-bound on both sides and expire after one to two ticks, so the failure cases resolve along clean lines. A source that dies after admission never committed the departure, so its successor replays a ship that never left, and the orphaned reservation evaporates on its TTL. A destination that dies after admission replays its own stream and inherits the committed reservation. Transfers are epoch-scoped, so a promotion aborts its side's in-flights rather than adopting them, and replay makes the abort free.
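A sketch of admission-time validation under stated assumptions: the field names, the rejection strings, and the idea that the destination tracks the latest fenced epoch it has seen for each neighbor are all hypothetical, and path resolution and arbitration are left out so the epoch and TTL logic stays visible.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    transfer_id: str
    source_seq: int    # the source's commit sequence
    source_epoch: int
    dest_epoch: int    # the destination epoch the source believes is current
    final_cell: tuple  # cell the resolved path ends on


@dataclass
class Destination:
    epoch: int
    known_source_epochs: dict  # source tile id -> latest epoch seen (assumed)
    occupied: set = field(default_factory=set)
    reservations: dict = field(default_factory=dict)  # cell -> (transfer_id, expiry tick)

    def admit(self, source_tile: str, p: Proposal, tick: int, ttl_ticks: int = 2) -> str:
        # A transfer initiated by a stale owner, or aimed at one, dies here.
        if p.dest_epoch != self.epoch:
            return "reject: stale destination epoch"
        if p.source_epoch < self.known_source_epochs.get(source_tile, 0):
            return "reject: stale source epoch"
        # Orphaned reservations evaporate on their TTL and free their cells.
        self.reservations = {c: r for c, r in self.reservations.items() if r[1] > tick}
        held = self.reservations.get(p.final_cell)
        if held is not None and held[0] == p.transfer_id:
            return "grant"  # retried proposal: same answer, no second reservation
        if p.final_cell in self.occupied or held is not None:
            return "reject: cell taken"
        self.reservations[p.final_cell] = (p.transfer_id, tick + ttl_ticks)
        return "grant"
```

In the design as described, the reservation would be one more event in the tick's committed batch; here it is only a dictionary entry.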

Two interactions with the lifecycle round it out. An admission proposal to a dying tile is an interaction like any other and resets its GC grace period, so the audit's no-pending-transfers condition and an inbound crossing cannot race. An admission proposal into a tile that does not exist is the spawn path, full stop, the same spiral-and-CAS birth every spawn uses.

Routing has to survive the in-between too, and naming the layers makes it clean. A ship lives in two kinds of state: local state, the owning tile's live simulation, which is the only authoritative truth while the game runs, and world state, the global registry that answers which tile holds which entity. The registry's home is DynamoDB: settlement emits a conditional write of the entity's location, tile and cell, versioned by the transfer sequence so a stale writer can never clobber a newer location, the same conditional-update primitive card exhaustion already rides. The emission is driven by the settlement event in the stream, which makes a lost write self-healing: a destination that crashes after settling re-encounters the event on replay and re-emits the write, idempotent under the version condition. The registry follows the stream, never leads it; custody moves when the departure commits, and the record trails custody rather than anticipating it.
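The version condition can be modeled in memory. This sketch uses a plain dictionary as a stand-in for the registry table, and `put_location` is a hypothetical name; it illustrates the compare-before-write semantics only, not the DynamoDB call itself.

```python
def put_location(registry: dict, entity_id: str, tile: str, cell: tuple, version: int) -> bool:
    """Apply the write only if it carries a newer version than the stored record.

    Returns True if the record changed. A stale writer and a replayed writer
    both fail the condition and leave the record untouched.
    """
    current = registry.get(entity_id)
    if current is not None and current["version"] >= version:
        return False
    registry[entity_id] = {"tile": tile, "cell": cell, "version": version}
    return True
```

Re-emitting the same write after a crash returns False and changes nothing, which is what makes the replay-driven emission safe to repeat.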

The write is asynchronous and the index is eventually consistent by design, with the gap covered the way gaps always are here: while a transfer is in flight, the source records a forwarding entry carrying the transferId and destination, so a lookup that arrives for a departed ship follows the pointer instead of routing into the void, retained until the registry catches up, then aged out. During play the registry is a directory, never an authority; nothing consults it to decide what is true, only to decide where to ask. After a crash it gets one promotion: tiebreaker.
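A small sketch of a lookup that follows forwarding entries, assuming the registry and the per-tile forwarding tables are plain dictionaries. The names and the hop limit are hypothetical.

```python
def route(entity_id: str, registry: dict, forwarding: dict, max_hops: int = 2) -> str:
    """Decide which tile to ask about an entity.

    registry:   entity id -> {"tile": ...}, possibly lagging behind custody
    forwarding: tile id -> {entity id -> {"transfer_id": ..., "destination": ...}}
    """
    tile = registry[entity_id]["tile"]  # where the directory says to ask
    for _ in range(max_hops):
        pointer = forwarding.get(tile, {}).get(entity_id)
        if pointer is None:
            return tile  # no forwarding entry: ask this tile
        tile = pointer["destination"]  # follow the pointer left by the source
    return tile
```

The result is only ever a place to ask. Whether the ship is really there is answered by the tile, in keeping with the registry being a directory rather than an authority.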

That promotion is the recovery model. When a tile reloads from checkpoint and replay, it reconciles against the location records, and two rules cover every case. An entity in local state whose record points to a different tile with a newer version is dropped on the spot, which kills crash duplicates structurally rather than detectively. An entity the records say should be here but is not gets re-materialized lazily, through the same consumer-probe path the lifecycle post uses to respawn dead tiles: the next time its player or NPC job pokes, the tile reads the record and places the ship as close to the recorded cell as the clearance search allows, the same placement primitive mass warp uses, because there is exactly one placement routine in this design. If best-effort placement moves the ship a cell or two, the client renders an emergency maneuver, a dodge around debris, and a recovery edge case becomes a story beat instead of a teleport.
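The two rules can be sketched as a single pass over local state and the location records. The data shapes and the name `reconcile` are assumptions for illustration; placement itself (the clearance search) is out of scope here.

```python
def reconcile(tile_id: str, local: dict, registry: dict):
    """Reconcile a reloaded tile against the location records.

    local:    entity id -> version held after checkpoint plus replay
    registry: entity id -> {"tile": ..., "cell": ..., "version": ...}
    Returns (kept, dropped, pending), where pending entities are not placed
    here but re-materialized lazily the next time something pokes them.
    """
    kept, dropped = {}, []
    for entity_id, version in local.items():
        record = registry.get(entity_id)
        # Rule 1: a record pointing elsewhere with a newer version wins.
        if record and record["tile"] != tile_id and record["version"] > version:
            dropped.append(entity_id)
        else:
            kept[entity_id] = version
    # Rule 2: recorded here but absent locally means re-materialize on demand.
    pending = sorted(
        entity_id for entity_id, record in registry.items()
        if record["tile"] == tile_id and entity_id not in local
    )
    return kept, sorted(dropped), pending
```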

The design enumerates nine distinct failure cases for this protocol, and the claim is that every one converges to exactly-one-owner. Here the honesty rule bites hardest: that claim is argued from the state machine, and proving it requires the complete implementation under chaos testing, which has not been done yet. Boundary crossing is the part of this architecture I trust the most on paper and the least without the chaos run.

The game absorbs the failure cases

The thesis post claimed the rules are shaped so the seams never show, and boundary failures are where that claim earns its keep. Consider a card fired at a ship that is mid-transfer when the effect resolves, or a crossing that aborts because the destination died in phase two. In both cases the action simply fizzles, and a fizzle is a legal game outcome, not an error state.

That works because of an ordering decision made well upstream: DynamoDB exhausts the card before the action ever reaches a tile. By the time an effect is in flight, the player has already spent the card, so any downstream failure, a full queue, a target that moved, a tile that crashed, resolves as a gameplay misfire rather than an infrastructure error demanding a refund flow and an apology dialog. Space is dangerous and shots miss. The infrastructure's worst moments are indistinguishable from the game's normal ones, which is the thesis doing its job.

Movement across the seam

The transfer protocol moves authority. Movement is where the seam meets the game, so it gets the full treatment, governed by one rule: every arrival is destination-admitted. An inbound ship is never pushed across the seam by its source. It is proposed, and the destination folds the proposal into its own resolution pass, ordered by the same precedence that orders purely local movement. To the destination, a cross-boundary arrival is just another intent in the tick.

[Figure: a multi-cell move across the seam between the source tile and the destination tile. The move does not stop at the seam: the destination resolves the full remaining path. The final cell is reserved at admission and counts as occupancy immediately. A wall anywhere truncates the move at the last clear cell. Tick N: source proposes, destination admits and reserves. Tick N+1: departure commits, ship materializes. One skew tick at most, once, at the boundary.]
A multi-cell move across the seam: proposed by the source, admitted and path-resolved by the destination against live state, final cell reserved, settled next tick.

The funnel is printed, like everything else. Ships move faster than one cell per tick, but never far enough to reach past an adjacent tile, so only ships within max-move distance of an edge can propose a crossing, and the per-edge proposal ceiling is roughly edge length times max speed. The geometry half of that number is fixed: a tile's 2,401 cells put on the order of thirty cells along each of its six edges, so the ceiling is about thirty proposals per edge per tick times the movement cap.
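The ceiling is a single multiplication, sketched here so the open parameter is explicit. The movement cap is not fixed in this post, so the value 3 in the usage note is purely a placeholder, and both function names are hypothetical.

```python
def edge_proposal_ceiling(edge_cells: int, max_move: int) -> int:
    """Upper bound on crossing proposals one edge can receive in one tick.

    Only ships within max_move cells of the edge can propose, so the bound is
    the size of that band: edge length times movement cap.
    """
    return edge_cells * max_move


def tile_proposal_ceiling(edge_cells: int, max_move: int, edges: int = 6) -> int:
    """The same bound summed over all of a tile's edges."""
    return edges * edge_proposal_ceiling(edge_cells, max_move)
```

With thirty cells per edge and a placeholder cap of 3, `edge_proposal_ceiling(30, 3)` gives 90 proposals per edge per tick and `tile_proposal_ceiling(30, 3)` gives 540 per tile.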

All of one edge's proposals in one tick travel as a single message class on the neighbor stream the tiles already share, inheriting its Manifold coalescing and watermark ordering. The benchmark plan once listed transfer batching as a failure action; under this model it is the design default.

Crossing costs nothing the player can measure. A multi-cell path that reaches the seam does not stop there: admission resolves the ship's full remaining movement inside the destination's own resolution pass, exactly as if it were a local move beginning at the entry cell, validated against the destination's live state, with the final cell reserved. An earlier draft charged a toll instead, forfeiting remaining movement at the seam, and it died on a thesis violation: uniform rules read as the game, but location-specific costs let players map the partition grid by watching where their moves cut short, which is infrastructure showing through. The toll also turned out to be free to drop. Correctness never needed it, and neither did the obvious exploit: boundary ping-pong prices itself, because the skew tick penalizes the crosser symmetrically in both directions, and if the boundary-storm benchmark ever demands hysteresis on top, it can be ownership hysteresis, the authority handoff lagging a beat while the ship is already visually across, which no player can see, rather than movement hysteresis, which every player could.

Blocked paths truncate. Movement is path-validated, because blockades, screens, and body-blocking are core verbs in a hex tactics game, and a ship that hits an occupied cell stops at the last clear cell rather than fizzling the whole move. Truncation makes a blockade something to push against rather than a binary wall, and it composes with crossing cleanly: a path blocked before the seam never proposes at all, and one blocked past it truncates inside the destination at admission, against live state. The client carries the should-I-even-attempt-this check, the same pattern as cards exhausting in DynamoDB before a tile sees the action, so by the time an intent reaches a tile the intention was sincere, and a rejection means blocked in action, a gameplay event, never input noise. The server validates everything regardless; the client's check is an efficiency courtesy, not a trust boundary.
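Truncation and its interaction with the seam can be sketched as two small functions. The path representation (an ordered list of cells, split into a source part and a destination part) and both function names are assumptions for illustration.

```python
def truncate_path(path: list, blocked: set) -> list:
    """Return the cells a ship actually traverses: everything before the first block."""
    clear = []
    for cell in path:
        if cell in blocked:
            break
        clear.append(cell)
    return clear


def plan_move(source_path: list, dest_path: list, source_blocked: set):
    """Split a crossing move into a local part and an optional proposal.

    A path blocked before the seam never proposes. Otherwise the remaining
    path is handed over untouched, and the destination truncates it at
    admission against its own live state.
    """
    local = truncate_path(source_path, source_blocked)
    if len(local) < len(source_path):
        return local, None
    return local, dest_path
```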

Corners graze, never settle. Where three tiles meet, a diagonal path can clip a cell in one neighbor while terminating in another. The invariant that keeps this cheap: a move ends only in its source tile or in an admitted destination cell, never in a clipped third tile. Clipped cells are pass-through only, validated against the watermarked neighbor state the source already holds, with no admission and no settlement, so nothing can be duplicated, lost, or contested there no matter how stale the watermark was. A corner move that fails anywhere fails cleanly back into the source. The fidelity gradient this produces lands exactly where the gameplay wants it: in-tile blocking is validated live and perfect, a wall anywhere along the in-destination path blocks at admission itself against current-tick state, also perfect, and only corner clips see watermark fuzz, which only matters for cells whose occupancy changed within the last 500ms. A standing blockade is static occupancy and is never stale; the entire tolerance reduces to a gap plugged or opened this tick still reading as it stood one watermark ago.

Contention resolves locally and deterministically. Two ships from different tiles converging on one edge cell are arbitrated in the destination's single resolution pass, and a granted reservation counts as occupancy from that moment, so next tick's local mover cannot re-fight a settled fight. Precedence is seeded per tick from the tile ID and tick number, deterministic for replay, fair in expectation, and not farmable the way lowest-entity-ID-wins would be. The loser truncates at the last clear cell on its side of the seam, the same rule as any blocked path, the movement equivalent of a fizzle. Chains of ships each entering the cell the next vacates serialize at the seam, one link per tick, because admission only grants cells empty or unreserved at arbitration. Rejections come back carrying the current edge-band occupancy, so a refused mover retries next tick aimed instead of blind; under convergence, rejection is navigation data. The head-on swap, two ships in adjacent cells across the boundary trading places in one tick, is the case both destinations would happily admit independently, so it must be a stated rule.
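One way to get precedence that is deterministic for replay and not farmable by entity ID is to rank contenders by a hash seeded from the tile ID and tick number. This is a sketch of that idea only: SHA-256, the key format, and the function names are assumptions, not the design's stated mechanism.

```python
import hashlib


def precedence_key(tile_id: str, tick: int, entity_id: str) -> bytes:
    """Per-tick rank for one contender; lower sorts first."""
    return hashlib.sha256(f"{tile_id}:{tick}:{entity_id}".encode()).digest()


def arbitrate(tile_id: str, tick: int, contenders: list) -> str:
    """Pick the winner among entities proposing the same cell this tick.

    The result depends only on (tile, tick, entity ids), never on arrival
    order, so a replay of the same tick picks the same winner.
    """
    return min(contenders, key=lambda entity_id: precedence_key(tile_id, tick, entity_id))
```

Because the ranking is reshuffled every tick, holding a low entity ID confers no standing advantage, and the loser simply truncates on its own side of the seam.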

Card effects ride the same streams. A direct-targeted effect aimed across the seam is not a transfer; it is a forwarded intent. The source tile validates what it can locally, the card already spent in DynamoDB, the range legal against the watermarked neighbor state it holds, then forwards the resolved intent over the neighbor RPC stream, where it lands in the target tile's bounded ingress queue and resolves next tick against live state, fizzling if conditions changed in flight. The bound was priced in from the start: the thesis post's worst-case ingress total is derived from card hand size, cooldowns, shared edge-cell count, and maximum card range, which is to say neighbor-forwarded effects were part of the queue's true maximum before this section existed.

Combat at the seam follows the same clock. Locks established before a ship departs forward with the transfer and resolve at settlement, so crossing is never a one-tick invulnerability button; only locks attempted during the in-flight window fizzle. Within a tick, effects resolve before admissions, so an arriving ship lands in the cell as it exists after this tick's violence, one global ordering so replays cannot diverge across tiles. Entity-local timers, cooldowns and damage-over-time, serialize with the ship and pause for at most the one skew tick, so crossing neither refunds nor eats a cooldown. And the chase gets a better sentence than it had: because no ship can outrun the one-tile movement bound, a target seen through a watermark is displaced by at most one watermark times max speed, so pursuit is not aiming at stale data, it is aiming within a known error radius, and a miss inside that radius is the targeting rule the thesis post stated from the start: the system locked on, the shot fired, conditions changed before it landed.

The late step, stated plainly, because hiding it would be the SpatialOS mistake. Tiles tick at the same rate but are not barrier-synchronized, so a crossing lands up to one tick later than a local move would: up to 500ms, once, at the boundary. At 2 Hz with cooldowns measured in seconds, that sits inside the rhythm players already experience as pacing. The honest version of the invisibility claim: invisible in play, not invisible under instrumentation. A player with a stopwatch and patience could map boundaries the way determined players map any game's netcode, and the design's bar is that nothing in ordinary play ever makes them want to. The in-flight tick leaves the ship committed in neither tile's rendered state, which sounds like a rendering crisis and is not one: the 7-tile viewport means a watching client holds both streams, and interpolation covers the gap.

What the seam never requires is cross-tile consensus about motion. Admission is local and current-tick, settlement is next-tick ordinary work, precedence is deterministic, staleness is bounded by watermarks and the movement cap, and every contested outcome maps onto an outcome the game already taught. The boundary is real; the job was making it read as physics rather than infrastructure.

Mass warp is a queue, not a protocol

An earlier design treated a thousand ships warping to the same coordinates as a coordination problem deserving its own protocol: landing leases, anchor points, cells reserved in expanding rings outward from a landing site, multi-tile spanning logic. All of it is gone, deleted by one rule change: you warp to a tile, not a cell.

[Figure: the arrival queue. You warp to a tile, not a cell. Pulls are serial and in order; a queued ship is still in warp, a legal state. Each pull is an ordinary admission: find a cell with n cells of clearance, reserve, commit, settle. A full tile redirects to the emptiest neighbor, clockwise from north, and the cascade is a chain of local decisions, each made by the one tile holding the request.]
Mass warp as a queue: the target tile pulls arrivals serially, overflow cascades to the emptiest neighbor, and a fleet materializes at the periphery and fights its way in. The rings are physics, not throttling.

A warp request joins the destination tile's arrival queue, and the ship stays in warp until pulled. The tile processes the queue serially, in order. For each arrival it looks for a free cell with at least n free cells of clearance around it, n being a tuning knob, and if one exists, the pull is an ordinary admission: reserve the cell, commit the reservation, settle next tick, the exact machinery every boundary crossing already uses. A pull settles on a deterministic activation tick rather than whenever the queue happened to reach it, which costs nothing, keeps replay bit-identical, and hands the client a known window to play the warp-in. If no cell in the tile clears the bar, the request is redirected to the arrival queue of the emptiest neighbor, ties broken clockwise from north, and the cascade repeats from there, spawning unspawned tiles through the standard birth path as it goes. The queue drains at capacity, not at a pace: every arrival with a clear placement materializes that tick, and a landing spreads across ticks only when physics forces it, because a tile holds 2,401 cells and a five-thousand-ship fleet overflows into the ring no matter how fast anything is processed. The rings are physics, not throttling.
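A simplified sketch of one tile draining its arrival queue. Everything concrete here is an assumption: axial hex coordinates, sorted-order candidate search, direction labels for the six neighbors, and counting redirected ships toward a neighbor's load. Commit, settlement, and activation ticks are omitted.

```python
from collections import deque

# Neighbor directions listed clockwise from north (assumed labeling).
CLOCKWISE = ["N", "NE", "SE", "S", "SW", "NW"]


def hex_distance(a: tuple, b: tuple) -> int:
    """Distance in cells between two axial hex coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def find_landing(cells: set, occupied: set, n: int):
    """First cell, in sorted order, with no occupied cell within n steps of it."""
    for cell in sorted(cells):
        if all(hex_distance(cell, other) > n for other in occupied):
            return cell
    return None


def drain(queue: deque, cells: set, occupied: set, n: int, neighbor_load: dict):
    """Pull arrivals serially. Returns (landed, redirected)."""
    landed, redirected = {}, {}
    while queue:
        ship = queue.popleft()
        cell = find_landing(cells, occupied, n)
        if cell is not None:
            occupied.add(cell)  # a reservation counts as occupancy at once
            landed[ship] = cell
        else:
            # min() keeps the first minimum, so ties break clockwise from north.
            target = min(CLOCKWISE, key=lambda d: neighbor_load[d])
            neighbor_load[target] += 1  # assumption: queued ships count as load
            redirected[ship] = target
    return landed, redirected
```

Because the loop handles one ship at a time, two arrivals can never contend for the same cell, which is the "a queue cannot race itself" property in miniature.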

Every safety property the old protocol engineered now falls out of structure. Reserve-then-depart is automatic, because the pull is the admission and a ship never leaves warp without a committed cell. Contention is impossible, because a serialized queue cannot race itself. Overflow needs no coordinator, because the cascade is a sequence of local decisions, each made by the one tile that currently holds the request. And the failure mode is the gentlest in the post: a ship that cannot land anywhere yet simply stays in warp, queued, a legal game state rather than an error.

Two bounds keep the intake honest, and they are where this design parts ways with the genre's compromise. Only players warp, and actions against a tile outside your own viewport are limited to one per player per second, so the worst enqueue storm the rules permit is known in advance, small on the wire, and a row in the saturation benchmark rather than an open question. And nothing here is time dilation's cousin: a queued ship is in warp, a legal game state, while every ship already in the tile plays at full speed throughout. EVE slows everyone's ongoing fight reactively when a node saturates; this design shapes an arrival with deterministic rules that were written down before anyone fired a shot.

The cascade also produces the best emergent behavior in the design, for free. A fleet warping to a famous battle fills the target tile, then the ring around it, then the ring around that: reinforcements materialize at the periphery and fight their way in, which is exactly what reinforcements should do, and nobody wrote a staging-area system. The geometry is doing the game design again.

What is proven and what is not

This is the post where honesty needs two paragraphs, because the gap is double here.

The first gap is implementation, and it is narrowing while this series is being written. The committed reservation is no longer the specified upgrade: it is merged, the transfer's durability now lives in the destination's stream, the coordinator-metadata path is demoted to a dedup index, and the prior transfer tests were migrated onto the event model rather than deleted. The seam movement rules are mid-implementation. The warp queue is still specification, and the old ring-allocator warp is what runs today. Crash-and-retry convergence for the full protocol remains flagged as open work in the build plan, which is the chaos run below by another name.

The second gap is measurement, and here are the rows from the benchmark plan that own it. Boundary storm: 500 crossings per second across 7 tiles must hold handoff p99 under 100ms, and since transfer batching is now the design default rather than the failure action, the remaining lever is ownership hysteresis, lagging the authority handoff while the ship is already visually across, chosen over movement hysteresis precisely because players cannot see it. Mass warp cold landing: a 1,000-ship fleet into an empty target must hold per-pull admission p99 under 500ms and total time-to-land within a stated budget (an open decision: the target is set by the pull rate, and at one pull per tick a 1,000-ship fleet takes 1,000 ticks, which at 2 Hz is 500 seconds, a little over eight minutes; a faster target means multiple pulls per tick), with pull rate and the clearance knob n as the levers, the anchor-sharding failure action having been deleted along with the anchor. The nine boundary failure cases need a full chaos run with kill -9 and partitions, because converging on a whiteboard is not converging in production. And the seam semantics add five obligations of their own: contested-cell arbitration must stay deterministic under cross-tile load, because a precedence rule that wobbles is a desync generator; a destination crash between admission and settlement must always replay to exactly one reconstructible ship, which is the committed-reservation invariant under fire; crash-reload reconciliation against the location records must end with zero duplicate entities, asserted structurally under primary-plus-replica kill rather than merely observed; replica-acknowledged departure commits must hold p99 within the handoff budget, with the seam-hardening degraded mode as the failure action; and the funnel ceiling must hold against the admission budget once the movement cap fixes its number.
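The time-to-land figure is plain arithmetic on the pull rate, sketched here so the lever is easy to vary. The function name is hypothetical, and the model ignores the extra settlement tick and any overflow into neighboring tiles.

```python
import math


def time_to_land_seconds(ships: int, pulls_per_tick: int, tick_hz: float = 2.0) -> float:
    """Seconds for a queue of ships to be pulled at a fixed pull rate."""
    ticks = math.ceil(ships / pulls_per_tick)
    return ticks / tick_hz
```

At one pull per tick, `time_to_land_seconds(1000, 1)` is 500.0 seconds; at a hypothetical four pulls per tick it drops to 125.0 seconds.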

Each row has a failure action that revises the design rather than excusing it. That is the contract this series has kept since the thesis post: the constraints were designed first, and the numbers decide what survives.

What comes next

Everything here delivered committed truth between tiles. Getting it to ten thousand watching clients, including the moment a network blip sends five thousand of them back at once, is the delivery post's job. And the series closes with the ledger: the full benchmark table, the two SLIs that define correctness, and exactly what would falsify this design.