Testermint: Stage-Aware Upgrade Rehearsal Scheduling (PR #1333)

Gonka's integration test harness picked up a scheduling fix on 2026-06-11. PR #1333, authored by @patimen and merged into the upgrade-v0.2.14 branch, teaches the upgrade rehearsal workflow to place its test upgrade inside a safe inference window instead of at an arbitrary block height. The change adds 227 lines across 4 files.

What changed

Testermint is the Kotlin test harness that exercises a full Gonka cluster, and one of its jobs is rehearsing chain software upgrades end-to-end before they reach MainNet. Until this PR, the rehearsal picked its upgrade height naively: the current block plus a configurable lead (UPGRADE_REHEARSAL_LEAD_BLOCKS, default 80). Depending on where the epoch cycle happened to be, that height could land inside a Proof of Compute or validation stage, which are exactly the windows where an upgrade should not fire.
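To make the failure mode concrete, here is a minimal Kotlin sketch of the old arithmetic. Only the default lead of 80 comes from the PR; the epoch layout (a 100-block epoch with 10 PoC blocks and 10 validation blocks) and the names RehearsalStage, stageAtHeight, and naiveUpgradeHeight are hypothetical and exist only for illustration. They are not Testermint's actual code.

```kotlin
// Hypothetical stage model; real Gonka epochs may have more stages and different lengths.
enum class RehearsalStage { POC, VALIDATION, INFERENCE }

// Assumed layout, for illustration only.
const val SKETCH_EPOCH_LENGTH = 100L
const val SKETCH_POC_BLOCKS = 10L
const val SKETCH_VALIDATION_BLOCKS = 10L

// Classify a block height by its offset within the epoch that contains it.
fun stageAtHeight(height: Long, epochStart: Long): RehearsalStage {
    val offset = Math.floorMod(height - epochStart, SKETCH_EPOCH_LENGTH)
    return when {
        offset < SKETCH_POC_BLOCKS -> RehearsalStage.POC
        offset < SKETCH_POC_BLOCKS + SKETCH_VALIDATION_BLOCKS -> RehearsalStage.VALIDATION
        else -> RehearsalStage.INFERENCE
    }
}

// The pre-fix behaviour as described in the text: current block plus a fixed lead.
fun naiveUpgradeHeight(currentBlock: Long, leadBlocks: Long = 80L): Long =
    currentBlock + leadBlocks

fun main() {
    val epochStart = 1000L
    for (current in listOf(1025L, 1030L, 1040L)) {
        val target = naiveUpgradeHeight(current)
        println("current=$current -> upgrade at $target (${stageAtHeight(target, epochStart)})")
    }
}
```

Under these assumed lengths, a rehearsal started at block 1025 targets block 1105, which falls in the next epoch's PoC stage; one started at 1030 lands in validation; only the one started at 1040 reaches inference.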

The fix replaces that arithmetic with stage-aware scheduling:

A new helper, findStageSafeInferenceBlock in testermint/src/main/kotlin/Epochs.kt, takes the earliest acceptable block and returns a StageSafeInferenceBlock that sits inside an inference window with at least 3 blocks of slack (INFERENCE_STAGE_SLACK_BLOCKS) before the next PoC start.
When the requested lead overshoots the current window, the helper projects one epoch forward and anchors the upgrade on the next inference window instead of failing.
The existing safeForInference flag in data/epoch.kt now reuses the shared slack constant instead of a hard-coded 3, so the two checks cannot drift apart.
A new 155-line test file, EpochSchedulingHelpersTest.kt, pins down four scenarios: a lead that fits the current window, an unsafe window tail that gets skipped, scheduling from the validation phase, and a long lead that lands past the next epoch.
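The scheduling rule described above can be sketched in a few lines of Kotlin. This is a simplified illustration, not the helper from Epochs.kt: the SketchEpochLayout and SketchSafeBlock types, the function name sketchStageSafeBlock, and the example block numbers are all assumptions. Only the 3-block slack before the next PoC start is taken from the PR description.

```kotlin
// Slack before the next PoC start; the value 3 is the one quoted in the PR description.
const val SKETCH_INFERENCE_SLACK_BLOCKS = 3L

// Hypothetical, simplified epoch layout (not the real Testermint types).
data class SketchEpochLayout(
    val epochStart: Long,      // block where the current epoch's PoC stage begins
    val epochLength: Long,     // blocks per epoch
    val inferenceOffset: Long, // blocks from epoch start until inference begins
)

// Hypothetical result type: the chosen height and how many epochs ahead it sits.
data class SketchSafeBlock(val height: Long, val epochsAhead: Long)

// Return the first block at or after `earliest` that lies inside an inference
// window and keeps at least the slack before the next PoC start.
fun sketchStageSafeBlock(layout: SketchEpochLayout, earliest: Long): SketchSafeBlock {
    require(layout.epochLength > layout.inferenceOffset + SKETCH_INFERENCE_SLACK_BLOCKS) {
        "inference window is too short to hold the slack"
    }
    require(earliest >= layout.epochStart) { "earliest block precedes the current epoch" }

    // Locate the epoch that contains the earliest acceptable block.
    val epochsAhead = (earliest - layout.epochStart) / layout.epochLength
    val start = layout.epochStart + epochsAhead * layout.epochLength
    val inferenceStart = start + layout.inferenceOffset
    val lastSafe = start + layout.epochLength - SKETCH_INFERENCE_SLACK_BLOCKS

    return when {
        // Still in PoC or validation: wait for this epoch's inference window.
        earliest < inferenceStart -> SketchSafeBlock(inferenceStart, epochsAhead)
        // Inside the window with enough slack: use the requested block as-is.
        earliest <= lastSafe -> SketchSafeBlock(earliest, epochsAhead)
        // Unsafe tail of the window: move to the next epoch's inference window.
        else -> SketchSafeBlock(inferenceStart + layout.epochLength, epochsAhead + 1)
    }
}

fun main() {
    val layout = SketchEpochLayout(epochStart = 1000L, epochLength = 100L, inferenceOffset = 20L)
    // Fits window, unsafe tail, validation phase, long lead.
    for (earliest in listOf(1050L, 1098L, 1010L, 1230L)) {
        println("earliest=$earliest -> ${sketchStageSafeBlock(layout, earliest)}")
    }
}
```

The four inputs in main loosely mirror the four scenarios the new test file is described as covering. Returning a later block instead of throwing matches the described behaviour of anchoring on the next inference window when the requested lead overshoots the current one.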

Why it matters

Every Gonka epoch cycles through stages: a Proof of Compute sprint where hosts prove their GPU capacity, a validation phase where those proofs are checked, and a long inference window where the network serves regular AI workloads. Real on-chain upgrades activate during inference, when nodes can restart without disrupting proof deadlines. A rehearsal that fired mid-PoC was testing a scenario the network would never deliberately execute, and it could fail for reasons that had nothing to do with the upgrade being rehearsed.

With v0.2.14 preparation underway, the rehearsal workflow is part of the release gate. Deterministic, stage-safe scheduling removes a source of flaky failures from that gate; per the PR description, the upgrade rehearsal workflow runs green on this branch.
