A single-line bug in how Gonka replays a streamed AI response was rejecting valid inference results on the live network. PR #1270, merged on 2026-06-01 by @qdanik, fixes the devshard inference validator so it counts every token an executor emits, not just the first one in each network chunk.
Some background. When a Gonka node answers an inference request, the result streams back token by token over Server-Sent Events (SSE). To check that the node actually did the work it claims, a validator replays the same prompt and compares the two token sequences. That comparison is the "enforced" path: it reconstructs the original answer from the stream and verifies that its length and similarity match the replay.
What changed
- The core fix is in completionapi/completionresponse.go. StreamedCompletionResponse.GetEnforcedTokens previously read only Logprobs.Content[0] from each streamed chunk. But a single chat.completion.chunk can carry several tokens in logprobs.content at once. The function now iterates every entry per chunk instead of keeping just the first. See PR #1270.
- A new regression test, completionapi/enforced_multitoken_test.go, locks the behaviour in: a chunk carrying multiple tokens must contribute all of them, and the enforced-token count must equal the position count from ExtractLogits.
- internal/devshard/shared_runtime.go dropped an unused rewriteRequest helper that doubled max_tokens. BuildValidationBody no longer rewrites max_tokens at all. An earlier experiment in this branch that clamped the context window was reverted, keeping the change scoped to the multi-content fix.
The diff is small: 86 lines added, 107 removed, across 4 files.
Why it matters
When an executor packed two tokens into one chunk, the validator silently dropped the second. One missing token mid-stream shifts every position after it, so a single drop made the replayed answer look both too short and too different from the original. That surfaced on the production fleet as three separate rejection reasons at once: similarity_below, inflated_tokens, and different_length. All three traced back to the same dropped tokens.
The team reproduced it end to end: a chunk carrying the two tokens " parameter" and " name" produced an enforced sequence one token short and an inflated_tokens gap of 1 against the executor's own count. With the fix, honest nodes stop getting penalised for a quirk in how their output was packed into chunks, which means fewer wrongly failed validations and steadier rewards for operators running real work.