A single-line bug in how Gonka replays a streamed AI response was rejecting valid inference results on the live network. PR #1270, merged on 2026-06-01 by @qdanik, fixes the devshard inference validator so it counts every token an executor emits, not just the first one in each network chunk.

Some background. When a Gonka node answers an inference request, the result streams back token by token over Server-Sent Events (SSE). To check that the node actually did the work it claims, a validator replays the same prompt and compares the two token sequences. That comparison is the "enforced" path: it reconstructs the original answer and verifies that the length and similarity match.

What changed

  • The core fix is in completionapi/completionresponse.go. StreamedCompletionResponse.GetEnforcedTokens previously read only Logprobs.Content[0] from each streamed chunk. But a single chat.completion.chunk can carry several tokens in logprobs.content at once. The function now iterates every entry per chunk instead of keeping just the first. See PR #1270.
  • A new regression test, completionapi/enforced_multitoken_test.go, locks the behaviour in: a chunk carrying multiple tokens must contribute all of them, and the enforced-token count must equal the position count from ExtractLogits.
  • internal/devshard/shared_runtime.go dropped an unused rewriteRequest helper that doubled max_tokens. BuildValidationBody no longer rewrites max_tokens at all. An earlier experiment in this branch that clamped the context window was reverted, keeping the change scoped to the multi-content fix.
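
To make the core fix concrete, here is a minimal Go sketch of the two behaviours side by side. The types and function names (TokenLogprob, Chunk, firstOnlyTokens, allTokens) are hypothetical stand-ins chosen for illustration; they are not the actual definitions in completionapi/completionresponse.go, and the real GetEnforcedTokens may differ in detail.

```go
package main

import "fmt"

// TokenLogprob is a hypothetical stand-in for one entry of logprobs.content.
type TokenLogprob struct {
	Token string
}

// Chunk is a hypothetical stand-in for one streamed chat.completion.chunk.
type Chunk struct {
	Content []TokenLogprob
}

// firstOnlyTokens mirrors the old behaviour: it keeps Content[0] from each
// chunk and ignores any further entries in the same chunk.
func firstOnlyTokens(chunks []Chunk) []string {
	var tokens []string
	for _, c := range chunks {
		if len(c.Content) == 0 {
			continue
		}
		tokens = append(tokens, c.Content[0].Token)
	}
	return tokens
}

// allTokens mirrors the fixed behaviour: every entry in every chunk
// contributes one token, in stream order.
func allTokens(chunks []Chunk) []string {
	var tokens []string
	for _, c := range chunks {
		for _, entry := range c.Content {
			tokens = append(tokens, entry.Token)
		}
	}
	return tokens
}

func main() {
	chunks := []Chunk{
		{Content: []TokenLogprob{{Token: "Hello"}}},
		{Content: []TokenLogprob{{Token: ","}, {Token: " world"}}}, // two tokens, one chunk
		{Content: []TokenLogprob{{Token: "!"}}},
	}
	// prints: 3 4
	fmt.Println(len(firstOnlyTokens(chunks)), len(allTokens(chunks)))
}
```

In this sketch the two functions agree whenever each chunk carries exactly one token, which would explain why a bug of this shape can go unnoticed until an executor batches tokens.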

The diff is small: 86 lines added, 107 removed, across 4 files.

Why it matters

When an executor packed two tokens into one chunk, the validator silently dropped the second. One missing token mid-stream shifts every position after it, so a single drop made the replayed answer look both too short and too different from the original. That surfaced on the production fleet as three separate rejection reasons at once: similarity_below, inflated_tokens, and different_length. All three traced back to the same dropped tokens.
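
The shifting effect can be illustrated with a small Go sketch. The mismatches helper below is a deliberately simplified, hypothetical position-by-position comparison, not Gonka's actual similarity metric, and the token sequences are made up for the example.

```go
package main

import "fmt"

// mismatches counts the positions at which two token sequences differ,
// comparing index by index up to the shorter length. It is a simplified
// stand-in for a similarity check, used here only for illustration.
func mismatches(a, b []string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	count := 0
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			count++
		}
	}
	return count
}

func main() {
	emitted := []string{"Gonka", " validates", " every", " token", " now"}
	// The same sequence with " every" dropped mid-stream.
	reconstructed := []string{"Gonka", " validates", " token", " now"}

	// One dropped token: the sequence is 1 shorter, and every position
	// after the drop no longer lines up.
	// prints: 1 2
	fmt.Println(len(emitted)-len(reconstructed), mismatches(emitted, reconstructed))
}
```

A single drop therefore produces both a length difference and a run of positional mismatches, which is consistent with several rejection reasons firing together.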

The team reproduced it end to end: a chunk carrying the two tokens " parameter" and " name" produced an enforced sequence one token short and an inflated_tokens gap of 1 against the executor's own count. With the fix, honest nodes stop getting penalised for a quirk in how their output was packed into chunks, which means fewer wrongly failed validations and steadier rewards for operators running real work.
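
At the wire level, a reproduction along these lines might look like the Go sketch below. It assumes an OpenAI-compatible chat.completion.chunk shape, with tokens under choices[].logprobs.content[].token, delivered as SSE "data:" lines. The sample payload, its logprob values, and the countTokens helper are illustrative assumptions, not taken from the PR or its test.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// chunk models only the fields this sketch needs from a streamed
// chat.completion.chunk (assumed OpenAI-compatible shape).
type chunk struct {
	Choices []struct {
		Logprobs *struct {
			Content []struct {
				Token string `json:"token"`
			} `json:"content"`
		} `json:"logprobs"`
	} `json:"choices"`
}

// sampleStream is a made-up SSE stream with one chunk carrying two tokens.
const sampleStream = `data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" parameter name"},"logprobs":{"content":[{"token":" parameter","logprob":-0.12},{"token":" name","logprob":-0.08}]}}]}

data: [DONE]
`

// countTokens walks the SSE "data:" lines and returns two counts: the tokens
// seen when only the first logprobs entry per chunk is read, and the tokens
// seen when every entry is read.
func countTokens(stream string) (int, int, error) {
	firstOnly, all := 0, 0
	for _, line := range strings.Split(stream, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return 0, 0, err
		}
		for _, choice := range c.Choices {
			if choice.Logprobs == nil || len(choice.Logprobs.Content) == 0 {
				continue
			}
			firstOnly++
			all += len(choice.Logprobs.Content)
		}
	}
	return firstOnly, all, nil
}

func main() {
	firstOnly, all, err := countTokens(sampleStream)
	if err != nil {
		panic(err)
	}
	// prints: 1 2 1
	fmt.Println(firstOnly, all, all-firstOnly)
}
```

The final number is the one-token gap: reading only the first entry sees " parameter" and misses " name".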