On 2026-05-27, PR #1263 merged into the gonka repo and fixed a class of false alarms in inference validation. The change, by @qdanik, forces a single vLLM flag on every chat-completions request so validators stop rejecting honest inferences on the Kimi-K2.6 model.
What changed
Gonka validators re-run a sample of inferences to confirm each executor reported the right amount of work. One check compares the number of output tokens an executor claims against the number the validator counts in the response stream. If the claim is higher, the inference is flagged as inflated and thrown out.
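The comparison described above can be sketched in a few lines of Go. This is a minimal illustration, not the repo's code: `countStreamedTokens`, `isInflated`, and the shape of the stream events are hypothetical names chosen for the example.

```go
package main

import "fmt"

// countStreamedTokens sums the tokens the validator actually observed in the
// response stream. Each inner slice stands for the token IDs of one streamed
// event (hypothetical shape, for illustration only).
func countStreamedTokens(events [][]int) uint64 {
	var total uint64
	for _, ev := range events {
		total += uint64(len(ev))
	}
	return total
}

// isInflated reports whether the executor claimed more output tokens than
// the validator counted. A strict check: any excess flags the inference.
func isInflated(claimed, counted uint64) bool {
	return claimed > counted
}

func main() {
	// Three streamed events carrying 2, 1, and 3 tokens.
	events := [][]int{{11, 12}, {13}, {14, 15, 16}}
	counted := countStreamedTokens(events)

	fmt.Println(counted)                // 6
	fmt.Println(isInflated(6, counted)) // false: claim matches the stream
	fmt.Println(isInflated(8, counted)) // true: claim exceeds the stream
}
```

The last call shows why tokens missing from the stream are a problem: if two tokens are counted in the claim but never streamed, the check fires even though the claim is honest.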
- The root cause sat in vLLM's reasoning parser. When Kimi-K2.6 streams a reply, special control tokens such as <think> and </think> were dropped from the streamed events while still being counted in the token total (usage.completion_tokens). The validator, watching the stream, saw fewer tokens than the executor's count, so the strict check fired on honest work.
- The fix forces return_token_ids=true on every upstream request. The gateway registers it with ForceLiteralParameterHandler so a client override cannot disable it. Paired with an upstream vLLM change (vllm-project/vllm#29074), every token vLLM counts now shows up in the stream, including the previously suppressed control tokens.
- The validator's token budget was raised. BuildValidationBody now sets max_tokens to twice the enforced-token count when that is larger than the original request, because Kimi-K2.6 validation runs were stopping early at roughly half the budget with finish_reason=length.
- Internal vLLM fields (prompt_token_ids, token_ids, prompt_logprobs) are now stripped from client-facing responses, so the forced flag does not leak into OpenAI-compatible payloads. The PR also folds a duplicated helper into one devshard.JSONNumericUint64.
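The three request and response changes listed above might look roughly like the following Go sketch. It is not taken from the PR. The function names `forceReturnTokenIDs`, `validationMaxTokens`, and `stripInternalFields` are hypothetical, the recursive strip is an assumption about where the internal fields can appear in a payload, and the budget rule assumes "larger than the original request" refers to the doubled enforced-token count.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forceReturnTokenIDs overwrites any client-supplied value so the flag is
// always on upstream (a stand-in for the forced-parameter behavior).
func forceReturnTokenIDs(body map[string]any) {
	body["return_token_ids"] = true
}

// validationMaxTokens returns twice the enforced-token count when that
// exceeds the original max_tokens, otherwise the original value.
func validationMaxTokens(original, enforced uint64) uint64 {
	if doubled := 2 * enforced; doubled > original {
		return doubled
	}
	return original
}

// internalFields lists the vLLM-internal keys named in the post.
var internalFields = []string{"prompt_token_ids", "token_ids", "prompt_logprobs"}

// stripInternalFields removes the internal keys from a decoded JSON value,
// walking nested objects and arrays (the nesting is an assumption).
func stripInternalFields(v any) {
	switch node := v.(type) {
	case map[string]any:
		for _, key := range internalFields {
			delete(node, key)
		}
		for _, child := range node {
			stripInternalFields(child)
		}
	case []any:
		for _, child := range node {
			stripInternalFields(child)
		}
	}
}

func main() {
	// A client tries to disable the flag; the rewrite forces it back on.
	req := map[string]any{"model": "Kimi-K2.6", "return_token_ids": false}
	forceReturnTokenIDs(req)
	fmt.Println(req["return_token_ids"]) // true

	fmt.Println(validationMaxTokens(512, 400))  // 800
	fmt.Println(validationMaxTokens(2048, 400)) // 2048

	raw := []byte(`{"prompt_token_ids":[1,2],"choices":[{"token_ids":[3],"message":{"content":"hi"}}]}`)
	var resp any
	if err := json.Unmarshal(raw, &resp); err != nil {
		panic(err)
	}
	stripInternalFields(resp)
	out, err := json.Marshal(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"choices":[{"message":{"content":"hi"}}]}
}
```

Overwriting the key unconditionally, rather than setting it only when absent, is what keeps a client override from disabling the flag in this sketch.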
Why it matters
A false-positive invalidation punishes a host that did nothing wrong. Before this PR, a production sweep of 25 varied prompts (multi-turn, reasoning-heavy, CJK, emoji, and tool-call-like) flagged 9 honest inferences as inflated. After the fix, that count dropped to 0, and the validator's token count matched the executor's on all 25 prompts.
For hosts running Kimi-K2.6, the result is fewer unfair invalidations and steadier validation scores. For the network, it removes a source of noise in the very data validators rely on to catch real cheating.
The change touches 15 files (+243 / -45) and ships unit coverage: 21 sub-cases for the numeric helper, 7 for the request rewrite, plus the flag-injection and field-strip tests.