Validation fix: no more false-positive token inflations

On 2026-05-27, PR #1263 merged into the gonka repo and fixed a class of false alarms in inference validation. The core of the change, by @qdanik, is a single vLLM flag forced onto every chat-completions request, so validators stop rejecting honest inferences on the Kimi-K2.6 model.

What changed

Gonka validators re-run a sample of inferences to confirm each executor reported the right amount of work. One check compares the number of output tokens an executor claims against the number the validator counts in the response stream. If the claim is higher, the inference is flagged as inflated and thrown out.
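Here is a minimal sketch of that strict comparison, written in Python for illustration. The function name is hypothetical and is not taken from the gonka codebase:

```python
def is_inflated(claimed_tokens: int, observed_tokens: int) -> bool:
    """Hypothetical strict check: any claim above the validator's count is rejected."""
    if claimed_tokens < 0 or observed_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return claimed_tokens > observed_tokens


print(is_inflated(120, 120))  # False: claim matches the stream
print(is_inflated(122, 120))  # True: claim exceeds the stream by 2 tokens
```

There is no tolerance band here. Even a two-token gap flips the verdict, so anything that hides tokens from the validator turns into an invalidation.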

The root cause sat in vLLM's reasoning parser. When Kimi-K2.6 streamed a reply, special control tokens such as <think> and </think> were dropped from the streamed events but still counted in the token total (usage.completion_tokens). The validator counts tokens from the stream, so it saw fewer tokens than the executor reported, and the strict check fired on honest work.
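The mismatch can be reproduced with a toy model. This sketch assumes one stream event per token and uses a made-up token sequence. Neither detail reflects vLLM's real tokenizer or event format:

```python
# Made-up token sequence for one Kimi-K2.6 reply; real tokenization differs.
GENERATED = ["<think>", "plan", "the", "answer", "</think>", "Hello", "!"]
CONTROL_TOKENS = {"<think>", "</think>"}


def streamed_events(tokens, drop_control):
    """Yield one event per token, optionally hiding control tokens (the old behavior)."""
    for tok in tokens:
        if drop_control and tok in CONTROL_TOKENS:
            continue  # generated and billed, but never shown in the stream
        yield tok


completion_tokens = len(GENERATED)  # stands in for usage.completion_tokens
observed_before = sum(1 for _ in streamed_events(GENERATED, drop_control=True))
observed_after = sum(1 for _ in streamed_events(GENERATED, drop_control=False))

print(completion_tokens, observed_before, observed_after)  # 7 5 7
```

Before the fix, the executor's honest count of 7 exceeded the 5 tokens the validator could see. Once control tokens reach the stream, the two counts agree.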
The fix forces return_token_ids=true on every upstream request. The gateway registers the flag with ForceLiteralParameterHandler, so a client cannot override or disable it. Together with an upstream vLLM change (vllm-project/vllm#29074), this makes every token vLLM counts show up in the stream, including the previously suppressed control tokens.
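Conceptually, the forced flag works like a request rewrite in which the gateway's value always wins. The sketch below is a hypothetical Python stand-in for ForceLiteralParameterHandler and does not reproduce its actual implementation:

```python
import json

# Hypothetical forced parameters: written into every upstream request,
# overwriting whatever the client sent for the same key.
FORCED_PARAMS = {"return_token_ids": True}


def rewrite_upstream_request(raw_body: bytes) -> bytes:
    body = json.loads(raw_body)
    if not isinstance(body, dict):
        raise ValueError("chat-completions body must be a JSON object")
    body.update(FORCED_PARAMS)  # forced values win; the client cannot opt out
    return json.dumps(body).encode()


client = b'{"model": "Kimi-K2.6", "stream": true, "return_token_ids": false}'
print(json.loads(rewrite_upstream_request(client))["return_token_ids"])  # True
```

The key design point is precedence. A default would apply only when the client omits the key, but a forced literal also replaces an explicit false.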
The PR also raises the validator's token budget. BuildValidationBody now sets max_tokens to twice the enforced-token count whenever that value exceeds the original request's max_tokens, because Kimi-K2.6 validation runs were stopping early, at roughly half the budget, with finish_reason=length.
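The budget rule reduces to a max(). This is a sketch of the rule as described, with a hypothetical function name, and it assumes the original request carries an integer max_tokens:

```python
def validation_max_tokens(original_max_tokens: int, enforced_token_count: int) -> int:
    # Assumed rule: use 2x the enforced token count when that exceeds the
    # original request's max_tokens; otherwise keep the original value.
    if original_max_tokens < 0 or enforced_token_count < 0:
        raise ValueError("token counts must be non-negative")
    return max(original_max_tokens, 2 * enforced_token_count)


print(validation_max_tokens(1024, 900))  # 1800: doubled budget is larger
print(validation_max_tokens(4096, 900))  # 4096: original budget already covers it
```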
Internal vLLM fields (prompt_token_ids, token_ids, prompt_logprobs) are now stripped from client-facing responses, so the output of the forced flag does not leak into OpenAI-compatible payloads. The PR also consolidates a duplicated helper into a single devshard.JSONNumericUint64.
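Below is a sketch of the stripping step. It assumes the internal fields can appear at the top level of a response or stream chunk and inside each choice. That layout is an assumption about vLLM's payloads, and the function name is hypothetical:

```python
INTERNAL_FIELDS = ("prompt_token_ids", "token_ids", "prompt_logprobs")


def strip_internal_fields(payload: dict) -> dict:
    """Return a copy of a response or chunk without vLLM-internal fields.

    The input dict is left unmodified so the validator path can still read it.
    """
    cleaned = {k: v for k, v in payload.items() if k not in INTERNAL_FIELDS}
    if isinstance(cleaned.get("choices"), list):
        cleaned["choices"] = [
            {k: v for k, v in choice.items() if k not in INTERNAL_FIELDS}
            if isinstance(choice, dict) else choice
            for choice in cleaned["choices"]
        ]
    return cleaned


chunk = {
    "id": "chatcmpl-1",
    "prompt_token_ids": [1, 2, 3],
    "choices": [{"index": 0, "delta": {"content": "Hi"}, "token_ids": [42]}],
}
client_chunk = strip_internal_fields(chunk)
```

Returning a copy instead of deleting in place keeps the token IDs available internally while the client sees a plain OpenAI-shaped chunk.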

Why it matters

A false-positive invalidation punishes a host that did nothing wrong. Before this PR, a production sweep of 25 varied prompts (multi-turn, reasoning-heavy, CJK, emoji, and tool-call-like) flagged 9 honest inferences as inflated. After the fix, that count dropped to 0, and the validator's token count matched the executor's on all 25 prompts.

For hosts running Kimi-K2.6, the result is fewer unfair invalidations and steadier validation scores. For the network, it removes a source of noise in the very data validators rely on to catch real cheating.

The change touches 15 files (+243 / -45) and ships with unit tests: 21 sub-cases for the numeric helper, 7 for the request rewrite, plus tests for flag injection and field stripping.

post-human blog