Section 25

Qwen3-Coder-Next

Pre-training for code at repository scale

Paper: Qwen3-Coder-Next Technical Report — Qwen Team, 2026

Qwen3-Coder-Next (Qwen Team, early 2026) is a code-specialized model. It earns its place here because pre-training for code exposes a problem that general-purpose models can mostly ignore: for the most valuable coding skills, good training data barely exists in raw form and has to be constructed. Its pre-training story is about manufacturing that data, plus the now-familiar efficiency recipe.

A small active footprint

Qwen3-Coder-Next is an 80-billion-parameter Mixture-of-Experts (MoE) model that activates only 3 billion parameters per token. That is roughly 3.75% of its weights, an even more aggressive sparsity ratio than DeepSeek-V3, which activates 37 billion of its 671 billion parameters (about 5.5%). The sparsity is paired with a hybrid attention design. The motivation is explicit: coding agents run in tight local-development loops where latency matters, so you want a model with a large knowledge base but a small per-token compute cost. MoE is no longer exotic; it’s the obvious tool when you want the capacity of a large model without paying its full inference cost.
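The toy sketch below shows why the active count can be so much smaller than the total. It assumes standard top-k softmax routing. The 8 experts, k = 2, and the scalar "experts" are hypothetical and are not Qwen3-Coder-Next’s actual configuration. For a given token, only the selected experts run, and that is the entire source of the savings. The last two lines compute the active fraction for the two models mentioned above.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and normalize their gate weights.

    Softmax over only the selected logits equals a softmax over all experts
    followed by renormalization over the top k.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k routed experts' outputs; unselected experts never run."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale, calls):
    """Hypothetical toy expert: scales its input and records that it was called."""
    def expert(x):
        calls.append(scale)
        return [scale * v for v in x]
    return expert


calls = []
experts = [make_expert(float(s), calls) for s in range(8)]
logits = [0.1, 2.0, -1.0, 1.5, 0.0, -0.5, 0.3, 0.2]  # hypothetical router scores
y = moe_forward([1.0, 2.0], experts, logits, k=2)
print(sorted(calls))  # prints [1.0, 3.0]: only experts 1 and 3 ran
print(f"active fraction, Qwen3-Coder-Next: {3 / 80:.4f}")  # prints 0.0375
print(f"active fraction, DeepSeek-V3: {37 / 671:.4f}")     # prints 0.0551
```

In a real model, each expert is a feed-forward network and the router is a learned linear layer. Per-token compute therefore scales with the parameters of the k chosen experts plus shared components such as attention, not with the total number of experts.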

Why code needs special data

Ordinary web text teaches a model to write plausible code, but the skills that matter for an agent — fixing a failing test, navigating a real repository, satisfying a build — require something the open web doesn’t readily provide: verifiable, executable, interaction-rich examples. You can’t learn “did this patch make the tests pass?” from static text.

Manufacturing executable training data

Qwen3-Coder-Next’s central pre- and mid-training idea is large-scale synthesis of verifiable coding tasks paired with fully executable environments. Two pipelines stand out. One builds reproducible environments and tasks from real GitHub pull requests; the other synthesizes fresh tasks with their own runnable test harnesses. The model then trains on signals derived from actually executing code, not just reading it. This is the synthetic-data lesson made concrete and domain-specific. The synthetic data works because it’s grounded: it is anchored to executable ground truth (do the tests pass?) rather than to free-floating generation.
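Here is a minimal sketch of what "grounded" means in practice. It assumes a Python task whose test harness is a plain script that exits with status 0 only when every check passes. `VerifiableTask`, `verify`, and `is_fail_to_pass` are hypothetical names, and none of this is the Qwen pipeline. The fail-to-pass filter follows a common convention for PR-derived tasks (tests must fail before the change and pass after it), and applying it here is an assumption. A real pipeline would presumably also sandbox execution and install the repository’s dependencies, which this sketch skips.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class VerifiableTask:
    """Hypothetical task record: a statement plus a runnable test harness."""
    statement: str
    test_code: str                 # a script that exits 0 only if all checks pass
    module_name: str = "solution"


def verify(task: VerifiableTask, candidate_code: str, timeout: float = 10.0) -> bool:
    """Execute the task's tests against candidate code in a fresh directory.

    The pass/fail label comes from the process exit status, i.e. from
    running the code, not from reading it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / f"{task.module_name}.py").write_text(candidate_code)
        (root / "test_task.py").write_text(task.test_code)
        try:
            result = subprocess.run(
                [sys.executable, "test_task.py"],
                cwd=root,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a hang counts as a failure
        return result.returncode == 0


def is_fail_to_pass(task: VerifiableTask, before: str, after: str) -> bool:
    """Keep a PR-derived task only if its tests fail on the pre-PR code and
    pass on the post-PR code, so the tests actually exercise the change."""
    return not verify(task, before) and verify(task, after)


task = VerifiableTask(
    statement="Fix add(a, b) so it returns the sum of two integers.",
    test_code="from solution import add\nassert add(2, 3) == 5\nassert add(-1, 1) == 0\n",
)
buggy = "def add(a, b):\n    return a - b\n"  # state before a hypothetical PR
fixed = "def add(a, b):\n    return a + b\n"  # state after a hypothetical PR
print(verify(task, buggy), verify(task, fixed))  # prints False True
print(is_fail_to_pass(task, buggy, fixed))       # prints True
```

The design choice worth noticing is that the label is never produced by a model judging the code. It comes from running the tests, which is what keeps the synthetic data tied to ground truth.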

Fill-in-the-middle: editing, not just continuing

One more code-specific pre-training detail worth knowing, standard across the Qwen-Coder series and many other code models, is the fill-in-the-middle (FIM) objective. Ordinary causal training teaches only left-to-right continuation, but real coding is mostly editing in place, such as inserting a function between existing blocks of code. FIM reorders training documents so the model sees a prefix and a suffix and must generate the middle, which teaches it to complete code that has context on both sides. It’s a small change to how the next-token objective is presented, with a big effect on how useful the model is in an editor.
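Below is a minimal sketch of the FIM data transformation. It assumes random character-level split points and prefix-suffix-middle (PSM) ordering, which is one common variant. The sentinel strings are spelled the way Qwen2.5-Coder’s published FIM format spells them. Treat both the exact format Qwen3-Coder-Next uses and the 50% default FIM rate as assumptions.

```python
import random

# Placeholder sentinel strings (assumed; modeled on Qwen2.5-Coder's FIM format).
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def fim_transform(doc: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rewrite doc in prefix-suffix-middle order.

    Training then applies the ordinary next-token loss to the result, so
    predicting the text after FIM_MIDDLE means filling the gap between a
    known prefix and a known suffix.
    """
    if rng.random() >= fim_rate or len(doc) < 2:
        return doc  # keep as a plain left-to-right example
    # Two distinct cut points give a non-empty middle span.
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


rng = random.Random(0)
doc = "def area(r):\n    return 3.14159 * r * r\n"
sample = fim_transform(doc, rng, fim_rate=1.0)  # force the FIM branch
print(sample)
```

At inference time, an editor builds the same string with the middle left empty (`FIM_PREFIX + prefix + FIM_SUFFIX + suffix + FIM_MIDDLE`) and lets the model generate the missing span. Because the loss is still plain next-token prediction, no architectural change is needed.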

(As with Kimi, the reinforcement-learning-from-execution loop that this data feeds is post-training; we cover only the pre-training data and architecture here.) Next, the most architecturally ambitious 2026 report: DeepSeek-V4.