The AI community has been buzzing for weeks. DeepSeek V4 was originally expected mid-February alongside Gemini 3.1 Pro, but DeepSeek has been delayed yet again—a pattern we've seen with V3, R1, and others. The new release window points to March 3rd (Lantern Festival / 元宵节), timed to coincide with China's "Two Sessions" (两会) starting March 4th.
Everything below is leaked and unconfirmed. But if even half of it is true, V4 could be a watershed moment for AI.
By the Numbers
The flagship V4 model is rumored to be a massive ~1 Trillion parameter MoE (Mixture of Experts) model—significantly larger than V3's 671B-685B. Despite the larger total, active parameters per token drop to ~32B (from V3's ~37B), suggesting significant efficiency gains.
| Specification | DeepSeek V3/V3.2 | DeepSeek V4 (Leaks) |
|---|---|---|
| Total Parameters | 671B–685B MoE | ~1 Trillion MoE |
| Active Parameters | ~37B | ~32B |
| Context Window | 128K (1M since Feb '26) | 1 Million Tokens (native) |
| Multimodal | No (text only) | Yes – Text, Image, Video, Audio (native) |
| Input Price (est.) | $0.28/M tokens | ~$0.14/M tokens |
The big story: native multimodal support. V4 would reportedly handle text, images, video, and audio out of the box—no more separate vision models.
Engram: The Memory Revolution
The most intriguing leak is Engram Conditional Memory, backed by a published paper (arXiv:2601.07372, January 2026). This isn't your typical RAG setup.
Engram provides O(1) hash lookup for static knowledge directly in DRAM. Instead of spending GPU compute on retrieving known facts, the model can instantly fetch them from a dedicated memory store. The paper claims:
- 75% dynamic reasoning / 25% static lookups—the model intelligently decides what to compute vs. what to recall
- Needle-in-a-Haystack: 97% accuracy vs. 84.2% with standard architectures
This is a fundamental shift. Instead of bloating the model weights with every fact, you maintain a separate knowledge base that the model can query at near-zero cost. It's like giving the model a reference library that doesn't require it to "memorize" everything.
New Architecture Features
Beyond Engram, V4 reportedly introduces:
- Manifold-Constrained Hyper-Connections (mHC): A separate January 2026 paper addressing training stability at 1T+ parameter scales
- DSA Lightning Indexer: Builds on V3.2-Exp's DeepSeek Sparse Attention for fast preprocessing of 1M-token contexts—~50% less compute
These aren't incremental tweaks. They're architectural innovations targeting the fundamental bottlenecks of massive-scale language models.
The Hardware Gambit
Here's where things get geopolitical.
According to Reuters (Feb 25), DeepSeek deliberately denied Nvidia and AMD access to the V4 model for inference. Instead, Huawei Ascend + Cambricon chips have early access for inference optimization.
Training was reportedly done on Nvidia H800 GPUs, but inference is optimized for Chinese silicon. For the open-source community running on Nvidia GPUs, performance could be suboptimal at launch.
This is an unprecedented hardware bet for a frontier model. DeepSeek is betting big that Huawei's ecosystem will be ready.
The Price Question
If the leaked pricing holds:
| Model | Input/1M Tokens | Output/1M Tokens |
|---|---|---|
| DeepSeek V4 (est.) | ~$0.14 | ~$0.28 |
| DeepSeek V3.2 | $0.28 | $0.42 |
| Kimi K2.5 | $0.60 | $3.00 |
| Gemini 3.1 Pro | $2.00 | $12.00 |
| Claude Opus 4.6 | $5.00 | $25.00 |
If correct, V4 would be 36x cheaper than Claude Opus 4.6 on input and 89x cheaper on output. That's not incremental improvement—that's structural disruption.
Leaked Benchmarks (NOT Verified)
The infamous "83.7% SWE-bench" graphic circulating on X has been confirmed as FAKE by the Epoch AI/FrontierMath team. But the more conservative leaks tell a compelling story:
| Benchmark | V4 (Leak) | V3.2 | Claude Opus 4.6 |
|---|---|---|---|
| SWE-bench Verified | >80% | 73.1% | 80.8% |
| Needle-in-a-Haystack | 97% (Engram) | – | – |
| HumanEval (Code Gen) | ~90% | – | ~88% |
A "V4 Lite" variant (codename: "sealion-lite") also leaked—at ~200B parameters, it reportedly outperforms V3.2 and even Claude Opus 4.6 in code optimization and visual accuracy tasks.
Open Questions
- Does V4 actually generate images/videos, or does it just understand them?
- Will Nvidia GPU users get an optimized version post-launch?
- When will open-source weights be released?
- Is the pricing sustainable, or a loss-leader to capture market share?
What This Means
DeepSeek V4—if even half of these leaks prove accurate—represents a convergence of three trends:
- Efficient architectures that do more with less (MoE + Engram)
- Native multimodal without the bloat of separate models
- Aggressive pricing that makes commercial LLMs look overpriced
The question isn't whether DeepSeek will disrupt. It's how quickly the rest of the industry responds.
Sources: Financial Times, Reuters, arXiv:2601.07372, awesomeagents.ai, nxcode.io, r/DeepSeek, r/LocalLLaMA