DeepSeek V4 Speculations

The AI community has been buzzing for weeks. DeepSeek V4 was originally expected mid-February alongside Gemini 3.1 Pro, but DeepSeek has been delayed yet again—a pattern we've seen with V3, R1, and others. The new release window points to March 3rd (Lantern Festival / 元宵节), timed to coincide with China's "Two Sessions" (两会) starting March 4th.

Everything below is leaked and unconfirmed. But if even half of it is true, V4 could be a watershed moment for AI.

By the Numbers

The flagship V4 model is rumored to be a massive ~1 Trillion parameter MoE (Mixture of Experts) model—significantly larger than V3's 671B-685B. Despite the larger total, active parameters per token drop to ~32B (from V3's ~37B), suggesting significant efficiency gains.

SpecificationDeepSeek V3/V3.2DeepSeek V4 (Leaks)
Total Parameters671B–685B MoE~1 Trillion MoE
Active Parameters~37B~32B
Context Window128K (1M since Feb '26)1 Million Tokens (native)
MultimodalNo (text only)Yes – Text, Image, Video, Audio (native)
Input Price (est.)$0.28/M tokens~$0.14/M tokens

The big story: native multimodal support. V4 would reportedly handle text, images, video, and audio out of the box—no more separate vision models.

Engram: The Memory Revolution

The most intriguing leak is Engram Conditional Memory, backed by a published paper (arXiv:2601.07372, January 2026). This isn't your typical RAG setup.

Engram provides O(1) hash lookup for static knowledge directly in DRAM. Instead of spending GPU compute on retrieving known facts, the model can instantly fetch them from a dedicated memory store. The paper claims:

  • 75% dynamic reasoning / 25% static lookups—the model intelligently decides what to compute vs. what to recall
  • Needle-in-a-Haystack: 97% accuracy vs. 84.2% with standard architectures

This is a fundamental shift. Instead of bloating the model weights with every fact, you maintain a separate knowledge base that the model can query at near-zero cost. It's like giving the model a reference library that doesn't require it to "memorize" everything.

New Architecture Features

Beyond Engram, V4 reportedly introduces:

  • Manifold-Constrained Hyper-Connections (mHC): A separate January 2026 paper addressing training stability at 1T+ parameter scales
  • DSA Lightning Indexer: Builds on V3.2-Exp's DeepSeek Sparse Attention for fast preprocessing of 1M-token contexts—~50% less compute

These aren't incremental tweaks. They're architectural innovations targeting the fundamental bottlenecks of massive-scale language models.

The Hardware Gambit

Here's where things get geopolitical.

According to Reuters (Feb 25), DeepSeek deliberately denied Nvidia and AMD access to the V4 model for inference. Instead, Huawei Ascend + Cambricon chips have early access for inference optimization.

Training was reportedly done on Nvidia H800 GPUs, but inference is optimized for Chinese silicon. For the open-source community running on Nvidia GPUs, performance could be suboptimal at launch.

This is an unprecedented hardware bet for a frontier model. DeepSeek is betting big that Huawei's ecosystem will be ready.

The Price Question

If the leaked pricing holds:

ModelInput/1M TokensOutput/1M Tokens
DeepSeek V4 (est.)~$0.14~$0.28
DeepSeek V3.2$0.28$0.42
Kimi K2.5$0.60$3.00
Gemini 3.1 Pro$2.00$12.00
Claude Opus 4.6$5.00$25.00

If correct, V4 would be 36x cheaper than Claude Opus 4.6 on input and 89x cheaper on output. That's not incremental improvement—that's structural disruption.

Leaked Benchmarks (NOT Verified)

The infamous "83.7% SWE-bench" graphic circulating on X has been confirmed as FAKE by the Epoch AI/FrontierMath team. But the more conservative leaks tell a compelling story:

BenchmarkV4 (Leak)V3.2Claude Opus 4.6
SWE-bench Verified>80%73.1%80.8%
Needle-in-a-Haystack97% (Engram)
HumanEval (Code Gen)~90%~88%

A "V4 Lite" variant (codename: "sealion-lite") also leaked—at ~200B parameters, it reportedly outperforms V3.2 and even Claude Opus 4.6 in code optimization and visual accuracy tasks.

Open Questions

  • Does V4 actually generate images/videos, or does it just understand them?
  • Will Nvidia GPU users get an optimized version post-launch?
  • When will open-source weights be released?
  • Is the pricing sustainable, or a loss-leader to capture market share?

What This Means

DeepSeek V4—if even half of these leaks prove accurate—represents a convergence of three trends:

  1. Efficient architectures that do more with less (MoE + Engram)
  2. Native multimodal without the bloat of separate models
  3. Aggressive pricing that makes commercial LLMs look overpriced

The question isn't whether DeepSeek will disrupt. It's how quickly the rest of the industry responds.


Sources: Financial Times, Reuters, arXiv:2601.07372, awesomeagents.ai, nxcode.io, r/DeepSeek, r/LocalLLaMA