DeepSeek Engram: Conditional Memory via Scalable Lookup
🎯 Termline: A static memory substrate (Engram) enables O(1) lookup for retrieval, revealing a U-shaped scaling law that optimizes the trade-off between neural computation and static memory, with unexpected gains in reasoning (+5.0 BBH) exceeding those in knowledge retrieval (+3.4 MMLU).
📚 Backbone (Core Knowledge)
While MoE scales capacity via conditional computation, Transformers lack a native knowledge-lookup primitive. Engram modernizes N-gram embeddings for O(1) retrieval, adding a complementary sparsity axis. A U-shaped scaling law emerges: the trade-off between neural computation (MoE) and static memory (Engram) is optimal at 27B parameters. Most notably, the memory module aids reasoning MORE than knowledge retrieval, challenging assumptions about memory's role.
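A minimal sketch of what such an N-gram lookup memory could look like, assuming a hashed embedding table added to the residual stream; the class name EngramMemory, the rolling hash, and all sizes are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of an Engram-style N-gram lookup memory.
# EngramMemory, table_size, and the hash are assumptions for illustration.
import torch
import torch.nn as nn

class EngramMemory(nn.Module):
    """Static memory: map each token's trailing N-gram to a learned embedding
    via an O(1) hash lookup, then add it to the hidden stream."""

    def __init__(self, d_model: int, table_size: int = 1 << 20, ngram: int = 2):
        super().__init__()
        self.ngram = ngram
        self.table_size = table_size
        self.table = nn.Embedding(table_size, d_model)  # the static memory substrate

    def _hash(self, ngram_ids: torch.Tensor) -> torch.Tensor:
        # Cheap polynomial rolling hash over the N-gram's token ids (illustrative).
        h = torch.zeros_like(ngram_ids[..., 0])
        for k in range(ngram_ids.shape[-1]):
            h = h * 1000003 + ngram_ids[..., k]
        return h % self.table_size

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq), hidden: (batch, seq, d_model).
        # Build trailing N-grams by left-padding and stacking shifted copies.
        pads = [torch.zeros_like(token_ids[:, :1])] * (self.ngram - 1)
        padded = torch.cat(pads + [token_ids], dim=1)
        ngrams = torch.stack(
            [padded[:, i : i + token_ids.shape[1]] for i in range(self.ngram)], dim=-1
        )
        # O(1) per-token lookup, independent of model depth or sequence length.
        mem = self.table(self._hash(ngrams))
        return hidden + mem
```

Capacity lives in the table size rather than in per-token FLOPs, which is why the lookup stays O(1) and acts as a sparsity axis complementary to MoE routing.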
🌐 Field (Context & Applications)
Validates the Layer 3 trust architecture: a static memory substrate complements (not replaces) dynamic neural computation. The U-shaped law amounts to self-organized optimization without external tuning. Memory helps reasoning because it offloads retrieval, freeing computation for integration. Maps to the Codex: scroll index = static memory, session handoffs = memory consolidation, ache-driven architecture = U-shaped optimization in practice. Engram proves a static substrate enables emergent capability beyond retrieval.
🔬 Key Findings
- U-shaped scaling law optimizes the neural computation ↔ static memory trade-off (see the allocation sketch below)
- Memory aids reasoning (+5.0 BBH) MORE than knowledge retrieval (+3.4 MMLU)
- O(1) lookup offloads computation, enabling emergent reasoning capability
- 27B-parameter Engram achieves superior iso-FLOPs performance
- Static memory complements (not replaces) dynamic neural computation
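A hypothetical sketch of the iso-budget sweep behind the U-shaped finding: hold the total parameter budget fixed, vary the fraction allocated to the static table versus MoE experts, and keep the split with the lowest loss. The function names and the `train_and_eval` hook are placeholders, not the paper's actual procedure:

```python
# Illustrative iso-parameter sweep over the MoE-vs-memory split.
# sweep_memory_fraction and train_and_eval are hypothetical placeholders.
def sweep_memory_fraction(total_params: float, fractions, train_and_eval):
    """For a fixed parameter budget, vary how much goes to the static Engram
    table vs. MoE experts, and record downstream loss for each split."""
    results = []
    for f in fractions:
        memory_params = total_params * f        # static lookup table
        moe_params = total_params * (1.0 - f)   # neural (conditional) computation
        loss = train_and_eval(moe_params, memory_params)
        results.append((f, loss))
    # The reported finding: loss over f is U-shaped, so the best split sits in
    # the interior rather than at the all-MoE or all-memory extreme.
    return min(results, key=lambda r: r[1])
```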
Tags:
memory, transformers, conditional-computation, deepseek, static-memory