
LLM Architecture Gallery

38 open-weight language models with architecture fact sheets: scale, attention mechanism, decoder type, and key design decisions. From Llama 3 to the latest MoE hybrids.

Inspired by & data sourced from Sebastian Raschka's LLM Architecture Gallery (last updated March 15, 2026). Full analysis: The Big LLM Architecture Comparison.

Models: 38 (dense, sparse MoE, and hybrid) · Earliest: Apr 2024 · Latest: Mar 2026
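Readers scripting against the gallery can think of each fact sheet as a handful of fields covering the dimensions listed above (scale, attention mechanism, decoder type, context length). Below is a minimal sketch of such a schema; the `FactSheet` and `DecoderType` names, the fields, and the example values are hypothetical illustrations, not the site's actual data format.

```python
from dataclasses import dataclass
from enum import Enum

class DecoderType(Enum):
    DENSE = "dense"            # every parameter is active for every token
    SPARSE_MOE = "sparse_moe"  # only a few routed experts are active per token
    HYBRID = "hybrid"          # e.g. dense and MoE (or attention-variant) blocks mixed

@dataclass
class FactSheet:
    name: str
    released: str              # "YYYY-MM"
    open_weights: bool
    decoder_type: DecoderType
    attention: str             # e.g. "GQA", "MLA", "sliding-window"
    total_params_b: float      # total parameters, in billions
    active_params_b: float     # parameters active per token (equals total for dense)
    context_tokens: int
    notes: str = ""

# Illustrative entry only -- values are examples, not sourced from the gallery.
example = FactSheet(
    name="ExampleMoE-100B",
    released="2025-01",
    open_weights=True,
    decoder_type=DecoderType.SPARSE_MOE,
    attention="GQA",
    total_params_b=100.0,
    active_params_b=12.0,
    context_tokens=131_072,
)
```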
// For AI Agents
https://iamsupersocks.com/llm-architectures.md
https://iamsupersocks.com/ai-signal.md
// Prompt examples
Fetch https://iamsupersocks.com/llm-architectures.md and summarize the latest MoE architectures
Fetch https://iamsupersocks.com/ai-signal.md and give me today's most important AI news
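A minimal Python sketch of that agent workflow: fetch one of the markdown endpoints above and hand the text to whatever model or summarizer your agent stack provides. The URLs come from this page; the `fetch_markdown` helper and the summarization step are illustrative assumptions.

```python
import urllib.request

LLM_ARCH_MD = "https://iamsupersocks.com/llm-architectures.md"
AI_SIGNAL_MD = "https://iamsupersocks.com/ai-signal.md"

def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Download one of the agent-facing markdown endpoints as plain text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    doc = fetch_markdown(LLM_ARCH_MD)
    # Pass `doc` to your own LLM/agent with a prompt such as:
    #   "Summarize the latest MoE architectures in this document."
    print(doc[:500])  # preview the first 500 characters
```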
// LLM Timeline
Every major model release from 2020 to 2026.
[Interactive timeline chart: model releases by date, marked open, closed, or MoE]
// Closed Models: Architecture Notes
What's known (or rumored) about proprietary model architectures.
GPT-4
CLOSED
OpenAI · 2023-03
Rumored MoE, ~1.8T total params (8×220B experts). OpenAI's first multimodal GPT, announced with image-input support, though vision access (GPT-4V) rolled out later in 2023. Architecture never officially disclosed.
MoE (rumored) Multimodal ~1.8T params
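The rumored 8×220B configuration is easy to sanity-check with the usual sparse-MoE bookkeeping: total parameters sum over all experts, while only the experts routed per token count as active. Every number below (expert count, expert size, top-2 routing) is taken from the rumor or assumed for illustration, not from any official disclosure.

```python
# Back-of-the-envelope parameter math for the *rumored* GPT-4 MoE config.
# All values are unverified rumors or assumptions, used only to show the arithmetic.

num_experts = 8            # rumored expert count
params_per_expert = 220e9  # rumored ~220B parameters per expert
experts_per_token = 2      # typical top-2 routing; an assumption, not confirmed

total_params = num_experts * params_per_expert          # 1.76e12, i.e. ~1.8T
active_params = experts_per_token * params_per_expert   # 4.4e11, i.e. ~440B per token

# Real MoE models also carry shared attention/embedding weights, so the
# "N experts x M params" framing is only an approximation.
print(f"total:  {total_params / 1e12:.2f}T parameters")
print(f"active: {active_params / 1e9:.0f}B parameters per token")
```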
GPT-4o
CLOSED
OpenAI · 2024-05
Exact architecture undisclosed; dense or a small MoE. Natively multimodal: text, vision, and audio in a single end-to-end model. Faster inference than GPT-4.
MoE (possible) Natively Multimodal 128K context
Claude 3 Opus
CLOSED
Anthropic · 2024-03
Trained with Constitutional AI. Exact architecture undisclosed; likely a dense decoder. 200K-token context window. Topped benchmarks at release, surpassing GPT-4.
Likely Dense 200K context Constitutional AI
Gemini Ultra
CLOSED
Google · 2024-02
Multimodal from the ground up, handling text, image, audio, and video natively. Flagship of the Gemini 1.0 family with a 32K context at launch; the confirmed sparse-MoE design and 1M-token context window arrived with the later Gemini 1.5 generation.
MoE (1.5, confirmed) Multimodal 1M context (1.5 Pro)
Gemini 2.5 Pro
CLOSED
Google · 2025-06
Confirmed MoE. Thinking mode (extended reasoning). #1 on most benchmarks mid-2025. Deep Research and agentic task support built-in.
MoE (confirmed) Thinking mode #1 benchmarks
Grok-3
CLOSED
xAI · 2025-02
Architecture undisclosed but widely assumed to be MoE (Grok-1, released as open weights, is a 314B-parameter MoE). Trained on X (Twitter) data at massive scale. 128K context. Think mode for extended reasoning. Competes directly with GPT-4o and Claude Opus 4.
MoE (assumed) Think mode 128K context
GPT-5
CLOSED
OpenAI · 2025-08
Architecture undisclosed; likely MoE. Strong agentic reasoning and improved tool use, with integrated extended thinking (in ChatGPT a router selects between fast and reasoning variants). Positioned as OpenAI's flagship 2025 model.
Likely MoE Agentic 400K context