Even models advertised with 1M-token context windows stay reliable over a much shorter range.
The 1M Token Lie
A model's advertised context window and the range over which it stays reliable are not the same thing. Check whether your document actually fits.
Est. tokens: 17K
Per-Model Effectiveness Analysis
8 SAFE · 0 at-risk/over (of 8 models)
GPT-4o
OpenAI
Advertised Context: 128K (document uses 13%)
Reliable Zone: 64K (document uses 27%)
Reliability Ratio: 50%
+47K remaining in reliable zone
GPT-4.1
OpenAI
Advertised Context: 1.0M (document uses 2%)
Reliable Zone: 200K (document uses 9%)
Reliability Ratio: 19%
+183K remaining in reliable zone
Claude Opus 4
Anthropic
Advertised Context: 200K (document uses 9%)
Reliable Zone: 140K (document uses 12%)
Reliability Ratio: 70%
+123K remaining in reliable zone
Claude Sonnet 4
Anthropic
Advertised Context: 200K (document uses 9%)
Reliable Zone: 120K (document uses 14%)
Reliability Ratio: 60%
+103K remaining in reliable zone
Gemini 2.5 Pro (1M)
Google
Advertised Context: 1.0M (document uses 2%)
Reliable Zone: 300K (document uses 6%)
Reliability Ratio: 29%
+283K remaining in reliable zone
Gemini 2.5 Pro (2M)
Google
Advertised Context: 2.1M (document uses 1%)
Reliable Zone: 500K (document uses 3%)
Reliability Ratio: 24%
+483K remaining in reliable zone
Llama 3.3 70B
Meta
Advertised Context: 128K (document uses 13%)
Reliable Zone: 32K (document uses 53%)
Reliability Ratio: 25%
+15K remaining in reliable zone
Mistral Large 2
Mistral AI
Advertised Context: 128K (document uses 13%)
Reliable Zone: 50K (document uses 34%)
Reliability Ratio: 39%
+33K remaining in reliable zone
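The figures on each card follow from simple arithmetic on three numbers: the estimated document size, the advertised context window, and the reliable zone. The sketch below shows one way to derive them. The names `Model` and `analyze` are hypothetical, "128K" is taken as 128,000 tokens, and the SAFE / AT-RISK / OVER rule is an assumption about how the labels might be assigned, not the tool's documented logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical record for one card; sizes are in tokens."""
    name: str
    advertised: int  # advertised context window
    reliable: int    # estimated reliable zone


def analyze(doc_tokens: int, model: Model) -> dict:
    """Derive the per-card figures for a document of doc_tokens tokens."""
    # Assumed status rule: inside the reliable zone is SAFE, beyond it but
    # inside the advertised window is AT-RISK, anything larger is OVER.
    if doc_tokens <= model.reliable:
        status = "SAFE"
    elif doc_tokens <= model.advertised:
        status = "AT-RISK"
    else:
        status = "OVER"

    return {
        # Share of the advertised window the document occupies.
        "advertised_used_pct": round(100 * doc_tokens / model.advertised),
        # Share of the reliable zone the document occupies.
        "reliable_used_pct": round(100 * doc_tokens / model.reliable),
        # Reliable zone as a fraction of the advertised window.
        "reliability_ratio_pct": round(100 * model.reliable / model.advertised),
        # Tokens left before the document leaves the reliable zone
        # (negative once the document exceeds it).
        "remaining": model.reliable - doc_tokens,
        "status": status,
    }


if __name__ == "__main__":
    gpt_4o = Model("GPT-4o", advertised=128_000, reliable=64_000)
    print(analyze(17_000, gpt_4o))
```

For the 17K-token document and the GPT-4o card, this gives 13% of the advertised window, 27% of the reliable zone, a 50% reliability ratio, and 47,000 tokens remaining. Cards for the 1M-class models may differ by a point when rounded window sizes are used instead of exact ones.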
Methodology note
Reliable-zone estimates are based on needle-in-a-haystack (NIAH) benchmarks, RULER (2024), and published research including "Lost in the Middle" (Liu et al., 2023). Each value is an approximate threshold beyond which recall drops noticeably. Data as of 2026-06. Actual performance varies by task, prompt structure, and model version.