Even 1M-token models have a much shorter truly reliable zone.

The 1M Token Lie

A model's advertised context window and the range over which it recalls reliably are not the same thing. Check whether your document actually fits.

Est. tokens: 17K
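
The page does not say how it turns document length and language into a token estimate. As a rough sketch only, one common approach divides the character count by an assumed characters-per-token ratio for the language. The ratios below and the `estimate_tokens` helper are hypothetical and are not taken from this tool; a real tokenizer will give different counts.

```python
import math

# Hypothetical characters-per-token ratios (assumptions, not from the page).
# Real values depend on the tokenizer and on the text itself.
CHARS_PER_TOKEN = {
    "english": 4.0,
    "japanese": 1.5,
}
DEFAULT_RATIO = 4.0


def estimate_tokens(char_count: int, language: str = "english") -> int:
    """Estimate token count from character count using a per-language ratio."""
    ratio = CHARS_PER_TOKEN.get(language.lower(), DEFAULT_RATIO)
    return math.ceil(char_count / ratio)


def format_k(tokens: int) -> str:
    """Format a token count in thousands, e.g. 17000 -> '17K'."""
    return f"{round(tokens / 1000)}K"


if __name__ == "__main__":
    # Under the assumed English ratio, a 68,000-character document
    # comes out at the 17K tokens shown above.
    print(format_k(estimate_tokens(68_000, "english")))
```

Under these assumed ratios, the same character count yields a much larger estimate for Japanese than for English, which is why a document's language matters for whether it fits.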

Per-Model Effectiveness Analysis

8 SAFE · 0 at-risk/over (of 8 models)

GPT-4o

OpenAI

✓ Safe
Advertised Context: 128K · 13% used
Reliable Zone: 64K · 27% used
Reliability Ratio: 50% · +47K remaining

GPT-4.1

OpenAI

✓ Safe
Advertised Context: 1.0M · 2% used
Reliable Zone: 200K · 9% used
Reliability Ratio: 19% · +183K remaining

Claude Opus 4

Anthropic

✓ Safe
Advertised Context: 200K · 9% used
Reliable Zone: 140K · 12% used
Reliability Ratio: 70% · +123K remaining

Claude Sonnet 4

Anthropic

✓ Safe
Advertised Context: 200K · 9% used
Reliable Zone: 120K · 14% used
Reliability Ratio: 60% · +103K remaining

Gemini 2.5 Pro (1M)

Google

✓ Safe
Advertised Context: 1.0M · 2% used
Reliable Zone: 300K · 6% used
Reliability Ratio: 29% · +283K remaining

Gemini 2.5 Pro (2M)

Google

✓ Safe
Advertised Context: 2.1M · 1% used
Reliable Zone: 500K · 3% used
Reliability Ratio: 24% · +483K remaining

Llama 3.3 70B

Meta

✓ Safe
Advertised Context: 128K · 13% used
Reliable Zone: 32K · 53% used
Reliability Ratio: 25% · +15K remaining

Mistral Large 2

Mistral AI

✓ Safe
Advertised Context: 128K · 13% used
Reliable Zone: 50K · 34% used
Reliability Ratio: 39% · +33K remaining
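
The figures in each card above follow from three inputs: the estimated document tokens, the advertised context, and the reliable zone. The sketch below reproduces that arithmetic. The `Model` and `analyze` names are hypothetical, and the exact cutoffs for "at-risk" and "over" are assumptions, since the page only shows the "Safe" state. It also assumes that "1.0M" means 1,048,576 tokens, which is what makes the GPT-4.1 reliability ratio come out at 19% rather than 20%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    advertised: int  # advertised context window, in tokens
    reliable: int    # estimated reliable zone, in tokens


def pct(num: int, den: int) -> int:
    """Integer percentage of num/den, rounded half up."""
    return (200 * num + den) // (2 * den)


def analyze(doc_tokens: int, model: Model) -> dict:
    """Compute the per-model figures shown in each card."""
    # Assumed status rules: inside the reliable zone is safe, between the
    # reliable zone and the advertised window is at-risk, beyond is over.
    if doc_tokens <= model.reliable:
        status = "safe"
    elif doc_tokens <= model.advertised:
        status = "at-risk"
    else:
        status = "over"
    return {
        "advertised_used_pct": pct(doc_tokens, model.advertised),
        "reliable_used_pct": pct(doc_tokens, model.reliable),
        "reliability_ratio_pct": pct(model.reliable, model.advertised),
        "remaining": model.reliable - doc_tokens,
        "status": status,
    }


GPT_4O = Model("GPT-4o", advertised=128_000, reliable=64_000)
GPT_41 = Model("GPT-4.1", advertised=1_048_576, reliable=200_000)

if __name__ == "__main__":
    for model in (GPT_4O, GPT_41):
        print(model.name, analyze(17_000, model))
```

For a 17K-token document this gives 13%, 27%, 50% and 47,000 remaining for GPT-4o, matching the first card.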

Methodology note

Reliable-zone estimates are based on needle-in-a-haystack (NIAH) benchmarks, the RULER benchmark (2024), and published research including "Lost in the Middle" (Liu et al., 2023). Values represent approximate thresholds where recall rates drop noticeably. Data as of 2026-06. Actual performance varies by task, prompt structure, and model version.
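
One way to read "thresholds where recall rates drop noticeably" is as the longest context length up to which measured recall stays above some cutoff. The sketch below illustrates that reading only. The `reliable_zone` function, the 0.9 cutoff, and the sample curve are all hypothetical: the page gives no numeric cutoff, and the sample numbers are made up for illustration rather than taken from any benchmark.

```python
def reliable_zone(curve, min_recall=0.9):
    """Return the largest context length before recall first falls below
    min_recall. `curve` is an iterable of (context_tokens, recall) pairs."""
    zone = 0
    for length, recall in sorted(curve):
        if recall < min_recall:
            break  # later recoveries are ignored: the zone must be contiguous
        zone = length
    return zone


if __name__ == "__main__":
    # Made-up illustrative measurements, NOT real benchmark results.
    sample_curve = [
        (8_000, 0.99),
        (32_000, 0.97),
        (64_000, 0.93),
        (128_000, 0.71),
    ]
    print(reliable_zone(sample_curve))
```

Stopping at the first dip is a deliberately conservative choice: a model that recovers at longer lengths still has an unreliable stretch in between.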