Context window (LLM)

The maximum amount of text (measured in tokens) an LLM can consider in a single request. The hard ceiling on how much knowledge base, history, and current message the model can hold at once.

What it means

Every LLM has a context window: a finite limit on how many tokens (roughly, sub-words) it can process in a single call. Modern models range from 4,000 tokens (older GPT models) to over 1,000,000 tokens (newer Claude and Gemini models). The context window covers the system prompt, the retrieved knowledge base, the conversation history, and the user's current message, all of which add up.
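The "all of which add up" point can be sketched in a few lines. This is a rough illustration, not a real tokenizer: it uses the common ~4-characters-per-token rule of thumb, and the prompt components and window size are made-up placeholders.

```python
# Rough sketch: does a prompt fit the context window?
# ~4 characters per token is a crude rule of thumb; a real
# tokenizer (e.g. tiktoken) gives exact counts.

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # crude approximation, not a real tokenizer

CONTEXT_WINDOW = 4_000  # tokens; e.g. an older small-window model

# Hypothetical prompt components, inflated to show the squeeze.
system_prompt = "You are a helpful property agent. " * 50
knowledge_base = "Listing details: 2-bed flat, city centre. " * 200
history = "Customer: question. Agent: answer. " * 100
current_message = "Is the flat still available?"

used = sum(estimate_tokens(t) for t in
           (system_prompt, knowledge_base, history, current_message))
print(f"{used} of {CONTEXT_WINDOW} tokens used")
if used > CONTEXT_WINDOW:
    print("Over budget: oldest history must be trimmed or summarised")
```

Everything in the prompt shares one budget, so a bigger knowledge base directly shrinks the room left for conversation history.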

When the conversation grows past the window, the oldest content gets cut off (or summarised, depending on the implementation). The model literally cannot remember what was said earlier.

Why it matters

The context window is the practical ceiling on agent capability. A 4k window forces you to choose between a long system prompt and long conversation memory. A 200k window means you can keep a customer's entire history in context, plus a deep knowledge base, in every reply.

Larger windows are not free: they cost more per call, run slower, and (counterintuitively) sometimes produce worse responses if too much irrelevant context is included. The art is using just enough.
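"Using just enough" usually means selecting only the context relevant to the current message rather than stuffing everything in. A minimal sketch, with naive keyword overlap standing in for a real retrieval or embedding step, and a hypothetical three-entry knowledge base:

```python
# Keep only the top-k knowledge-base chunks most relevant to the
# current message, instead of sending the whole knowledge base.

def overlap_score(chunk: str, query: str) -> int:
    # Naive relevance: shared lowercase words between chunk and query.
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def select_chunks(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: overlap_score(c, query),
                  reverse=True)[:top_k]

kb = [
    "Refund policy: refunds are issued within 30 days of purchase.",
    "Shipping: orders dispatch within 2 business days.",
    "Viewing hours: the flat can be viewed on weekdays, 9 to 5.",
]
print(select_chunks(kb, "When can I view the flat?"))
```

Production systems use embeddings rather than word overlap, but the principle is the same: a smaller, relevant prompt is cheaper, faster, and often more accurate than a stuffed one.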

Example

A property agency runs an AI agent on a 200k-token window. Every reply includes the full property listing, the customer's chat history, and the agent's system prompt. The agent can naturally reference details from 30 messages back in the chat. The same agent on a 4k window would forget those details after about 10 turns.
