What it means
Every LLM has a context window: a finite limit on how many tokens (roughly, sub-words) it can process in a single call. Modern models range from about 4,000 tokens (older GPT models) to 1,000,000 tokens and beyond (newer Gemini and Claude models). The context window covers everything: the system prompt, the retrieved knowledge base, the conversation history, and the user's current message, all of which add up.
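As a rough way to see where those tokens go, here is a minimal counting sketch. It assumes the tiktoken library, which matches OpenAI's tokenizers; other models tokenize differently, so treat the counts as estimates. The prompt parts are invented placeholders.

```python
# Minimal token-counting sketch using tiktoken (OpenAI tokenizers only;
# counts for other providers' models will differ somewhat).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Return the approximate number of tokens in a piece of text."""
    return len(enc.encode(text))

# Illustrative prompt parts; real ones are usually far larger.
prompt_parts = {
    "system_prompt": "You are a helpful property agent...",
    "knowledge_base": "Listing: 3-bed flat, 1,200 sq ft...",
    "user_message": "Is the flat still available?",
}
total = sum(count_tokens(part) for part in prompt_parts.values())
print(f"Tokens used before any conversation history: {total}")
```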
When the conversation grows past the window, the oldest content gets cut off (or summarised, depending on the implementation). The model cannot remember what was said earlier; anything outside the window is simply invisible to it.
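One common cut-off strategy is a sliding window: keep the newest messages that fit a token budget and drop the rest. A minimal sketch, reusing the count_tokens helper from above (messages are plain strings here; real frameworks use richer message objects):

```python
# Keep the newest messages that fit inside a token budget, dropping
# the oldest first. Uses the count_tokens estimator sketched earlier.
def truncate_history(messages: list[str], budget: int) -> list[str]:
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent context survives.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break  # everything older than this point is cut off
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```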
Why it matters
The context window is the practical ceiling on agent capability. A 4k window forces a trade-off between a long system prompt and long conversation memory. A 200k window means you can keep a customer's entire history in context, plus a deep knowledge base, in every reply.
Larger windows are not free: they cost more per call, run slower, and (counterintuitively) sometimes produce worse responses if too much irrelevant context is included. The art is using just enough.
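The cost side of that trade-off is simple arithmetic: input tokens are billed on every call, so a large context resent each turn multiplies the bill. A back-of-the-envelope sketch with an assumed, illustrative price:

```python
# Back-of-the-envelope cost of resending context on every reply.
# The price below is illustrative, not any provider's actual rate.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed

def cost_per_call(context_tokens: int) -> float:
    return context_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

# A 4k context vs. a 150k context, resent on each of 50 turns:
for ctx in (4_000, 150_000):
    print(f"{ctx:>7} tokens/call -> ${cost_per_call(ctx) * 50:.2f} over 50 turns")
```

With these assumed numbers the 150k context costs roughly 37 times more over the same conversation, which is why "use just enough" is a budgeting decision, not just a quality one.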
Example
A property agency runs an AI agent on a 200k-token window. Every reply includes the full property listing, the customer's chat history, and the agent's system prompt, so the agent can naturally reference details from 30 messages back in the chat. The same agent on a 4k window would forget those details after about 10 turns.
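The "about 10 turns" figure falls out of simple budget arithmetic. A sketch with assumed, illustrative sizes for the fixed parts of the prompt:

```python
# How many conversation turns fit once fixed overheads are subtracted.
# All sizes below are illustrative assumptions, not measured values.
SYSTEM_PROMPT = 600   # tokens
LISTING = 1_500       # full property listing
AVG_TURN = 180        # one user message plus one agent reply

def turns_that_fit(window: int) -> int:
    free = window - SYSTEM_PROMPT - LISTING
    return max(free // AVG_TURN, 0)

print(turns_that_fit(4_000))    # -> 10: history falls off quickly
print(turns_that_fit(200_000))  # -> ~1,099: effectively the whole chat
```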