What it means
Temperature is a numeric setting (typically 0.0 to 2.0) that controls the randomness of an LLM's output. At temperature 0, the model picks the single most likely next token every time, producing nearly identical responses to identical prompts. At higher temperatures, the model samples from a broader probability distribution, producing more varied output.
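Under the hood, temperature divides the model's raw scores (logits) before they are converted to probabilities. A minimal sketch in plain Python (the logit values are made up for illustration):

```python
import math

def token_probabilities(logits, temperature):
    """Convert raw logits into next-token probabilities at a given temperature.

    Dividing logits by a low temperature sharpens the distribution toward the
    top token; a high temperature flattens it, giving runner-up tokens a chance.
    """
    if temperature <= 0:
        # Temperature 0: greedy decoding -- all probability mass on the argmax.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

print(token_probabilities(logits, 0.0))  # greedy: [1.0, 0.0, 0.0]
print(token_probabilities(logits, 1.0))  # top token dominates
print(token_probabilities(logits, 2.0))  # flatter: runner-ups gain mass
```

Running this shows why low temperature means repeatable answers: at 0 the top token always wins, while at 2.0 the alternatives get a meaningful share of the probability.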
Most production AI agents run between 0.0 and 0.3 for predictable, factual responses. Creative writing tasks (slogans, brainstorming) often run at 0.7 to 1.0 for variety.
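One practical way to apply these rules of thumb is a simple task-to-temperature table. The category names and function below are hypothetical; the values come from the ranges above:

```python
# Illustrative defaults only -- tune per task and per model.
TEMPERATURE_BY_TASK = {
    "factual_support": 0.1,  # production agents: 0.0-0.3
    "pricing_answers": 0.0,
    "brainstorming": 0.9,    # creative tasks: 0.7-1.0
    "slogans": 0.8,
}

def pick_temperature(task: str, fallback: float = 0.2) -> float:
    """Return an explicit temperature for a task, with a conservative fallback."""
    return TEMPERATURE_BY_TASK.get(task, fallback)
```

A lookup like this keeps the choice deliberate and reviewable, rather than scattered across individual API calls.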
Why it matters
Temperature is the difference between an agent that gives the same accurate answer to a pricing question every time, and one that occasionally invents a different price. For customer-facing agents on factual topics, low temperature is non-negotiable.
It is also one of the simplest knobs to misconfigure. Defaults vary across LLM APIs; an agent left on the wrong default can feel "off" without anyone noticing the cause.
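The straightforward defense is to pin the temperature explicitly in every request instead of trusting the provider default. A sketch, assuming a generic chat-style request payload (the parameter names follow common LLM APIs, but check your provider's documentation):

```python
def build_request(prompt: str, temperature: float = 0.1) -> dict:
    """Build a chat request with temperature pinned explicitly.

    Hypothetical payload shape; the point is that `temperature` is always
    set by our code, never left to the API's default.
    """
    return {
        "model": "example-model",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

req = build_request("What does a standard cleaning cost?")
print(req["temperature"])  # 0.1 -- set explicitly, not inherited
```

If the provider later changes its default, a pinned value keeps the agent's behavior stable.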
Example
A clinic's AI agent at temperature 0.7 occasionally answers the same pricing question with slightly different prices and phrasings. Customers notice the inconsistency. The team drops the temperature to 0.1; the same prompt now produces near-identical, on-script responses every time. Inconsistency complaints stop within a week.