What is Chain-of-Thought Prompting?
Chain-of-thought (CoT) prompting is a technique in which you instruct a large language model to show its reasoning steps before producing a final answer. Instead of jumping directly to a conclusion, the model walks through intermediate reasoning — a "chain of thoughts" — that leads to a more accurate result.
Chain-of-thought prompting was introduced in a 2022 Google Brain paper ("Chain-of-Thought Prompting Elicits Reasoning in Large Language Models") and has since become one of the most widely used prompt engineering techniques in production AI systems. It consistently improves accuracy on tasks that require multi-step reasoning, arithmetic, logic, and planning.
How Chain-of-Thought Prompting Works
The core mechanism is simple. Instead of asking:
"Janet's school starts at 8:30 AM. She has a 20-minute commute. What time should she leave?"
You prompt the model to reason step by step:
"Think through this problem step by step before giving your final answer."
The model then produces intermediate reasoning:
- School starts at 8:30 AM.
- She needs 20 minutes to commute.
- Therefore she should leave at 8:10 AM.
- Final answer: 8:10 AM
This intermediate reasoning — even though it is generated by the model itself — acts as a scaffold that guides the model toward more reliable conclusions. The model is, in effect, breaking a hard problem into easier sub-problems.
Types of Chain-of-Thought Prompting
Zero-Shot CoT
You simply append "Let's think step by step" (or equivalent) to your prompt. No examples are needed. Zero-shot CoT is easy to implement and surprisingly effective across many model families.
Prompt: A store has 48 apples. They sell 1/3 in the morning and
1/4 of the remaining in the afternoon. How many are left?
Think through this step by step.
Zero-shot CoT works well with capable models (GPT-4, Claude 3+, Gemini) but may produce weaker chains on smaller models.
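For reference, here is the arithmetic a correct chain should reproduce for the apples prompt — a plain Python check of the expected steps, not a model call:

```python
apples = 48
after_morning = apples - apples // 3                   # sells 1/3: 48 - 16 = 32
after_afternoon = after_morning - after_morning // 4   # sells 1/4 of remaining: 32 - 8 = 24
# after_afternoon == 24, the answer the chain should reach
```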
Few-Shot CoT
You provide several worked examples in the prompt, each showing both the reasoning steps and the final answer. The model learns the expected reasoning style and depth from these examples before tackling the actual question.
Few-shot CoT requires more prompt engineering effort but produces more consistent and structured reasoning, particularly for domain-specific tasks (legal analysis, financial modeling, medical triage).
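A minimal sketch of few-shot CoT prompt assembly. The worked examples below are invented for illustration, and the function only builds the prompt string — sending it to a model is left out, since that step depends on your provider's API:

```python
# Worked examples showing the reasoning style we want the model to imitate.
EXAMPLES = [
    {
        "question": "A train travels 60 miles in 1.5 hours. What is its speed?",
        "reasoning": "Speed = distance / time = 60 / 1.5 = 40.",
        "answer": "40 mph",
    },
    {
        "question": "Tickets cost $12 each. How much do 5 tickets cost?",
        "reasoning": "Total = 12 * 5 = 60.",
        "answer": "$60",
    },
]

def build_few_shot_cot_prompt(question: str) -> str:
    """Concatenate worked examples (reasoning + answer) before the real question."""
    parts = []
    for ex in EXAMPLES:
        parts.append(
            f"Q: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}\n"
        )
    # End on "Reasoning:" so the model continues in the demonstrated format.
    parts.append(f"Q: {question}\nReasoning:")
    return "\n".join(parts)
```

Ending the prompt mid-pattern (`Reasoning:`) nudges the model to complete the chain before emitting `Answer:`, mirroring the examples.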
Self-Consistency
A complementary technique: run the same chain-of-thought prompt multiple times (with temperature > 0) and take the majority vote across answers. Self-consistency significantly improves accuracy on math and reasoning benchmarks by aggregating across multiple reasoning paths rather than trusting a single chain.
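The voting step can be sketched in a few lines. In practice each answer would come from a separate temperature > 0 sample of the same CoT prompt; here the sampled answers are hard-coded for illustration:

```python
from collections import Counter

def self_consistency_vote(answers):
    """Majority vote across final answers extracted from sampled reasoning chains."""
    counts = Counter(answers)
    winner, _ = counts.most_common(1)[0]
    return winner

# Five hypothetical samples of the same prompt; the majority answer wins.
sampled = ["24", "24", "22", "24", "23"]
# self_consistency_vote(sampled) returns "24"
```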
Tree of Thoughts (ToT)
An extension of CoT where the model generates multiple possible reasoning branches at each step and evaluates which branch is most promising before continuing. Tree of Thoughts is more powerful but more expensive — it is used when a task requires genuine planning or backtracking, such as solving logic puzzles or generating multi-step code.
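A minimal beam-search sketch of the ToT idea. `expand` and `score` are stand-ins for model calls that propose and evaluate partial reasoning paths; the search itself is just "keep the most promising branches at each depth":

```python
def tree_of_thoughts(root, expand, score, depth=3, beam_width=2):
    """Keep the `beam_width` highest-scoring partial chains at each depth level.

    expand(state) -> list of successor states (in practice: model-proposed steps)
    score(state)  -> float, how promising the partial chain is (model-evaluated)
    """
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]  # prune weak branches before continuing
    return max(frontier, key=score)
```

With real model calls, each `expand` and `score` invocation costs tokens, which is why ToT is reserved for tasks that genuinely need planning or backtracking.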
When to Use Chain-of-Thought Prompting
CoT is most valuable when your task has multiple reasoning steps, structured logic, or requires intermediate state. High-value use cases include:
- Mathematical calculations — unit conversions, financial projections, pricing logic
- Code generation and debugging — prompting the model to reason about the bug before writing the fix
- Classification with justification — having the model explain its reasoning before finalizing a label, which improves accuracy and makes decisions auditable
- Legal and compliance analysis — walking through relevant provisions before reaching a conclusion
- Planning and task decomposition — breaking down a goal into an ordered list of steps
CoT is less useful for simple factual lookups, short-form content generation, or tasks where the answer is immediate and not dependent on intermediate steps.
Chain-of-Thought and AI Agents
Chain-of-thought reasoning is fundamental to how modern AI agents work. Agent frameworks like ReAct (Reasoning + Acting) explicitly interleave CoT reasoning steps with tool calls:
- Thought: I need to find the current price of the stock.
- Action: Call the `get_stock_price` tool.
- Observation: Price is $142.50.
- Thought: Now I can calculate the portfolio value.
- Action: Multiply by the number of shares.
- Final Answer: The portfolio is worth $14,250.
Each thought step is a chain-of-thought prompt that grounds the agent's next action. Without CoT, agents tend to hallucinate tool calls or skip necessary steps.
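The loop above can be sketched as follows. The tool is stubbed with fixed data and the thought/action sequence is hard-coded; in a real ReAct agent, each thought and the choice of action come from the model at runtime:

```python
def get_stock_price(symbol: str) -> float:
    """Stubbed tool: returns a canned price instead of calling a market API."""
    return {"ACME": 142.50}.get(symbol, 0.0)

def run_agent(symbol: str, shares: int) -> float:
    """Illustrative ReAct-style trace: interleave thoughts, actions, observations."""
    trace = []
    trace.append("Thought: I need to find the current price of the stock.")
    price = get_stock_price(symbol)                      # Action -> Observation
    trace.append(f"Observation: Price is ${price:.2f}")
    trace.append("Thought: Now I can calculate the portfolio value.")
    value = price * shares                               # Action: arithmetic step
    trace.append(f"Final Answer: The portfolio is worth ${value:,.2f}")
    return value

# run_agent("ACME", 100) returns 14250.0
```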
Implementation Tips
Be explicit about format. If you want numbered steps, say so. Vague instructions like "think step by step" produce variable output quality. More specific instructions like "List your reasoning in numbered steps, then provide your final answer on a new line prefixed with 'Answer:'" produce more consistent, parseable results.
Match CoT depth to task complexity. For simple two-step reasoning, a brief "let's think step by step" is enough. For complex multi-domain analysis, provide a structured template the model should fill in.
Use CoT with output parsing. When building production pipelines, parse the final answer separately from the reasoning chain. The reasoning is valuable for debugging and explainability, but downstream systems should consume only the structured final output.
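A simple parser for the `Answer:` convention described above — a sketch assuming the model was instructed to put its final answer on its own line prefixed with `Answer:`:

```python
import re

def parse_cot_response(text: str):
    """Split a CoT response into (reasoning, final_answer) at the 'Answer:' marker."""
    match = re.search(r"^Answer:\s*(.+)$", text, flags=re.MULTILINE)
    if match is None:
        # No marker found: keep everything as reasoning, flag the missing answer.
        return text.strip(), None
    reasoning = text[: match.start()].strip()
    return reasoning, match.group(1).strip()
```

Downstream systems consume only the second element; the reasoning string is logged for debugging and explainability.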
CoT adds tokens — budget for it. A chain-of-thought response may be 3–5× longer than a direct answer. Factor this into your latency and cost calculations, especially for high-volume use cases.
Chain-of-Thought and Model Quality
Smaller models (sub-7B parameters) benefit less from CoT and sometimes produce incoherent reasoning chains. CoT is most effective with frontier models or well-tuned instruction-following models above 13B parameters. If you are evaluating models for a reasoning-heavy task, always test with and without CoT to understand the performance gap.
For a broader view of how these prompting techniques fit into an overall AI project: scoping the evaluation harness before you commit to a prompting strategy will save you significant rework, because you can measure the impact of CoT variants instead of guessing.
Summary
Chain-of-thought prompting is one of the highest-leverage techniques in applied prompt engineering. It costs tokens but consistently improves accuracy, transparency, and debuggability for complex tasks. In a world of increasingly capable but still imperfect models, making reasoning visible is often the difference between a demo and a production-grade AI feature.
Want help designing prompting strategies and evaluation pipelines for your AI product? Talk to 100x Engineering — we've built production CoT pipelines for founders across healthcare, fintech, and developer tools.
Further Reading
- AI Agent Architecture Patterns — How to structure multi-agent AI systems for production
- What Are CLAWs? Karpathy's AI Agents Framework Explained — A deep dive into autonomous AI agent design
- Startup AI Tech Stack 2026 — The tools and frameworks powering modern AI products
- Build an AI Product Without an ML Team — How to ship AI features with a lean engineering team
Compare: Claude vs GPT-4 for Coding · Anthropic vs OpenAI for Enterprise · LangChain vs LlamaIndex