Context Engineering: The Complete Guide for AI-Assisted Coding
If you’ve been using AI coding tools and wondering why results are inconsistent — brilliant one session, garbage the next — the answer isn’t the model. It’s the context.
Context engineering is the discipline of curating the entire information environment an AI agent operates within. Not just what you type in the prompt box. Everything: the files it reads, the rules it follows, the history it carries, the tools it can reach, and the structure of the project it navigates.
The term was popularized by Shopify CEO Tobi Lutke in mid-2025, who framed it as the art of providing all the context needed for a task to be plausibly solvable by the model.
Andrej Karpathy endorsed it immediately, adding crucial nuance:
He enumerated what “doing this right” involves: task descriptions, few-shot examples, RAG, related data, tools, state and history, and compacting. Then the critical caveat: “Too much or too irrelevant context can increase costs and degrade performance.”
That last point is the one most people miss. Context engineering isn’t about giving the AI more information. It’s about giving it the right information.
By February 2026, Karpathy took the terminology further — coining “agentic engineering” as the next evolution beyond vibe coding, describing a workflow where “you are not writing the code directly 99% of the time… you are orchestrating agents who do and acting as oversight.”
The progression is clear:
| Stage | What You Do | Core Skill |
|---|---|---|
| Prompt engineering | Write clever instructions | Wordsmithing |
| Context engineering | Curate the information environment | Information architecture |
| Agentic engineering | Orchestrate autonomous agents | Systems design |
Each stage builds on the last. You can’t do agentic engineering without context engineering. And context engineering is where most developers are right now — or should be.
Why Context Engineering Matters More Than Model Choice
Here’s a counterintuitive truth backed by research: a developer with a clean, well-structured context on a weaker model will outperform one with a cluttered context on a stronger model.
Chroma Research tested 18 LLMs and found that across all models, accuracy drops as input length increases — even on simple tasks. The “Lost in the Middle” phenomenon (first identified by Stanford researchers) shows LLMs attend strongly to tokens at the start and end of the context window but poorly to the middle.
When a debugging session has loaded 20,000 tokens of irrelevant file contents and dead-end explorations, the actual relevant code — sitting somewhere in the middle — gets less attention.
Anthropic’s own best practices say it directly:
“Most best practices are based on one constraint: Claude’s context window fills up fast, and performance degrades as it fills.”
Google’s context engineering whitepaper arrived at the same conclusion:
“The true intelligence of an agent doesn’t come from the model — it comes from how you manage context.”
Martin Fowler’s team at ThoughtWorks studied this in practice and found an almost comically simple truth: all forms of AI coding context engineering ultimately involve “a bunch of markdown files with prompts.” Two main categories — Instructions (tell the agent what to do) and Skills (resources the LLM loads on demand).
Simple in concept. Hard in practice.
The Four Pillars Framework
Sequoia Capital’s Inference newsletter published “Vibe Coding Needs Context Engineering” in July 2025, arguing that “intuition does not scale, structure does.” They identified four pillars — a framework also developed independently by LangChain:
Pillar 1: Write Context
Save persistent information outside the context window. This is your CLAUDE.md, your .cursor/rules/, your spec documents. Anything the agent needs to know every session gets written down once, not repeated every prompt.
Think of it as the difference between telling a new team member your coding standards verbally every morning versus writing them in a wiki. The written version works whether you’re there or not, whether it’s the first day or the hundredth.
What to write down:
- Build and test commands the agent can’t guess from reading code
- Code style rules that differ from language defaults
- Architecture decisions specific to your project
- Common gotchas and non-obvious behaviors
- Which tools and frameworks you’re using and why
What NOT to write:
- Anything the agent can figure out by reading your code
- Standard language conventions (TypeScript naming, Python PEP 8)
- Detailed API docs (link instead)
- File-by-file codebase descriptions
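Putting it together, a Pillar-1 memory file might look like the sketch below. Every command, path, and rule here is a hypothetical placeholder for your own project's equivalents:

```markdown
# CLAUDE.md

## Commands
- Build: `pnpm build`
- Test a single file: `pnpm vitest run <path>`
- Lint: `pnpm lint --fix`

## Style
- Named exports only; no default exports
- API handlers return a `Result<T, ApiError>` wrapper instead of throwing

## Architecture
- `src/features/<name>/` owns its routes, services, and tests
- Database access goes through `src/db/client.ts` only

## Gotchas
- Tests require `docker compose up db` to be running first
- `ENV=local` must be set or the config loader falls back to prod values
```

Note what is absent: no restatement of TypeScript conventions, no per-file descriptions, nothing the agent can learn by reading the code.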
Pillar 2: Select Context
Pull in only what’s relevant for the current task. This is the hardest pillar because it requires judgment.
Don’t dump your entire codebase into the prompt. Don’t paste 14 files “for reference.” Targeted file reads, specific function references, relevant test outputs — not “here’s everything, figure it out.”
Cursor’s research on Dynamic Context Discovery quantified this problem. In A/B testing, they found that rather than including all tools and context upfront, retrieving only tool names and fetching full details as needed reduced total agent tokens by 46.9% — while maintaining or improving quality.
The “token tax” is real. As one analysis found: in large projects with 20 global rules, developers might be sending 2,000 extra tokens with every message. Rules taking up 25% of the context window means the AI has 25% less space for actual source code.
Pillar 3: Compress Context
Manage token usage through summarization and pruning. When a session gets long, compact it. When an exploration is done, clear the dead ends.
Every token of noise competes with signal. A 200-token rule you added “just in case” is 200 tokens of source code your agent can’t see.
Practical compression techniques:
- Compact conversations: Claude Code’s `/compact` command summarizes the conversation, reducing tokens 50-70%. You can focus it: `/compact Focus on the API changes we discussed`
- Clear dead sessions: `/clear` deletes the entire conversation. Use it when switching tasks
- Summarize research: When using subagents for exploration, they run in separate context windows and return summaries — keeping the parent context clean
- Prune rules files: If your CLAUDE.md or cursor rules exceed 300-500 lines, you’re probably hurting more than helping
Pillar 4: Isolate Context
Structure information so it doesn’t bleed across tasks. Use subagents for research (they run in separate context windows and report back summaries). Start fresh sessions for unrelated work.
Don’t let Monday’s debugging contaminate Tuesday’s feature build.
As Philipp Schmid put it:
“Prompt Engineering = Crafting perfect instruction strings. Context Engineering = Building systems that dynamically assemble comprehensive contextual information tailored to specific tasks.”
The Memory Layer: Rules Files That Actually Work
Every major AI coding tool now has a mechanism for persistent context — a file (or set of files) that gets loaded automatically at the start of every session. This is the most important file in your project. More important than your README. More important than your config.
Because it’s the file that determines whether your AI agent understands your project or hallucinates about it.
CLAUDE.md (Claude Code)
Claude Code reads CLAUDE.md at the start of every session. It’s the project’s constitution — the rules that govern all AI behavior within your codebase.
Anthropic’s official docs are clear about what belongs here:
| Include | Exclude |
|---|---|
| Build/test commands Claude can’t guess | Anything Claude can figure out from reading code |
| Code style rules that differ from defaults | Standard language conventions |
| Architectural decisions specific to your project | Detailed API docs (link instead) |
| Common gotchas and non-obvious behaviors | Information that changes frequently |
| Developer environment quirks (env vars, etc.) | File-by-file codebase descriptions |
The official docs warn against the most common failure mode:
“The over-specified CLAUDE.md. If your CLAUDE.md is too long, Claude ignores half of it because important rules get lost in the noise. Fix: Ruthlessly prune.”
The hierarchy system:
- `~/.claude/CLAUDE.md` — global preferences (your personal coding style)
- `./CLAUDE.md` — project root (checked into git, shared with team)
- `./CLAUDE.local.md` — personal overrides (gitignored)
- `./src/feature/CLAUDE.md` — directory-scoped rules (only loaded when working in that directory)
Community consensus from HumanLayer, Builder.io, and Arize AI: keep it under 300 lines. Run /init to auto-generate a starter from your codebase structure. Iterate based on actual agent behavior, not hypothetical scenarios.
Thomas Landgraf’s deep dive covers advanced patterns: using CLAUDE.md to encode project-specific testing strategies, deployment pipelines, and even team communication preferences.
Cole Medin’s context-engineering-intro repo provides a hands-on starting point: “Context engineering is the new vibe coding — it’s the way to actually make AI coding assistants work.”
.cursor/rules/ (Cursor)
Cursor’s rules system is more granular than CLAUDE.md, with four types of rules:
- Always Apply — active every session (like CLAUDE.md)
- Apply Intelligently — agent decides relevance based on your description
- Apply to Specific Files — triggered by glob patterns (e.g., only for `*.tsx` files)
- Apply Manually — invoked via `@rule-name`
Rules live in .cursor/rules/*.mdc files. The awesome-cursorrules repo has community templates. Same official advice: keep content under 500 lines, decompose large rules into composable pieces.
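As a sketch, a file-scoped rule in `.cursor/rules/` might look like this. The frontmatter fields (`description`, `globs`, `alwaysApply`) follow Cursor's documented format, while the rule content itself is invented for illustration:

```markdown
---
description: React component conventions
globs: "**/*.tsx"
alwaysApply: false
---

- Components are function components with a typed props interface
- Co-locate styles: `Button.tsx` pairs with `Button.module.css`
- No data fetching inside components; use the feature's hooks
```

Because the glob scopes it, these tokens are only spent when the agent is actually editing a `.tsx` file.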
An empirical study of Cursor Rules analyzing thousands of repositories found that rules often grow organically and accumulate technical debt — just like code. The most effective teams treat their rules files as code: reviewing them in PRs, deleting stale instructions, and testing against actual agent behavior.
.github/copilot-instructions.md (Copilot)
GitHub Copilot’s equivalent: a .github/copilot-instructions.md file for repository-wide instructions, plus .github/instructions/NAME.instructions.md files for path-specific rules.
AGENTS.md (Cross-Tool Standard)
AGENTS.md is emerging as a cross-tool standard — recognized by Claude Code, Copilot, Cursor, and Gemini. Plain markdown, no metadata needed. If you work across multiple tools, this is the file that follows you everywhere.
What the Research Actually Shows
There’s now academic evidence on whether these context files actually help. The results are nuanced — and important.
An empirical study of 2,303 agent context files from 1,925 repos found that these files function like configuration code: they evolve frequently via small additions and prioritize build commands (62.3%), implementation details (69.9%), and architecture (67.7%).
But here’s the counterintuitive finding: a study evaluating AGENTS.md files found that context files can reduce task success rates versus no context, while increasing inference cost by 20%+.
The lesson isn’t that context files don’t work — it’s that poorly maintained context files are worse than none. Outdated instructions, contradictory rules, stale architecture descriptions — these actively mislead the agent.
Simon Willison highlighted a study of 9,649 experiments across 11 models comparing YAML, Markdown, JSON, and TOML formats for context delivery. The format matters less than the content quality — but structured formats consistently outperformed unstructured prose.
Spotify’s engineering team documented this in their “Honk” background coding agent (1,500+ merged PRs). Their second blog post is entirely about context engineering — the architecture of hot-memory constitutions, specialized domain agents, and cold-memory specification documents that made the agent actually work at production scale.
Advanced Technique: Dynamic Context Discovery
Static context — rules that load every session regardless of task — is the simplest approach. But it doesn’t scale.
Cursor’s Dynamic Context Discovery represents the next evolution. Instead of loading everything upfront:
- The agent starts with a lightweight index of available tools and context
- It identifies what’s relevant to the current task
- It fetches full details only for what it needs
The results in their A/B test: 46.9% reduction in total tokens used, with no quality degradation.
Claude Code’s skills system works similarly. Skills are context that loads on demand — when the agent determines it’s relevant to the current task. Instead of cramming everything into CLAUDE.md, you decompose context into modular, task-specific units.
Towards Data Science covered this pattern as “escaping the prompt engineering hamster wheel” — moving from ever-longer instructions to composable, reusable context modules.
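A sketch of one such module, assuming Claude Code's skill layout: a directory such as `.claude/skills/db-migrations/` holding a `SKILL.md` whose frontmatter tells the agent when to load it. The commands below are hypothetical:

```markdown
---
name: db-migrations
description: How to create and run database migrations in this repo
---

1. Run `pnpm migrate:new <name>` to scaffold a timestamped migration file
2. Write `up` and `down` as plain SQL, not in the ORM DSL
3. Run `pnpm migrate:up` against the local database before committing
```

Only the name and description sit in context by default; the body is loaded when a task actually involves migrations.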
Session Management: When to Clear and When to Keep Going
The most underrated context engineering skill is knowing when to throw away your context and start fresh.
Anthropic’s docs name specific trigger conditions:
“If you’ve corrected Claude more than twice on the same issue in one session, the context is cluttered with failed approaches. Run `/clear` and start fresh with a more specific prompt.”
The four signals it’s time for /clear:
- Switching to unrelated tasks — don’t let feature work context bleed into bug fixing
- After two failed corrections — the failed attempts are polluting the context
- After “kitchen sink” sessions — you’ve mixed too many topics
- When performance visibly decreases — responses get generic, instructions get forgotten
`/clear` vs `/compact`:

- `/clear` — nuclear option. Deletes the entire conversation. CLAUDE.md reloads fresh
- `/compact [instructions]` — surgical option. Summarizes the conversation (50-70% reduction). You can focus it: `/compact Focus on the API changes we discussed`
Armin Ronacher adds an important exception: don’t clear when the failure history itself is valuable. If the agent has tried and failed a specific approach, that context prevents it from repeating the same mistake. The art is knowing whether failed attempts are useful signal or useless noise.
For long-running work, start a fresh session after approximately 30 messages, and always write key decisions to your context files before clearing so they persist.
Structuring Your Project for AI
Context engineering isn’t just about memory files and session management. It’s about how your entire project is organized. Agents navigate codebases by reading files and following imports — the easier your project is to navigate, the better the agent performs.
Favor Vertical Over Horizontal Organization
Feature-driven layouts work better than layer-driven layouts:
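For instance, compare the two layouts (file names hypothetical):

```text
# Feature-driven: everything about auth lives together
src/features/auth/
  login-form.tsx
  auth-service.ts
  auth-service.test.ts
  use-session.ts

# Layer-driven: the same feature is scattered across four directories
src/components/login-form.tsx
src/services/auth-service.ts
src/hooks/use-session.ts
tests/auth-service.test.ts
```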
An agent working on auth only needs to read the auth directory. A layer-driven layout forces it to load files from every directory to understand a single feature.
Use Semantic File Names
user-authentication-service.ts is better than uas.ts. Agents infer file contents from names before reading them — descriptive names reduce unnecessary file reads and save context.
Keep Files Small
Anthropic’s best practices recommend smaller, focused modules. A 3,000-line monolith forces the agent to read (and hold in context) the entire file to modify a single function.
Colocate Tests with Code
If your test for auth-service.ts is in auth-service.test.ts right next to it, the agent finds it instantly. If it’s buried in tests/unit/services/auth/auth-service.test.ts, that’s multiple directory traversals burning context.
Treat Context Files as Code
They evolve with your codebase. Review them in PRs. Delete stale instructions. Add new patterns when you discover them. As EclipseSource notes, the hard problem isn’t creating context files — it’s keeping them accurate.
The “Harness Engineering” Pattern
Dex Horthy of HumanLayer coined a term that captures where context engineering is heading:
“Harness engineering” is applying context engineering principles to how you use an existing agent — not just how you configure it. It’s the difference between writing good CLAUDE.md and designing the entire workflow: when to spawn subagents, how to structure multi-step tasks, when to isolate vs. share context.
His YC Root Access talk is the best technical deep dive on advanced context engineering — covering why conversational prompting fails at scale, spec-first development, and the finding that agents tend to perform better when using less than 40% of the LLM’s context window.
Context Engineering in Practice: A Real Workflow
Let’s make this concrete. Here’s how context engineering looks in a real development session with Claude Code.
Step 1: Start With Clean Context
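In the session, this just means making sure nothing stale carries over before the new task begins (a minimal Claude Code sketch; the parenthetical is illustrative):

```text
> /clear
  (conversation wiped; CLAUDE.md and directory-scoped rules reload on the next prompt)
```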
Step 2: Be Specific About the Task
Instead of: “Fix the login bug”
Try: “The login form on /auth/login returns a 401 when valid credentials are submitted. The issue started after commit abc123. Check src/auth/auth-service.ts and the Supabase auth configuration.”
You’ve selected context: specific file, specific commit, specific behavior. The agent doesn’t need to explore your entire codebase.
Step 3: Use Subagents for Research
When you need to understand a large codebase area, don’t ask the main agent to read 20 files. Use subagents:
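A hypothetical prompt showing the pattern; the subagent reads the files in its own context window, and only its summary lands in yours:

```text
> Use a subagent to explore how payment processing works in src/payments/.
> Have it report back a short summary of the entry points and data flow;
> don't paste raw file contents into this conversation.
```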
Step 4: Compact at Natural Breakpoints
After completing a subtask (fixing the auth bug), before starting the next task (adding a new feature):
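For example, keeping only what the next task needs (the focus instruction is illustrative):

```text
> /compact Focus on the auth bug: the root cause and the final fix.
  Drop the dead-end exploration.
```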
Step 5: Write Decisions Back to Memory
Before clearing, capture anything the agent learned that should persist:
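One lightweight way to do this is to ask the agent itself to update the memory file (details hypothetical, echoing the earlier auth example):

```text
> Before we clear: append to CLAUDE.md under "Gotchas" that the Supabase
> client must use the service-role key in server routes, and that auth
> tests need the local JWT secret set in .env.test.
```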
This is the cycle: clean start → targeted context → isolate research → compress at breakpoints → persist insights → clean start again.
The Bigger Picture: Four Disciplines of AI Development
Context engineering sits within a broader framework. By 2026, what we used to call “prompting” has split into four distinct disciplines:
- Prompt Craft — writing clear instructions. The original skill. By 2026, this is table stakes.
- Context Engineering — curating the entire information environment an agent operates within. What this guide covers.
- Intent Engineering — encoding goals, values, and decision boundaries into agent infrastructure. Telling agents what to want, not just what to do.
- Specification Engineering — writing structured documents that agents can execute against over long periods without intervention. The foundation for truly autonomous development.
This progression maps directly onto skill levels. Prompt craft is autocomplete-level. Context engineering is agent-assisted. Intent and specification engineering are orchestrator-level — where the real productivity multipliers live.
The “Vibe Coding Hangover”
There’s a growing recognition that the initial excitement around vibe coding — just describe what you want and let AI build it — hit a wall. Multiple writers have described a “vibe coding hangover” — the realization that unstructured AI coding produces unmaintainable code.
Context engineering is the antidote. Not a rejection of AI coding, but its maturation. As GitHub’s engineering blog puts it: you don’t get better AI outputs by writing cleverer prompts. You get them by engineering better context.
Quick-Start Checklist
If you’re just getting started with context engineering, here’s the minimum viable setup:
1. Create your rules file (5 minutes)
- Claude Code: Run `/init` to generate a starter CLAUDE.md
- Cursor: Create `.cursor/rules/project.mdc`
- Copilot: Create `.github/copilot-instructions.md`
2. Add the essentials (10 minutes)
- Build and test commands
- 3-5 most important code style rules
- Architecture overview (one paragraph)
- Top 3 gotchas new developers hit
3. Set session habits (ongoing)
- Start fresh sessions for unrelated tasks
- Compact after 20-30 messages
- Clear after 2 failed corrections
- Write decisions to your rules file before clearing
4. Organize for navigability (when refactoring)
- Feature-based directories over layer-based
- Semantic file names
- Tests colocated with source
- Small, focused files
5. Iterate your rules (weekly)
- Delete rules the agent already follows naturally
- Add rules for mistakes the agent keeps making
- Keep total length under 300 lines (CLAUDE.md) or 500 lines (Cursor rules)
Resources
Essential Reading
- Anthropic: Best Practices for Claude Code
- Martin Fowler: Context Engineering for Coding Agents
- Sequoia: Vibe Coding Needs Context Engineering
- Spotify: Context Engineering for Background Coding Agents
- LangChain: Context Engineering for Agents
Rules File Guides
- Builder.io: How to Write a Good CLAUDE.md
- HumanLayer: Writing a Good CLAUDE.md
- Gend.co: Claude Skills and CLAUDE.md Guide
- DEV: CLAUDE.md Best Practices — From Basic to Adaptive
- Cursor: Rules Documentation
Research Papers
- Codified Context: Infrastructure for AI Agents
- Beyond the Prompt: An Empirical Study of Cursor Rules
- Empirical Study of 2,303 Agent Context Files
- Chroma Research: Context Rot
Repos and Templates
- Cole Medin: Context Engineering Intro
- NeoLab: Context Engineering Kit
- awesome-cursorrules
- AGENTS.md Standard
Talks
- Dex Horthy: Advanced Context Engineering for Agents (YC Root Access)
- DDD Melbourne 2026: Throw Away The Vibes
Context engineering is the skill that separates developers who get consistent, high-quality results from AI coding tools from those who get lucky sometimes. Master it, and every other AI development skill becomes easier.
This guide is Part 5 of our comprehensive Learn Vibe Coding series. Start from the beginning for the full picture.
Learn Vibe Coding →
Not sure which AI coding tool to use? See our comparison of the best AI coding assistants in 2026.
Compare AI Coding Tools →