Context Engineering

Master the art of managing what goes into a language model's context window — from tokens and system prompts to tool calls and memory strategies.


Tokens and Inference

Understand the atoms of LLM communication and why inference is stateless.

  • [ ] What Are Tokens? [text] free
    • Explain what tokens are and why language models use them instead of raw text
    • Use an interactive tokenizer to see how text is split into tokens
    • Describe how different encodings produce different token counts for the same text
  • [ ] Inference Is Stateless [text] free
    • Explain why LLM inference is stateless — the server retains no memory between calls
    • Describe the messages array as the complete state sent with every request
    • Identify the implications of statelessness for context management
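The statelessness described above can be sketched in a few lines. `call_model` below is a hypothetical stand-in for a real chat-completion API; the point is that the client re-sends the entire messages array on every call, because the server remembers nothing.

```python
# Sketch of stateless inference: the server keeps no memory between
# calls, so the client carries the full conversational state itself.
# `call_model` is a hypothetical stand-in for a real inference API.

def call_model(messages):
    """Pretend inference: the model sees only what is in `messages`."""
    last = messages[-1]["content"]
    return {"role": "assistant", "content": f"echo: {last}"}

messages = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1: the whole array (system prompt + user message) is sent.
messages.append({"role": "user", "content": "Hello"})
messages.append(call_model(messages))

# Turn 2: the server retained nothing from turn 1, so the array must
# carry every prior turn as the complete state of the conversation.
messages.append({"role": "user", "content": "What did I just say?"})
messages.append(call_model(messages))

print(len(messages))  # 5 entries: the full state, re-sent on every request
```

The practical consequence for context management: every token ever appended is paid for again on every subsequent request.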

The Real Size of Your Context Window

The number on the sticker is not the number you get. Learn to measure what your model can actually use.

    • Distinguish between a model's claimed context length and its effective context length
    • Explain why the vanilla needle-in-a-haystack test gives misleadingly positive results
    • Interpret RULER benchmark data to determine a model's real usable context size
    • Identify the four categories of failure that RULER reveals in long-context models
    • Explain why simple retrieval tests don't predict real-world performance
    • Apply the effective context size concept when choosing models for agent workloads
    • Explain why model performance degrades continuously as context utilization increases, not just at the claimed limit
    • Apply the 40% utilization rule to keep agents in the smart zone
    • Identify when an agent has entered the dumb zone and needs a context reset
    • Calculate the effective context budget for a given model and task
    • Subtract fixed allocations to determine remaining budget for conversation and tool results
    • Design agent architectures that stay within the smart zone
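The budget arithmetic in the objectives above can be sketched directly. All token figures below are illustrative placeholders, not measurements of any particular model; the 40% figure is the section's utilization rule.

```python
# Back-of-the-envelope context budgeting under the 40% utilization rule.
# Every number here is an illustrative assumption, not a measurement.

CLAIMED_CONTEXT = 200_000        # the number on the sticker
SMART_ZONE_FRACTION = 0.40       # stay under ~40% utilization

effective_budget = int(CLAIMED_CONTEXT * SMART_ZONE_FRACTION)

# Fixed allocations that occupy the array before any real work happens
fixed = {
    "system_prompt": 1_500,
    "harness_prompt": 6_000,
    "agents_md": 2_000,
    "mcp_tool_definitions": 15_000,
}

remaining = effective_budget - sum(fixed.values())
print(f"effective budget: {effective_budget} tokens")
print(f"remaining for conversation + tool results: {remaining} tokens")
```

Note how quickly the sticker number shrinks: 200k of claimed context leaves well under half of the smart-zone budget free once fixed allocations are subtracted.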

Anatomy of the Messages Array

Walk through each slot in the messages array — from the system prompt to your first user message.

    • Explain the role of the system prompt as index 0 in the messages array
    • Describe how the system prompt shapes model behavior for the entire conversation
    • Design effective system prompts using concise, specific instructions
    • Explain what a harness prompt is and why it occupies index 1 in the messages array
    • Distinguish between vendor-controlled harness prompts and custom harness prompts
    • Evaluate the trade-offs of using off-the-shelf agent tooling vs. building your own
    • Explain how AGENTS.md provides project-specific context at index 2 in the messages array
    • Describe the discovery hierarchy and how multiple AGENTS.md files are concatenated
    • Write effective AGENTS.md instructions that maximize signal per token
    • Explain how MCP tool definitions occupy constant space in the messages array
    • Describe the constant-allocation problem and its impact on available context
    • Contrast eager loading (MCP) with lazy loading (Agent Skills) as memory strategies
    • Identify the initial user prompt as index 4 in the messages array
    • Calculate the remaining context budget after fixed allocations
    • Design prompts that are context-aware — considering what's already in the array
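The slot layout described above can be written out as a literal array. The exact shape of harness and tool-definition entries varies by vendor; this sketch only mirrors the index positions the objectives name (0 through 4), with placeholder content throughout.

```python
# Illustrative layout of the fixed slots in the messages array.
# Contents are placeholders; only the index positions follow the text.

messages = [
    {"role": "system", "content": "You are a coding agent."},          # index 0: system prompt
    {"role": "system", "content": "<harness: tool-use conventions>"},  # index 1: harness prompt
    {"role": "system", "content": "<AGENTS.md: project context>"},     # index 2: project instructions
    {"role": "system", "content": "<MCP tool definitions>"},           # index 3: constant allocation
    {"role": "user", "content": "Fix the failing test in utils.py"},   # index 4: initial user prompt
]

print([m["role"] for m in messages])
```

Everything before index 4 is spent before the user types a word, which is why the previous section's budget subtraction starts from these fixed slots.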

Dynamic Allocation: Tool Calling

See how tool calls dynamically grow the messages array — each call is a memory allocation.

    • Trace how a single tool call adds multiple entries to the messages array
    • Calculate the token cost of a tool call cycle (request + result)
    • Visualize the messages array growing with each tool interaction
    • Walk through a complete agent session showing every array mutation
    • Identify the point where context window pressure begins to affect behavior
    • Explain why agent sessions degrade over time as the array fills
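The allocation pattern above can be simulated: one tool-call cycle appends two entries (the assistant's call and the tool's result), and nothing is ever reclaimed. The token counter here is a crude `len() // 4` heuristic, not a real tokenizer.

```python
# Sketch: each tool call is a memory allocation. One cycle appends two
# entries to the array, and the total only grows.

def tokens(text):
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

messages = [{"role": "user", "content": "List the files, then read the README"}]

def tool_call_cycle(name, args, result):
    """One request/result cycle: two new array entries, never freed."""
    messages.append({"role": "assistant", "tool_call": {"name": name, "args": args}})
    messages.append({"role": "tool", "content": result})

tool_call_cycle("list_files", {"path": "."}, "README.md\nmain.py")
tool_call_cycle("read_file", {"path": "README.md"}, "# My project\n...")

total = sum(tokens(str(m)) for m in messages)
print(len(messages), "entries,", total, "tokens and climbing")
```

Multiply this by dozens of cycles in a real session and the pressure point the objectives describe arrives well before the claimed context limit.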

The Ralph Wiggum Loop

The simplest and most powerful application of context engineering — a bash loop that treats each iteration as a fresh memory allocation.

    • Explain how the Ralph Wiggum loop applies context engineering principles to autonomous coding
    • Describe why fresh context windows per iteration avoid the dumb zone
    • Identify the spec and implementation plan as the source of truth that replaces persistent context
    • Design a specification and implementation plan that serves as persistent memory across Ralph iterations
    • Apply bidirectional prompting to surface implicit assumptions before autonomous execution
    • Structure an implementation plan with checkboxes that enable autonomous task selection
    • Apply Ralph loops in implementation mode for spec-driven autonomous coding
    • Use exploration mode to leverage unused tokens for back-burner research and MVPs
    • Deploy brute force testing mode for systematic security and UI validation
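The section describes the Ralph Wiggum loop as a bash loop; the same shape is sketched here in Python so it runs as a self-contained simulation. `run_agent` is a hypothetical stand-in for spawning a real coding agent: each call models a fresh context window that sees only the plan, does one task, and exits.

```python
# Simulation of the Ralph Wiggum loop. `run_agent` stands in for a real
# agent invocation; here it just checks off the next unchecked task, so
# the checkbox plan -- not any persistent context -- is the memory.

plan = ["[ ] write parser", "[ ] add tests", "[ ] wire CLI"]

def run_agent(plan):
    """One fresh-context iteration: read the plan, pick the next
    unchecked task, record progress back into the plan."""
    for i, task in enumerate(plan):
        if task.startswith("[ ]"):
            plan[i] = task.replace("[ ]", "[x]", 1)
            return True   # did work this iteration
    return False          # plan complete; the loop can stop

iterations = 0
while run_agent(plan):    # each pass = a brand-new context window
    iterations += 1

print(iterations, plan)
```

The loop never enters the dumb zone because no context window outlives a single task; everything durable lives in the plan file.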

Sub-Agents: Managed Runtimes for AI

Use sub-agents as disposable bags of memory — the JVM/CLR pattern applied to context window management.

    • Distinguish the memory-isolation purpose of sub-agents from the common misconception of role-based personalization
    • Explain how a sub-agent provides a disposable context window analogous to a JVM or .NET CLR managed heap
    • Identify scenarios where sub-agent delegation protects the parent's context budget
    • Trace the context window impact of running tests directly vs. delegating to a sub-agent
    • Calculate the token savings of sub-agent delegation for a realistic test suite
    • Apply the 'schedule a future' pattern to keep the parent agent in the smart zone
    • Identify which operations should be delegated to sub-agents vs. kept in the parent context
    • Design the message-passing interface between parent and sub-agent for minimal token transfer
    • Apply the futures/promises mental model to sub-agent scheduling
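The delegation arithmetic above reduces to a simple comparison: running a test suite in the parent pays for the full log; delegating pays only for the sub-agent's summarized result. The token numbers below are illustrative assumptions for a realistic suite.

```python
# Sketch of sub-agent delegation savings. Token figures are illustrative.

TEST_LOG_TOKENS = 40_000   # raw output of a realistic test suite
SUMMARY_TOKENS = 150       # "3 failures in auth tests: ..." style result

def run_tests_in_parent():
    # The whole log lands in the parent's messages array.
    return TEST_LOG_TOKENS

def delegate_to_sub_agent():
    # The sub-agent burns TEST_LOG_TOKENS in its own disposable
    # context window, then returns only the summary to the parent.
    return SUMMARY_TOKENS

savings = run_tests_in_parent() - delegate_to_sub_agent()
print(f"parent context saved: {savings} tokens")
```

In the managed-heap analogy, the sub-agent's window is garbage-collected wholesale when it exits; the parent keeps only the returned value.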

Message Passing: The Erlang OTP of AI

Context windows are actors. Message passing replaces shared memory. Welcome to the ground floor.

    • Map the Erlang/OTP actor model onto AI agent context windows: processes, mailboxes, isolation, and message passing
    • Explain why shared memory is not directly available between context windows and why copy semantics apply
    • Identify the architectural parallel between spawning Erlang processes and spawning sub-agents
    • Design inbound and outbound message contracts that minimize token transfer between context windows
    • Apply the principle that messages should contain results, never raw data
    • Structure sub-agent system prompts as protocol definitions
    • Apply Erlang's 'let it crash' philosophy to agent context management
    • Design supervision strategies where parent agents restart failed sub-agents rather than recovering corrupted contexts
    • Build multi-level agent hierarchies using the supervision tree pattern
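The actor mapping above can be made concrete with a toy sketch: isolated per-agent state, copy-semantics message passing, results rather than raw data, and a "let it crash" supervisor that restarts a failed child instead of repairing its corrupted context. All names here are illustrative, not a real framework.

```python
# Toy actor-style sub-agent with an Erlang-inspired supervisor.

class SubAgent:
    def __init__(self, protocol):
        self.context = [protocol]           # private state; never shared

    def handle(self, message):
        self.context.append(dict(message))  # copy in, never a reference
        if message.get("poison"):
            raise RuntimeError("context corrupted")
        return {"result": f"done: {message['task']}"}  # results, not raw data

def supervise(message, retries=2):
    """Let it crash: on failure, spawn a fresh child rather than
    trying to recover the old one's corrupted context."""
    for _ in range(retries + 1):
        child = SubAgent(protocol="summarize findings only")  # fresh context
        try:
            return child.handle(message)
        except RuntimeError:
            continue
    return {"result": "gave up"}

print(supervise({"task": "run lint"}))
```

The supervisor's only recovery strategy is a restart with a clean context window, mirroring the supervision-tree pattern named in the objectives.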

Context Management Strategies

Learn memory management for LLMs — allocation, compaction, and why most tools get it wrong.

    • Explain why LLM context management resembles C's malloc without free
    • Identify the consequences of never reclaiming context window space
    • Compare context management to traditional memory management primitives
    • Explain how compaction works and why it is the default strategy in most agent tooling
    • Identify the risks of non-deterministic context eviction
    • Describe scenarios where compaction removes critical context that keeps agents on track
  • [ ] Better Strategies [text]
    • Design a context management strategy that protects critical allocations
    • Apply priority-based eviction to preserve specifications and instructions
    • Evaluate emerging approaches to context management in agent systems
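The priority-based eviction idea above can be sketched as a small policy: unlike blanket compaction, it drops only low-priority entries (old tool results) and never touches protected allocations such as the spec and instructions. The priority scheme and cost function here are illustrative.

```python
# Sketch of priority-based eviction: protect critical allocations,
# reclaim space from old unprotected entries first.

PROTECTED = {"system", "spec"}  # never evicted

def evict(messages, budget, cost):
    """Drop the oldest unprotected entries until total cost fits budget."""
    total = sum(cost(m) for m in messages)
    kept = list(messages)
    for m in messages:                    # oldest first
        if total <= budget:
            break
        if m["kind"] not in PROTECTED:
            kept.remove(m)
            total -= cost(m)
    return kept

msgs = [
    {"kind": "system", "text": "x" * 40},
    {"kind": "spec", "text": "x" * 400},
    {"kind": "tool_result", "text": "x" * 4000},
    {"kind": "tool_result", "text": "x" * 4000},
    {"kind": "turn", "text": "x" * 200},
]

kept = evict(msgs, budget=2000, cost=lambda m: len(m["text"]) // 4)
print([m["kind"] for m in kept])
```

Note what survives: the spec and instructions are untouched, and only enough stale tool output is reclaimed to get back under budget. This determinism is exactly what non-deterministic compaction cannot guarantee.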