Context Engineering

Master the art of managing what goes into a language model's context window — from tokens and system prompts to tool calls and memory strategies.


Tokens and Inference

Understand the atoms of LLM communication and why inference is stateless.

  • [ ] What Are Tokens? [text] free
    • Explain what tokens are and why language models use them instead of raw text
    • Use an interactive tokenizer to see how text is split into tokens
    • Describe how different encodings produce different token counts for the same text
  • [ ] Inference Is Stateless [text] free
    • Explain why LLM inference is stateless — the server retains no memory between calls
    • Describe the messages array as the complete state sent with every request
    • Identify the implications of statelessness for context management
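The statelessness described above can be sketched in a few lines. `call_model` below is a hypothetical stand-in for a real chat-completion API; the point is that the client re-sends the entire messages array on every call, because the server remembers nothing.

```python
# Sketch of stateless inference: the server keeps no memory between
# calls, so the client carries the full conversational state itself.
# `call_model` is a hypothetical stand-in for a real inference API.

def call_model(messages):
    """Pretend inference: the model sees only what is in `messages`."""
    last = messages[-1]["content"]
    return {"role": "assistant", "content": f"echo: {last}"}

messages = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1: the whole array (system prompt + user message) is sent.
messages.append({"role": "user", "content": "Hello"})
messages.append(call_model(messages))

# Turn 2: the server retained nothing from turn 1, so the array must
# carry every prior turn as the complete state of the conversation.
messages.append({"role": "user", "content": "What did I just say?"})
messages.append(call_model(messages))

print(len(messages))  # 5 entries: the full state, re-sent on every request
```

The practical consequence for context management: every token ever appended is paid for again on every subsequent request.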

The Real Size of Your Context Window

The number on the sticker is not the number you get. Learn to measure what your model can actually use.

    • Distinguish between a model's claimed context length and its effective context length
    • Explain why the vanilla needle-in-a-haystack test gives misleadingly positive results
    • Interpret RULER benchmark data to determine a model's real usable context size
    • Identify the four categories of failure that RULER reveals in long-context models
    • Explain why simple retrieval tests don't predict real-world performance
    • Apply the effective context size concept when choosing models for agent workloads
    • Explain why model performance degrades continuously as context utilization increases, not just at the claimed limit
    • Apply the 40% utilization rule to keep agents in the smart zone
    • Identify when an agent has entered the dumb zone and needs a context reset
    • Calculate the effective context budget for a given model and task
    • Subtract fixed allocations to determine remaining budget for conversation and tool results
    • Design agent architectures that stay within the smart zone
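The budget arithmetic in the objectives above can be sketched directly. All token figures below are illustrative placeholders, not measurements of any particular model; the 40% figure is the section's utilization rule.

```python
# Back-of-the-envelope context budgeting under the 40% utilization rule.
# Every number here is an illustrative assumption, not a measurement.

CLAIMED_CONTEXT = 200_000        # the number on the sticker
SMART_ZONE_FRACTION = 0.40       # stay under ~40% utilization

effective_budget = int(CLAIMED_CONTEXT * SMART_ZONE_FRACTION)

# Fixed allocations that occupy the array before any real work happens
fixed = {
    "system_prompt": 1_500,
    "harness_prompt": 6_000,
    "agents_md": 2_000,
    "mcp_tool_definitions": 15_000,
}

remaining = effective_budget - sum(fixed.values())
print(f"effective budget: {effective_budget} tokens")
print(f"remaining for conversation + tool results: {remaining} tokens")
```

Note how quickly the sticker number shrinks: 200k of claimed context leaves well under half of the smart-zone budget free once fixed allocations are subtracted.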

Anatomy of the Messages Array

Walk through each slot in the messages array — from the system prompt to your first user message.

    • Explain the role of the system prompt as index 0 in the messages array
    • Describe how the system prompt shapes model behavior for the entire conversation
    • Design effective system prompts using concise, specific instructions
    • Explain what a harness prompt is and why it occupies index 1 in the messages array
    • Distinguish between vendor-controlled harness prompts and custom harness prompts
    • Evaluate the trade-offs of using off-the-shelf agent tooling vs. building your own
    • Explain how AGENTS.md provides project-specific context at index 2 in the messages array
    • Describe the discovery hierarchy and how multiple AGENTS.md files are concatenated
    • Write effective AGENTS.md instructions that maximize signal per token
    • Explain how MCP tool definitions occupy constant space in the messages array
    • Describe the constant-allocation problem and its impact on available context
    • Contrast eager loading (MCP) with lazy loading (Agent Skills) as memory strategies
    • Identify the initial user prompt as index 4 in the messages array
    • Calculate the remaining context budget after fixed allocations
    • Design prompts that are context-aware — considering what's already in the array
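The slot layout described above can be written out as a literal array. The exact shape of harness and tool-definition entries varies by vendor; this sketch only mirrors the index positions the objectives name (0 through 4), with placeholder content throughout.

```python
# Illustrative layout of the fixed slots in the messages array.
# Contents are placeholders; only the index positions follow the text.

messages = [
    {"role": "system", "content": "You are a coding agent."},          # index 0: system prompt
    {"role": "system", "content": "<harness: tool-use conventions>"},  # index 1: harness prompt
    {"role": "system", "content": "<AGENTS.md: project context>"},     # index 2: project instructions
    {"role": "system", "content": "<MCP tool definitions>"},           # index 3: constant allocation
    {"role": "user", "content": "Fix the failing test in utils.py"},   # index 4: initial user prompt
]

print([m["role"] for m in messages])
```

Everything before index 4 is spent before the user types a word, which is why the previous section's budget subtraction starts from these fixed slots.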

Dynamic Allocation: Tool Calling

See how tool calls dynamically grow the messages array — each call is a memory allocation.

    • Trace how a single tool call adds multiple entries to the messages array
    • Calculate the token cost of a tool call cycle (request + result)
    • Visualize the messages array growing with each tool interaction
    • Walk through a complete agent session showing every array mutation
    • Identify the point where context window pressure begins to affect behavior
    • Explain why agent sessions degrade over time as the array fills
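The allocation pattern above can be simulated: one tool-call cycle appends two entries (the assistant's call and the tool's result), and nothing is ever reclaimed. The token counter here is a crude `len() // 4` heuristic, not a real tokenizer.

```python
# Sketch: each tool call is a memory allocation. One cycle appends two
# entries to the array, and the total only grows.

def tokens(text):
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

messages = [{"role": "user", "content": "List the files, then read the README"}]

def tool_call_cycle(name, args, result):
    """One request/result cycle: two new array entries, never freed."""
    messages.append({"role": "assistant", "tool_call": {"name": name, "args": args}})
    messages.append({"role": "tool", "content": result})

tool_call_cycle("list_files", {"path": "."}, "README.md\nmain.py")
tool_call_cycle("read_file", {"path": "README.md"}, "# My project\n...")

total = sum(tokens(str(m)) for m in messages)
print(len(messages), "entries,", total, "tokens and climbing")
```

Multiply this by dozens of cycles in a real session and the pressure point the objectives describe arrives well before the claimed context limit.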

The Ralph Wiggum Loop

The simplest and most powerful application of context engineering — a bash loop that treats each iteration as a fresh memory allocation.

    • Explain how the Ralph Wiggum loop applies context engineering principles to autonomous coding
    • Describe why fresh context windows per iteration avoid the dumb zone
    • Identify the spec and implementation plan as the source of truth that replaces persistent context
    • Design a specification and implementation plan that serves as persistent memory across Ralph iterations
    • Apply bidirectional prompting to surface implicit assumptions before autonomous execution
    • Structure an implementation plan with checkboxes that enable autonomous task selection
    • Apply Ralph loops in implementation mode for spec-driven autonomous coding
    • Use exploration mode to leverage unused tokens for back-burner research and MVPs
    • Deploy brute force testing mode for systematic security and UI validation
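The section describes the Ralph Wiggum loop as a bash loop; the same shape is sketched here in Python so it runs as a self-contained simulation. `run_agent` is a hypothetical stand-in for spawning a real coding agent: each call models a fresh context window that sees only the plan, does one task, and exits.

```python
# Simulation of the Ralph Wiggum loop. `run_agent` stands in for a real
# agent invocation; here it just checks off the next unchecked task, so
# the checkbox plan -- not any persistent context -- is the memory.

plan = ["[ ] write parser", "[ ] add tests", "[ ] wire CLI"]

def run_agent(plan):
    """One fresh-context iteration: read the plan, pick the next
    unchecked task, record progress back into the plan."""
    for i, task in enumerate(plan):
        if task.startswith("[ ]"):
            plan[i] = task.replace("[ ]", "[x]", 1)
            return True   # did work this iteration
    return False          # plan complete; the loop can stop

iterations = 0
while run_agent(plan):    # each pass = a brand-new context window
    iterations += 1

print(iterations, plan)
```

The loop never enters the dumb zone because no context window outlives a single task; everything durable lives in the plan file.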

Sub-Agents: Managed Runtimes for AI

Use sub-agents as disposable bags of memory — the JVM/CLR pattern applied to context window management.

    • Distinguish the memory-isolation purpose of sub-agents from the common misconception of role-based personalization
    • Explain how a sub-agent provides a disposable context window analogous to a JVM or .NET CLR managed heap
    • Identify scenarios where sub-agent delegation protects the parent's context budget
    • Trace the context window impact of running tests directly vs. delegating to a sub-agent
    • Calculate the token savings of sub-agent delegation for a realistic test suite
    • Apply the 'schedule a future' pattern to keep the parent agent in the smart zone
    • Identify which operations should be delegated to sub-agents vs. kept in the parent context
    • Design the message-passing interface between parent and sub-agent for minimal token transfer
    • Apply the futures/promises mental model to sub-agent scheduling
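The delegation arithmetic above reduces to a simple comparison: running a test suite in the parent pays for the full log; delegating pays only for the sub-agent's summarized result. The token numbers below are illustrative assumptions for a realistic suite.

```python
# Sketch of sub-agent delegation savings. Token figures are illustrative.

TEST_LOG_TOKENS = 40_000   # raw output of a realistic test suite
SUMMARY_TOKENS = 150       # "3 failures in auth tests: ..." style result

def run_tests_in_parent():
    # The whole log lands in the parent's messages array.
    return TEST_LOG_TOKENS

def delegate_to_sub_agent():
    # The sub-agent burns TEST_LOG_TOKENS in its own disposable
    # context window, then returns only the summary to the parent.
    return SUMMARY_TOKENS

savings = run_tests_in_parent() - delegate_to_sub_agent()
print(f"parent context saved: {savings} tokens")
```

In the managed-heap analogy, the sub-agent's window is garbage-collected wholesale when it exits; the parent keeps only the returned value.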

Message Passing: The Erlang OTP of AI

Context windows are actors. Message passing replaces shared memory. Welcome to the ground floor.

    • Map the Erlang/OTP actor model onto AI agent context windows: processes, mailboxes, isolation, and message passing
    • Explain why shared memory is not directly available between context windows and why copy semantics apply
    • Identify the architectural parallel between spawning Erlang processes and spawning sub-agents
    • Design inbound and outbound message contracts that minimize token transfer between context windows
    • Apply the principle that messages should contain results, never raw data
    • Structure sub-agent system prompts as protocol definitions
    • Apply Erlang's 'let it crash' philosophy to agent context management
    • Design supervision strategies where parent agents restart failed sub-agents rather than recovering corrupted contexts
    • Build multi-level agent hierarchies using the supervision tree pattern
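The actor mapping above can be made concrete with a toy sketch: isolated per-agent state, copy-semantics message passing, results rather than raw data, and a "let it crash" supervisor that restarts a failed child instead of repairing its corrupted context. All names here are illustrative, not a real framework.

```python
# Toy actor-style sub-agent with an Erlang-inspired supervisor.

class SubAgent:
    def __init__(self, protocol):
        self.context = [protocol]           # private state; never shared

    def handle(self, message):
        self.context.append(dict(message))  # copy in, never a reference
        if message.get("poison"):
            raise RuntimeError("context corrupted")
        return {"result": f"done: {message['task']}"}  # results, not raw data

def supervise(message, retries=2):
    """Let it crash: on failure, spawn a fresh child rather than
    trying to recover the old one's corrupted context."""
    for _ in range(retries + 1):
        child = SubAgent(protocol="summarize findings only")  # fresh context
        try:
            return child.handle(message)
        except RuntimeError:
            continue
    return {"result": "gave up"}

print(supervise({"task": "run lint"}))
```

The supervisor's only recovery strategy is a restart with a clean context window, mirroring the supervision-tree pattern named in the objectives.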

Context Management Strategies

Learn memory management for LLMs — allocation, compaction, and why most tools get it wrong.

    • Explain why LLM context management resembles C's malloc without free
    • Identify the consequences of never reclaiming context window space
    • Compare context management to traditional memory management primitives
    • Explain how compaction works and why it is the default strategy in most agent tooling
    • Identify the risks of non-deterministic context eviction
    • Describe scenarios where compaction removes critical context that keeps agents on track
  • [ ] Better Strategies [text]
    • Design a context management strategy that protects critical allocations
    • Apply priority-based eviction to preserve specifications and instructions
    • Evaluate emerging approaches to context management in agent systems
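The priority-based eviction idea above can be sketched as a small policy: unlike blanket compaction, it drops only low-priority entries (old tool results) and never touches protected allocations such as the spec and instructions. The priority scheme and cost function here are illustrative.

```python
# Sketch of priority-based eviction: protect critical allocations,
# reclaim space from old unprotected entries first.

PROTECTED = {"system", "spec"}  # never evicted

def evict(messages, budget, cost):
    """Drop the oldest unprotected entries until total cost fits budget."""
    total = sum(cost(m) for m in messages)
    kept = list(messages)
    for m in messages:                    # oldest first
        if total <= budget:
            break
        if m["kind"] not in PROTECTED:
            kept.remove(m)
            total -= cost(m)
    return kept

msgs = [
    {"kind": "system", "text": "x" * 40},
    {"kind": "spec", "text": "x" * 400},
    {"kind": "tool_result", "text": "x" * 4000},
    {"kind": "tool_result", "text": "x" * 4000},
    {"kind": "turn", "text": "x" * 200},
]

kept = evict(msgs, budget=2000, cost=lambda m: len(m["text"]) // 4)
print([m["kind"] for m in kept])
```

Note what survives: the spec and instructions are untouched, and only enough stale tool output is reclaimed to get back under budget. This determinism is exactly what non-deterministic compaction cannot guarantee.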