Glossary

Key terms and definitions


Activation Function
A nonlinear function applied after a neuron's weighted sum — without it, stacking layers would collapse into a single linear transformation.
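The collapse described above can be seen directly — a minimal sketch with random weight matrices standing in for trained layers:

```python
import numpy as np

# Two stacked linear layers with no activation collapse into one
# linear map; inserting tanh between them breaks that equivalence.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_linear = W2 @ (W1 @ x)          # stacked linear layers
collapsed = (W2 @ W1) @ x           # a single equivalent linear layer
with_tanh = W2 @ np.tanh(W1 @ x)    # nonlinearity prevents the collapse

print(np.allclose(two_linear, collapsed))  # True
print(np.allclose(with_tanh, collapsed))   # False
```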
Adam Optimizer
An adaptive learning rate optimizer that maintains per-parameter momentum and squared gradient estimates with bias correction.
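A minimal single-parameter sketch of the update rule, using the commonly cited default hyperparameters (lr=0.001, β1=0.9, β2=0.999, ε=1e-8):

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # per-parameter momentum estimate
    v = b2 * v + (1 - b2) * grad ** 2       # squared-gradient estimate
    m_hat = m / (1 - b1 ** t)               # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = 5.0, 0.0, 0.0
for t in range(1, 101):
    grad = 2 * p                            # gradient of the loss p**2
    p, m, v = adam_step(p, grad, m, v, t)
print(p)  # moves steadily toward the minimum at 0
```

Note how bias correction matters early: at t=1 the raw estimates m and v are heavily biased toward zero, and dividing by (1 − βᵗ) compensates.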
Agent Backpressure
Automated feedback mechanisms — type systems, test suites, linters, and pre-commit hooks — that allow AI agents to detect and correct their own mistakes without human intervention.
Agent Harness
The orchestration layer around a language model that manages prompts, tool execution, policy checks, and loop control for autonomous agent behavior.
Agent Heartbeat
A liveness detection mechanism where an agent periodically updates a timestamp in a shared database, allowing a monitoring system to detect crashed agents and automatically reassign their uncompleted work.
Agent Skills
Lazily-evaluated instruction sets that an agent loads into its context window only when relevant, mitigating the constant-allocation cost of always-loaded tool definitions.
Agent
An AI system that uses an observe-think-act loop with tool calling to autonomously accomplish tasks.
AGENTS.md
An open standard markdown file that provides project-specific instructions — build steps, testing commands, coding conventions — to AI coding agents, readable by any tool in a growing cross-vendor ecosystem.
Attention Weight Matrix
A matrix where each entry represents how much one position attends to another, with rows forming probability distributions over the input sequence.
Attention Weights
Normalized scores that determine how much each position in a sequence contributes to the output at a given position.
Attention
A mechanism that allows neural networks to focus on relevant parts of the input when producing output.
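The three entries above fit together in a few lines. A sketch of single-head scaled dot-product attention over a 3-token sequence, with random matrices standing in for learned Q/K/V projections:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Q = rng.normal(size=(3, d))   # queries
K = rng.normal(size=(3, d))   # keys
V = rng.normal(size=(3, d))   # values

scores = Q @ K.T / np.sqrt(d)                     # pairwise similarity
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)    # softmax: rows sum to 1
output = weights @ V                              # weighted mix of values

print(weights.sum(axis=-1))  # each row is a probability distribution
```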
Autograd
Automatic differentiation — a system that computes gradients by recording operations and applying the chain rule in reverse.
Backpropagation
The algorithm that computes gradients by propagating error signals backward through a computation graph using the chain rule.
Batch Normalization
A technique that normalizes activations across the batch dimension with learnable scale and shift parameters, stabilizing deep network training.
Beads
A distributed, git-backed graph issue tracker for AI coding agents that provides persistent structured memory through a dependency-aware work graph stored in a version-controlled database.
BERT
Bidirectional Encoder Representations from Transformers — an encoder-only model pre-trained on masked language modeling and next sentence prediction.
Bigram
A model that predicts the next token based only on the current token — the simplest possible language model, using character or word pair frequencies.
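A character-level bigram model really is just pair counting — a toy sketch over an assumed miniature corpus:

```python
from collections import Counter

text = "hello world hello there"
pairs = Counter(zip(text, text[1:]))   # count adjacent character pairs

def predict(ch):
    # most frequent successor of ch in the corpus
    candidates = {b: n for (a, b), n in pairs.items() if a == ch}
    return max(candidates, key=candidates.get)

print(predict("h"))  # 'e' — every 'h' in the corpus is followed by 'e'
```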
Bombadil
A property-based UI testing framework by Antithesis that autonomously explores web applications and validates correctness invariants.
Byte Pair Encoding (BPE)
A subword tokenization algorithm that iteratively merges the most frequent character pairs into single tokens, building a vocabulary that balances coverage and efficiency.
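One iteration of the merge loop can be sketched as follows (real BPE repeats this until a target vocabulary size is reached; the corpus here is a stand-in):

```python
from collections import Counter

tokens = list("low lower lowest")
# find the most frequent adjacent pair of symbols
pair = Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

merged, i = [], 0
while i < len(tokens):
    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
        merged.append(tokens[i] + tokens[i + 1])   # fuse the pair
        i += 2
    else:
        merged.append(tokens[i])
        i += 1
print(pair, merged)  # the winning pair now appears as a single token
```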
Causal Masking
A mechanism that prevents attention from looking at future positions, enforcing the autoregressive property in language models.
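In practice the mask sets future-position scores to −∞ before softmax, so they receive exactly zero weight — a minimal sketch:

```python
import numpy as np

T = 4
scores = np.zeros((T, T))
mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # strictly above diagonal
scores[mask] = -np.inf                             # block the future

weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.round(2))
# row i spreads probability over positions 0..i; zeros above the diagonal
```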
Cell-Level Merge
A merge strategy that resolves conflicts at the individual field (cell) level rather than at the line or row level, enabling concurrent modifications to different fields of the same record without conflict.
Chain Rule
The calculus rule that computes the derivative of a composed function by multiplying the derivatives of each step — the mathematical foundation of backpropagation.
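A quick numeric check of the rule: for f(x) = sin(x²), the chain rule gives df/dx = cos(x²) · 2x, which should agree with a finite-difference estimate:

```python
import math

x = 1.3
analytic = math.cos(x ** 2) * 2 * x    # outer derivative times inner derivative

h = 1e-6                               # central finite difference
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)
print(abs(analytic - numeric) < 1e-6)  # True
```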
Chrome Extension
A small app that runs inside Google Chrome to add or change browser behavior.
CLAUDE.md
A project-level configuration file that provides Claude Code with persistent, auto-loaded instructions about a codebase's architecture, conventions, and workflows.
Collision-Free ID
An identifier generation strategy that uses content hashing with adaptive length scaling to minimize the chance of two independently-created items receiving the same ID.
Compaction
A lossy context management technique that summarizes or removes older messages from an agent's conversation history to free space within the context window.
Computation Graph
A directed acyclic graph that records every operation in a computation, enabling automatic gradient computation via backpropagation.
Context Engineering
The discipline of designing, managing, and optimizing the information placed into a language model's context window to maximize the quality and reliability of its output.
Context Window
The maximum number of tokens a language model can process in a single request, encompassing both the input prompt and the generated output.
Correctness Invariant
A property that must hold true at every observable state of a system, serving as a formal specification of correct behavior.
Cross-Entropy
A loss function that measures how well a predicted probability distribution matches the true distribution, widely used in classification and language modeling.
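When the true distribution is one-hot (as in next-token prediction), cross-entropy reduces to the negative log of the probability assigned to the correct class:

```python
import math

pred = [0.7, 0.2, 0.1]   # model's predicted distribution
true_class = 0           # one-hot target
loss = -math.log(pred[true_class])
print(round(loss, 4))    # 0.3567
```

A perfect prediction (probability 1.0 on the true class) gives loss 0; probabilities near zero blow the loss up, which is what drives learning.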
Decision Boundary
The surface in input space where a classifier's prediction changes from one class to another — what the model has 'learned' visualized geometrically.
Differential Rendering
A rendering technique that compares the current and previous output to update only the parts of the screen that have changed.
Dilated Convolution
A convolution with gaps between kernel elements, enabling exponentially growing receptive fields without increasing parameter count — the key mechanism in WaveNet.
Embedding
A learned dense vector representation that maps discrete tokens to continuous vector spaces.
Ephemeral Work Item
A work item that exists only in the local database, is never synchronized to remote collaborators, and is hard-deleted when its purpose is served — designed for routine operational steps that have no long-term audit value.
Exclusive Lock Protocol
A cooperative file-based locking mechanism that allows an external tool to claim exclusive management of a shared database, preventing background daemons from interfering with deterministic operations.
Finite-State Machine
A computational model with a fixed set of states and rules for transitioning between them based on inputs, used to model sequential decision-making in AI systems.
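The classic illustration is a turnstile — two states, two inputs, and a transition table; a minimal sketch:

```python
transitions = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

state = "locked"
for event in ["push", "coin", "push"]:
    state = transitions[(state, event)]
print(state)  # "locked" — the coin unlocked it, the push re-locked it
```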
Follow-up Message
A user message queued during an agent's run that is delivered only after the agent finishes all current work, triggering another turn.
Forward Pass
The computation that flows input data through a neural network to produce an output — building the computation graph that backpropagation will later traverse.
Git Hook
A script that git executes automatically at specific points in its workflow — such as before a commit, after a merge, or before a push — enabling custom validation, synchronization, and automation without manual intervention.
GPT
Generative Pre-trained Transformer — a family of decoder-only language models that generate text by predicting the next token.
Gradient Descent
An optimization algorithm that iteratively adjusts parameters in the direction that reduces a loss function, using gradients to determine the direction.
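The whole algorithm fits in a loop. For f(x) = (x − 3)², whose gradient is 2(x − 3), repeated downhill steps converge to the minimum:

```python
x, lr = 0.0, 0.1            # start far from the minimum; lr is the step size
for _ in range(50):
    grad = 2 * (x - 3)      # gradient of the loss at the current x
    x -= lr * grad          # step opposite the gradient
print(round(x, 4))          # converges toward the minimum at 3
```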
Gradient Highway
A direct path through a network's computation graph that carries gradients backward with a derivative of 1, preventing vanishing gradients in deep architectures.
Graph Issue Tracker
An issue tracking system where work items are nodes and relationships are edges in a directed graph, enabling dependency-aware queries like 'show me all tasks with no open blockers.'
GRU
Gated Recurrent Unit — a simplified RNN variant with fewer gates than LSTM that achieves comparable performance with less computation.
Hierarchical Issue ID
A dot-notation identifier scheme where parent-child relationships are encoded directly in the ID, making work structure visible at a glance — e.g., bd-a3f8.1.1 means sub-task 1 of task 1 of epic bd-a3f8.
Hinge Loss
A loss function for binary classification that penalizes predictions lacking sufficient margin — the objective used in SVMs and micrograd's training.
Incremental Learning
An optimization property of residual networks where each layer learns a small correction to its input rather than computing the full output from scratch.
Inference
The process of running a trained model on new inputs to produce predictions, as opposed to training where weights are updated.
Issue Compaction
A deliberate data reduction strategy that summarizes old closed work items using AI, preserving graph structure while discarding verbose text to reduce context window consumption.
Issue Federation
A peer-to-peer synchronization protocol where independent teams each maintain their own issue database and selectively share work items through database-level remotes, preserving data sovereignty while enabling cross-team collaboration.
JSON Schema
A vocabulary for annotating and validating JSON documents, used to define tool interfaces for language models.
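A sketch of what such a tool interface looks like — a hypothetical `get_weather` tool's argument schema, in the general shape many LLM tool-calling APIs accept:

```python
import json

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "City name"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],   # the model must always supply a city
}
print(json.dumps(schema, indent=2))
```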
JSON
JavaScript Object Notation — a lightweight, text-based data interchange format that uses key-value pairs and ordered lists to represent structured data, readable by both humans and machines.
JSONL Portability Layer
A dual-format data strategy where a structured database serves as the operational source of truth while a line-delimited JSON file provides git-compatible portability, enabling issue data to travel with code across clones and forks.
LayerNorm
Layer Normalization — a technique that normalizes activations across features by centering and scaling, with learnable parameters, to stabilize deep network training.
Learning Rate
A hyperparameter that controls how large each parameter update step is during training — too high causes instability, too low causes slow convergence.
Linear Algebra
The branch of mathematics dealing with vectors, matrices, and linear transformations — the foundational language of neural networks and transformer architectures.
Linear Regression
A supervised learning algorithm that models the relationship between variables by fitting a linear equation to observed data — the conceptual ancestor of neural network training.
Linear Temporal Logic
A formal logic for reasoning about properties of sequences over time, using operators like 'always' and 'eventually' to specify what must hold throughout a system's execution.
Logits
The raw, unnormalized scores a model outputs before softmax — one value per class or vocabulary token, not yet probabilities.
Loss Function
A function that quantifies how wrong a model's predictions are — the single number that training optimizes to reduce.
LSTM
Long Short-Term Memory — an RNN variant with gating mechanisms that can learn long-range dependencies in sequential data.
manifest.json
The required configuration file that defines a browser extension’s identity, capabilities, and permissions.
Mathematical Notation
A guide to the symbols, operators, and conventions used in machine learning formulas — from summation and subscripts to Greek letters and set notation.
Matrix Theory
The study of matrices and their algebraic properties — matrix multiplication, transposition, and decomposition underpin every computation in transformer models.
Model Context Protocol (MCP)
An open standard that defines how AI agents connect to external data sources and tools via a client-server architecture, enabling context injection at the protocol level.
Monad
A design pattern that wraps values in a context and chains operations that may transform both the value and the context, enabling composable error handling and pipeline construction.
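A Maybe-style sketch of the pattern: `bind` chains operations and short-circuits on `None`, so one failed step poisons the rest of the pipeline without explicit error checks at each stage:

```python
def bind(value, fn):
    # propagate failure (None) instead of calling the next step
    return None if value is None else fn(value)

def safe_div(x):
    return None if x == 0 else 10 / x

result = bind(bind(4, safe_div), lambda v: v + 1)   # 10/4 + 1
failed = bind(bind(0, safe_div), lambda v: v + 1)   # the guard fires
print(result, failed)  # 3.5 None
```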
Monorepo
A software development strategy where multiple related packages or projects are stored in a single version-controlled repository.
Multi-Layer Perceptron
A feedforward neural network composed of stacked fully connected layers — the simplest architecture that can learn nonlinear patterns.
Neuron
The basic computational unit in a neural network — computes a weighted sum of inputs plus bias, then applies an activation function.
Next-Token Prediction
The training objective where a language model learns to predict the next token in a sequence given all preceding tokens.
NLP
Natural Language Processing — the field of AI concerned with enabling computers to understand, interpret, and generate human language.
One-Hot Encoding
Representing a categorical value as a binary vector with a single 1 at the corresponding index — the simplest encoding before dense embeddings.
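For example, category index 2 in a 4-category vocabulary:

```python
def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1          # single 1 at the category's position
    return vec

print(one_hot(2, 4))  # [0, 0, 1, 0]
```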
Overfitting
When a model memorizes training data instead of learning general patterns, performing well on training examples but poorly on new inputs.
Probability Theory
The mathematical framework for reasoning about uncertainty — probability distributions, conditional probability, and Bayes' theorem underpin how language models generate text.
Prompt Injection
An adversarial attack where malicious input manipulates a language model into performing unintended actions.
Property-Based Testing
A testing methodology where developers specify general properties that must hold for all inputs, and a test engine automatically generates diverse cases to verify them.
Provider Registry
A design pattern where interchangeable service implementations register themselves at startup and are looked up at runtime by a key.
Ralph Wiggum Loop
A fresh-context iteration pattern for autonomous coding: each loop run reads persistent specs from disk, completes one bounded task, exits, and restarts in a clean context window.
Ready Work Query
A graph traversal query that finds all open work items with no unsatisfied blocking dependencies, answering the question every agent asks: what should I work on next?
Regularization
Techniques that constrain a model during training to prevent overfitting — penalizing complexity so the model generalizes to unseen data.
ReLU
Rectified Linear Unit — an activation function that outputs zero for negative inputs and passes positive inputs unchanged, adding nonlinearity to neural networks.
Residual Connection
A skip connection that adds a layer's input directly to its output, creating a gradient highway that enables training of deep networks.
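A sketch of why the skip path matters: the block computes x + f(x), so even a "dead" layer (here, all-zero weights standing in for an untrained f) leaves the identity path intact:

```python
import numpy as np

def block(x, W):
    return x + np.tanh(W @ x)   # skip connection adds input to layer output

x = np.array([1.0, -2.0])
W_zero = np.zeros((2, 2))       # a layer contributing nothing
print(block(x, W_zero))         # the block reduces to the identity
```

The same identity path carries gradients backward with derivative 1, which is the "gradient highway" behavior.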
RMSNorm
Root Mean Square Normalization — a layer normalization variant that scales activations by their root mean square, without centering or learnable parameters.
RNN
Recurrent Neural Network — a neural network architecture that processes sequences by maintaining a hidden state across time steps.
Snake Case
A naming convention that joins words with underscores and uses all lowercase letters, commonly used for database columns, Python identifiers, and tool names in AI systems.
Softmax Normalization
A function that converts a vector of real numbers into a probability distribution, used in attention mechanisms to produce weights that sum to 1.
Steering Message
A user message injected into an agent's conversation mid-run that interrupts remaining tool executions and redirects the agent's behavior.
Stochastic Gradient Descent
An optimization algorithm that estimates the gradient from a small random batch of examples rather than the full dataset, takes a step downhill, and repeats — the simplest workhorse optimizer for neural networks.
Streaming
A technique where a language model sends its response incrementally as it is generated, rather than waiting for the complete output.
Sub-Agent
A child agent spawned by a parent agent into a fresh context window, enabling memory isolation — the equivalent of a managed runtime (JVM, CLR) that provides a disposable heap for a bounded unit of work.
Swarm Analysis
A technique for computing the maximum number of agents that can productively work in parallel on a project by grouping tasks into dependency waves — sets of work items that share the same depth in the dependency graph.
System Prompt
A privileged instruction message placed at the beginning of a language model's context window that defines the model's behavior, personality, and constraints for the entire conversation.
Tanh
Hyperbolic tangent — an activation function that squashes inputs to the range [-1, 1], providing nonlinearity in neural networks.
Technical Debt
A software engineering metaphor where shortcuts taken to deliver value quickly create future rework costs that accumulate interest over time, analogous to financial debt.
Temperature (Sampling)
A scalar that controls the randomness of text generation by sharpening or flattening the probability distribution over tokens.
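Concretely, logits are divided by the temperature before softmax — a sketch with made-up logits:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])

def softmax_t(z, temp):
    z = z / temp
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()

print(softmax_t(logits, 0.5).round(3))  # T < 1 sharpens: top token dominates
print(softmax_t(logits, 2.0).round(3))  # T > 1 flattens toward uniform
```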
Tokenizer
A component that converts raw text into a sequence of integer token IDs and back, defining the vocabulary a language model operates on.
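A toy character-level version shows the core contract — encode and decode are inverses over a fixed vocabulary:

```python
text = "hello"
vocab = sorted(set(text))                       # ['e', 'h', 'l', 'o']
stoi = {ch: i for i, ch in enumerate(vocab)}    # string to integer ID
itos = {i: ch for ch, i in stoi.items()}        # integer ID back to string

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
print(ids, decode(ids))  # round-trip recovers the original text
```

Production tokenizers (e.g. BPE-based ones) work over subwords rather than characters, but the encode/decode contract is the same.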
TOML
Tom's Obvious, Minimal Language — a configuration file format that uses explicit section headers and simple key-value syntax, designed to be unambiguous and easy to parse.
Tool Calling
A mechanism that allows large language models to invoke external functions and APIs by generating structured requests.
Transformer
A neural network architecture based on self-attention mechanisms, introduced in 'Attention Is All You Need' (2017).
Type System
A set of rules that assigns types to program expressions and checks them for consistency, catching entire classes of errors before the code runs.
Vector
An ordered list of numbers representing a point or direction in a multi-dimensional space — the fundamental data structure that neural networks operate on.
Version-Controlled Database
A database that provides git-style operations — commit, branch, merge, diff, and time-travel — on SQL tables, treating data changes as first-class version history.
Weight Initialization
The strategy for setting initial parameter values before training — critical for ensuring healthy gradient flow and preventing activation saturation in deep networks.
Workflow Formula
A declarative template that compiles into a hierarchy of work items with typed dependencies, enabling reusable multi-step workflows with variables, conditional steps, and aspect-oriented composition.
YAML
YAML Ain't Markup Language — a human-friendly data serialization format that uses indentation-based nesting and minimal punctuation, widely used for configuration files.