Glossary

Key terms and definitions


Activation Function
A nonlinear function applied after a neuron's weighted sum — without it, stacking layers would collapse into a single linear transformation.
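The collapse described above can be seen directly — a minimal sketch with random weight matrices standing in for trained layers:

```python
import numpy as np

# Two stacked linear layers with no activation collapse into one
# linear map; inserting tanh between them breaks that equivalence.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_linear = W2 @ (W1 @ x)          # stacked linear layers
collapsed = (W2 @ W1) @ x           # a single equivalent linear layer
with_tanh = W2 @ np.tanh(W1 @ x)    # nonlinearity prevents the collapse

print(np.allclose(two_linear, collapsed))  # True
print(np.allclose(with_tanh, collapsed))   # False
```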
Adam Optimizer
An adaptive learning rate optimizer that maintains per-parameter momentum and squared gradient estimates with bias correction.
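A minimal single-parameter sketch of the update rule, using the commonly cited default hyperparameters (lr=0.001, β1=0.9, β2=0.999, ε=1e-8):

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # per-parameter momentum estimate
    v = b2 * v + (1 - b2) * grad ** 2       # squared-gradient estimate
    m_hat = m / (1 - b1 ** t)               # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = 5.0, 0.0, 0.0
for t in range(1, 101):
    grad = 2 * p                            # gradient of the loss p**2
    p, m, v = adam_step(p, grad, m, v, t)
print(p)  # moves steadily toward the minimum at 0
```

Note how bias correction matters early: at t=1 the raw estimates m and v are heavily biased toward zero, and dividing by (1 − βᵗ) compensates.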
Agent Backpressure
Automated feedback mechanisms — type systems, test suites, linters, and pre-commit hooks — that allow AI agents to detect and correct their own mistakes without human intervention.
Agent Harness
The orchestration layer around a language model that manages prompts, tool execution, policy checks, and loop control for autonomous agent behavior.
Agent Heartbeat
A liveness detection mechanism where an agent periodically updates a timestamp in a shared database, allowing a monitoring system to detect crashed agents and automatically reassign their uncompleted work.
Agent Skills
Lazily-evaluated instruction sets that an agent loads into its context window only when relevant, mitigating the constant-allocation cost of always-loaded tool definitions.
Agent
An AI system that uses an observe-think-act loop with tool calling to autonomously accomplish tasks.
AGENTS.md
An open standard markdown file that provides project-specific instructions — build steps, testing commands, coding conventions — to AI coding agents, readable by any tool in a growing cross-vendor ecosystem.
Attention Weight Matrix
A matrix where each entry represents how much one position attends to another, with rows forming probability distributions over the input sequence.
Attention Weights
Normalized scores that determine how much each position in a sequence contributes to the output at a given position.
Attention
A mechanism that allows neural networks to focus on relevant parts of the input when producing output.
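The three entries above fit together in a few lines. A sketch of single-head scaled dot-product attention over a 3-token sequence, with random matrices standing in for learned Q/K/V projections:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
Q = rng.normal(size=(3, d))   # queries
K = rng.normal(size=(3, d))   # keys
V = rng.normal(size=(3, d))   # values

scores = Q @ K.T / np.sqrt(d)                     # pairwise similarity
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)    # softmax: rows sum to 1
output = weights @ V                              # weighted mix of values

print(weights.sum(axis=-1))  # each row is a probability distribution
```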
Autograd
Automatic differentiation — a system that computes gradients by recording operations and applying the chain rule in reverse.
Backpropagation
The algorithm that computes gradients by propagating error signals backward through a computation graph using the chain rule.
Batch Normalization
A technique that normalizes activations across the batch dimension with learnable scale and shift parameters, stabilizing deep network training.
Beads
A distributed, git-backed graph issue tracker for AI coding agents that provides persistent structured memory through a dependency-aware work graph stored in a version-controlled database.
BERT
Bidirectional Encoder Representations from Transformers — an encoder-only model pre-trained on masked language modeling and next sentence prediction.
Bigram
A model that predicts the next token based only on the current token — the simplest possible language model, using character or word pair frequencies.
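A character-level bigram model really is just pair counting — a toy sketch over an assumed miniature corpus:

```python
from collections import Counter

text = "hello world hello there"
pairs = Counter(zip(text, text[1:]))   # count adjacent character pairs

def predict(ch):
    # most frequent successor of ch in the corpus
    candidates = {b: n for (a, b), n in pairs.items() if a == ch}
    return max(candidates, key=candidates.get)

print(predict("h"))  # 'e' — every 'h' in the corpus is followed by 'e'
```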
Bombadil
A property-based UI testing framework by Antithesis that autonomously explores web applications and validates correctness invariants.
Byte Pair Encoding (BPE)
A subword tokenization algorithm that iteratively merges the most frequent character pairs into single tokens, building a vocabulary that balances coverage and efficiency.
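One iteration of the merge loop can be sketched as follows (real BPE repeats this until a target vocabulary size is reached; the corpus here is a stand-in):

```python
from collections import Counter

tokens = list("low lower lowest")
# find the most frequent adjacent pair of symbols
pair = Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

merged, i = [], 0
while i < len(tokens):
    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
        merged.append(tokens[i] + tokens[i + 1])   # fuse the pair
        i += 2
    else:
        merged.append(tokens[i])
        i += 1
print(pair, merged)  # the winning pair now appears as a single token
```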
Causal Masking
A mechanism that prevents attention from looking at future positions, enforcing the autoregressive property in language models.
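In practice the mask sets future-position scores to −∞ before softmax, so they receive exactly zero weight — a minimal sketch:

```python
import numpy as np

T = 4
scores = np.zeros((T, T))
mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # strictly above diagonal
scores[mask] = -np.inf                             # block the future

weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.round(2))
# row i spreads probability over positions 0..i; zeros above the diagonal
```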
Cell-Level Merge
A merge strategy that resolves conflicts at the individual field (cell) level rather than at the line or row level, enabling concurrent modifications to different fields of the same record without conflict.
Chain Rule
The calculus rule that computes the derivative of a composed function by multiplying the derivatives of each step — the mathematical foundation of backpropagation.
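A quick numeric check of the rule: for f(x) = sin(x²), the chain rule gives df/dx = cos(x²) · 2x, which should agree with a finite-difference estimate:

```python
import math

x = 1.3
analytic = math.cos(x ** 2) * 2 * x    # outer derivative times inner derivative

h = 1e-6                               # central finite difference
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)
print(abs(analytic - numeric) < 1e-6)  # True
```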
Chrome Extension
A small app that runs inside Google Chrome to add or change browser behavior.
CLAUDE.md
A project-level configuration file that provides Claude Code with persistent, auto-loaded instructions about a codebase's architecture, conventions, and workflows.
Collision-Free ID
An identifier generation strategy that uses content hashing with adaptive length scaling to minimize the chance of two independently-created items receiving the same ID.
Compaction
A lossy context management technique that summarizes or removes older messages from an agent's conversation history to free space within the context window.
Computation Graph
A directed acyclic graph that records every operation in a computation, enabling automatic gradient computation via backpropagation.
Context Engineering
The discipline of designing, managing, and optimizing the information placed into a language model's context window to maximize the quality and reliability of its output.
Context Window
The maximum number of tokens a language model can process in a single request, encompassing both the input prompt and the generated output.
Correctness Invariant
A property that must hold true at every observable state of a system, serving as a formal specification of correct behavior.
Cross-Entropy
A loss function that measures how well a predicted probability distribution matches the true distribution, widely used in classification and language modeling.
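When the true distribution is one-hot (as in next-token prediction), cross-entropy reduces to the negative log of the probability assigned to the correct class:

```python
import math

pred = [0.7, 0.2, 0.1]   # model's predicted distribution
true_class = 0           # one-hot target
loss = -math.log(pred[true_class])
print(round(loss, 4))    # 0.3567
```

A perfect prediction (probability 1.0 on the true class) gives loss 0; probabilities near zero blow the loss up, which is what drives learning.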
Decision Boundary
The surface in input space where a classifier's prediction changes from one class to another — what the model has 'learned' visualized geometrically.
Differential Rendering
A rendering technique that compares the current and previous output to update only the parts of the screen that have changed.
Dilated Convolution
A convolution with gaps between kernel elements, enabling exponentially growing receptive fields without increasing parameter count — the key mechanism in WaveNet.
Embedding
A learned dense vector representation that maps discrete tokens to continuous vector spaces.
Ephemeral Work Item
A work item that exists only in the local database, is never synchronized to remote collaborators, and is hard-deleted when its purpose is served — designed for routine operational steps that have no long-term audit value.
Exclusive Lock Protocol
A cooperative file-based locking mechanism that allows an external tool to claim exclusive management of a shared database, preventing background daemons from interfering with deterministic operations.
Finite-State Machine
A computational model with a fixed set of states and rules for transitioning between them based on inputs, used to model sequential decision-making in AI systems.
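The classic illustration is a turnstile — two states, two inputs, and a transition table; a minimal sketch:

```python
transitions = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

state = "locked"
for event in ["push", "coin", "push"]:
    state = transitions[(state, event)]
print(state)  # "locked" — the coin unlocked it, the push re-locked it
```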
Follow-up Message
A user message queued during an agent's run that is delivered only after the agent finishes all current work, triggering another turn.
Forward Pass
The computation that flows input data through a neural network to produce an output — building the computation graph that backpropagation will later traverse.
Git Hook
A script that git executes automatically at specific points in its workflow — such as before a commit, after a merge, or before a push — enabling custom validation, synchronization, and automation without manual intervention.
GPT
Generative Pre-trained Transformer — a family of decoder-only language models that generate text by predicting the next token.
Gradient Descent
An optimization algorithm that iteratively adjusts parameters in the direction that reduces a loss function, using gradients to determine the direction.
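The whole algorithm fits in a loop. For f(x) = (x − 3)², whose gradient is 2(x − 3), repeated downhill steps converge to the minimum:

```python
x, lr = 0.0, 0.1            # start far from the minimum; lr is the step size
for _ in range(50):
    grad = 2 * (x - 3)      # gradient of the loss at the current x
    x -= lr * grad          # step opposite the gradient
print(round(x, 4))          # converges toward the minimum at 3
```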
Gradient Highway
A direct path through a network's computation graph that carries gradients backward with a derivative of 1, preventing vanishing gradients in deep architectures.
Graph Issue Tracker
An issue tracking system where work items are nodes and relationships are edges in a directed graph, enabling dependency-aware queries like 'show me all tasks with no open blockers.'
GRU
Gated Recurrent Unit — a simplified RNN variant with fewer gates than LSTM that achieves comparable performance with less computation.
Hierarchical Issue ID
A dot-notation identifier scheme where parent-child relationships are encoded directly in the ID, making work structure visible at a glance — e.g., bd-a3f8.1.1 means sub-task 1 of task 1 of epic bd-a3f8.
Hinge Loss
A loss function for binary classification that penalizes predictions lacking sufficient margin — the objective used in SVMs and micrograd's training.
Incremental Learning
An optimization property of residual networks where each layer learns a small correction to its input rather than computing the full output from scratch.
Inference
The process of running a trained model on new inputs to produce predictions, as opposed to training where weights are updated.
Issue Compaction
A deliberate data reduction strategy that summarizes old closed work items using AI, preserving graph structure while discarding verbose text to reduce context window consumption.
Issue Federation
A peer-to-peer synchronization protocol where independent teams each maintain their own issue database and selectively share work items through database-level remotes, preserving data sovereignty while enabling cross-team collaboration.
JSON Schema
A vocabulary for annotating and validating JSON documents, used to define tool interfaces for language models.
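A sketch of what such a tool interface looks like — a hypothetical `get_weather` tool's argument schema, in the general shape many LLM tool-calling APIs accept:

```python
import json

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string", "description": "City name"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],   # the model must always supply a city
}
print(json.dumps(schema, indent=2))
```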
JSON
JavaScript Object Notation — a lightweight, text-based data interchange format that uses key-value pairs and ordered lists to represent structured data, readable by both humans and machines.
JSONL Portability Layer
A dual-format data strategy where a structured database serves as the operational source of truth while a line-delimited JSON file provides git-compatible portability, enabling issue data to travel with code across clones and forks.
LayerNorm
Layer Normalization — a technique that normalizes activations across features by centering and scaling, with learnable parameters, to stabilize deep network training.
Learning Rate
A hyperparameter that controls how large each parameter update step is during training — too high causes instability, too low causes slow convergence.
Linear Algebra
The branch of mathematics dealing with vectors, matrices, and linear transformations — the foundational language of neural networks and transformer architectures.
Linear Regression
A supervised learning algorithm that models the relationship between variables by fitting a linear equation to observed data — the conceptual ancestor of neural network training.
Linear Temporal Logic
A formal logic for reasoning about properties of sequences over time, using operators like 'always' and 'eventually' to specify what must hold throughout a system's execution.
Logits
The raw, unnormalized scores a model outputs before softmax — one value per class or vocabulary token, not yet probabilities.
Loss Function
A function that quantifies how wrong a model's predictions are — the single number that training optimizes to reduce.
LSTM
Long Short-Term Memory — an RNN variant with gating mechanisms that can learn long-range dependencies in sequential data.
manifest.json
The required configuration file that defines a browser extension’s identity, capabilities, and permissions.
Mathematical Notation
A guide to the symbols, operators, and conventions used in machine learning formulas — from summation and subscripts to Greek letters and set notation.
Matrix Theory
The study of matrices and their algebraic properties — matrix multiplication, transposition, and decomposition underpin every computation in transformer models.
Model Context Protocol (MCP)
An open standard that defines how AI agents connect to external data sources and tools via a client-server architecture, enabling context injection at the protocol level.
Monad
A design pattern that wraps values in a context and chains operations that may transform both the value and the context, enabling composable error handling and pipeline construction.
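A Maybe-style sketch of the pattern: `bind` chains operations and short-circuits on `None`, so one failed step poisons the rest of the pipeline without explicit error checks at each stage:

```python
def bind(value, fn):
    # propagate failure (None) instead of calling the next step
    return None if value is None else fn(value)

def safe_div(x):
    return None if x == 0 else 10 / x

result = bind(bind(4, safe_div), lambda v: v + 1)   # 10/4 + 1
failed = bind(bind(0, safe_div), lambda v: v + 1)   # the guard fires
print(result, failed)  # 3.5 None
```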
Monorepo
A software development strategy where multiple related packages or projects are stored in a single version-controlled repository.
Multi-Layer Perceptron
A feedforward neural network composed of stacked fully connected layers — the simplest architecture that can learn nonlinear patterns.
Neuron
The basic computational unit in a neural network — computes a weighted sum of inputs plus bias, then applies an activation function.
Next-Token Prediction
The training objective where a language model learns to predict the next token in a sequence given all preceding tokens.
NLP
Natural Language Processing — the field of AI concerned with enabling computers to understand, interpret, and generate human language.
One-Hot Encoding
Representing a categorical value as a binary vector with a single 1 at the corresponding index — the simplest encoding before dense embeddings.
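For example, category index 2 in a 4-category vocabulary:

```python
def one_hot(index, size):
    vec = [0] * size
    vec[index] = 1          # single 1 at the category's position
    return vec

print(one_hot(2, 4))  # [0, 0, 1, 0]
```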
Overfitting
When a model memorizes training data instead of learning general patterns, performing well on training examples but poorly on new inputs.
Probability Theory
The mathematical framework for reasoning about uncertainty — probability distributions, conditional probability, and Bayes' theorem underpin how language models generate text.
Prompt Injection
An adversarial attack where malicious input manipulates a language model into performing unintended actions.
Property-Based Testing
A testing methodology where developers specify general properties that must hold for all inputs, and a test engine automatically generates diverse cases to verify them.
Provider Registry
A design pattern where interchangeable service implementations register themselves at startup and are looked up at runtime by a key.
Ralph Wiggum Loop
A fresh-context iteration pattern for autonomous coding: each loop run reads persistent specs from disk, completes one bounded task, exits, and restarts in a clean context window.
Ready Work Query
A graph traversal query that finds all open work items with no unsatisfied blocking dependencies, answering the question every agent asks: what should I work on next?
Regularization
Techniques that constrain a model during training to prevent overfitting — penalizing complexity so the model generalizes to unseen data.
ReLU
Rectified Linear Unit — an activation function that outputs zero for negative inputs and passes positive inputs unchanged, adding nonlinearity to neural networks.
Residual Connection
A skip connection that adds a layer's input directly to its output, creating a gradient highway that enables training of deep networks.
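A sketch of why the skip path matters: the block computes x + f(x), so even a "dead" layer (here, all-zero weights standing in for an untrained f) leaves the identity path intact:

```python
import numpy as np

def block(x, W):
    return x + np.tanh(W @ x)   # skip connection adds input to layer output

x = np.array([1.0, -2.0])
W_zero = np.zeros((2, 2))       # a layer contributing nothing
print(block(x, W_zero))         # the block reduces to the identity
```

The same identity path carries gradients backward with derivative 1, which is the "gradient highway" behavior.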
RMSNorm
Root Mean Square Normalization — a layer normalization variant that scales activations by their root mean square, without centering or learnable parameters.
RNN
Recurrent Neural Network — a neural network architecture that processes sequences by maintaining a hidden state across time steps.
Snake Case
A naming convention that joins words with underscores and uses all lowercase letters, commonly used for database columns, Python identifiers, and tool names in AI systems.
Softmax Normalization
A function that converts a vector of real numbers into a probability distribution, used in attention mechanisms to produce weights that sum to 1.
Steering Message
A user message injected into an agent's conversation mid-run that interrupts remaining tool executions and redirects the agent's behavior.
Stochastic Gradient Descent
An optimization algorithm that estimates the gradient from a small random batch of examples rather than the full dataset, takes a step downhill, and repeats — the simplest workhorse optimizer for neural networks.
Streaming
A technique where a language model sends its response incrementally as it is generated, rather than waiting for the complete output.
Sub-Agent
A child agent spawned by a parent agent into a fresh context window, enabling memory isolation — the equivalent of a managed runtime (JVM, CLR) that provides a disposable heap for a bounded unit of work.
Swarm Analysis
A technique for computing the maximum number of agents that can productively work in parallel on a project by grouping tasks into dependency waves — sets of work items that share the same depth in the dependency graph.
System Prompt
A privileged instruction message placed at the beginning of a language model's context window that defines the model's behavior, personality, and constraints for the entire conversation.
Tanh
Hyperbolic tangent — an activation function that squashes inputs to the range [-1, 1], providing nonlinearity in neural networks.
Technical Debt
A software engineering metaphor where shortcuts taken to deliver value quickly create future rework costs that accumulate interest over time, analogous to financial debt.
Temperature (Sampling)
A scalar that controls the randomness of text generation by sharpening or flattening the probability distribution over tokens.
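Concretely, logits are divided by the temperature before softmax — a sketch with made-up logits:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])

def softmax_t(z, temp):
    z = z / temp
    e = np.exp(z - z.max())     # subtract max for numerical stability
    return e / e.sum()

print(softmax_t(logits, 0.5).round(3))  # T < 1 sharpens: top token dominates
print(softmax_t(logits, 2.0).round(3))  # T > 1 flattens toward uniform
```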
Tokenizer
A component that converts raw text into a sequence of integer token IDs and back, defining the vocabulary a language model operates on.
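A toy character-level version shows the core contract — encode and decode are inverses over a fixed vocabulary:

```python
text = "hello"
vocab = sorted(set(text))                       # ['e', 'h', 'l', 'o']
stoi = {ch: i for i, ch in enumerate(vocab)}    # string to integer ID
itos = {i: ch for ch, i in stoi.items()}        # integer ID back to string

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode("hello")
print(ids, decode(ids))  # round-trip recovers the original text
```

Production tokenizers (e.g. BPE-based ones) work over subwords rather than characters, but the encode/decode contract is the same.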
TOML
Tom's Obvious, Minimal Language — a configuration file format that uses explicit section headers and simple key-value syntax, designed to be unambiguous and easy to parse.
Tool Calling
A mechanism that allows large language models to invoke external functions and APIs by generating structured requests.
Transformer
A neural network architecture based on self-attention mechanisms, introduced in 'Attention Is All You Need' (2017).
Type System
A set of rules that assigns types to program expressions and checks them for consistency, catching entire classes of errors before the code runs.
Vector
An ordered list of numbers representing a point or direction in a multi-dimensional space — the fundamental data structure that neural networks operate on.
Version-Controlled Database
A database that provides git-style operations — commit, branch, merge, diff, and time-travel — on SQL tables, treating data changes as first-class version history.
Weight Initialization
The strategy for setting initial parameter values before training — critical for ensuring healthy gradient flow and preventing activation saturation in deep networks.
Workflow Formula
A declarative template that compiles into a hierarchy of work items with typed dependencies, enabling reusable multi-step workflows with variables, conditional steps, and aspect-oriented composition.
YAML
YAML Ain't Markup Language — a human-friendly data serialization format that uses indentation-based nesting and minimal punctuation, widely used for configuration files.