Convergence Encyclopedia: The AI Pattern Map
PART 6: THE AI PATTERN MAP (FULL) For each of the 25 convergence patterns, the concrete technical mapping to AI/ML systems. Not metaphor — mechanism.
Pattern 01: Gradient Dissipation AI Instantiation: Stochastic gradient descent (SGD) and backpropagation Technical mapping: The loss landscape L(θ) is a free-energy landscape. SGD descends toward local minima by dissipating prediction-error gradients ∇L across the parameter space. The optimizer is a dissipative structure — it exists only while error gradients flow; when ∇L → 0, learning stops. Momentum terms are inertial memory; weight decay is entropic regularization; learning rate schedules are controlled cooling. Where it emerges: All neural network training, reinforcement learning (policy gradient), evolutionary strategies (fitness gradients), meta-learning (gradient-through-gradient) Current frontier: Sharpness-aware minimization (SAM, Foret et al. 2020) — finding flat minima corresponds to robust dissipative basins; gradient flow analysis in deep nets (Jacot et al. 2018 NTK theory); adaptive optimizers (Adam, AdamW) as non-equilibrium thermodynamic engines Claim tier: T1
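As a minimal sketch of the dissipation picture, the following runs heavy-ball gradient descent on a toy quadratic loss and records how the gradient norm drains away. The loss L(θ) = ½ Σ a_i θ_i², the curvatures, the learning rate, and the momentum coefficient are all illustrative assumptions, not values from the entry above.

```python
import math

def grad(theta, curvatures):
    """Gradient of the toy loss L(theta) = 0.5 * sum(a_i * theta_i**2)."""
    return [a * t for a, t in zip(curvatures, theta)]

def sgd_momentum(theta, curvatures, lr=0.1, beta=0.9, weight_decay=0.0, steps=300):
    """Heavy-ball gradient descent; returns final parameters and gradient-norm history."""
    velocity = [0.0] * len(theta)
    history = []
    for _ in range(steps):
        g = grad(theta, curvatures)
        # Weight decay adds a pull toward the origin on top of the loss gradient.
        g = [gi + weight_decay * ti for gi, ti in zip(g, theta)]
        # Momentum: the velocity carries a decaying memory of past gradients.
        velocity = [beta * v + gi for v, gi in zip(velocity, g)]
        theta = [t - lr * v for t, v in zip(theta, velocity)]
        history.append(math.sqrt(sum(gi * gi for gi in grad(theta, curvatures))))
    return theta, history

theta_final, grad_norms = sgd_momentum([3.0, -2.0], curvatures=[1.0, 4.0])
print(f"first grad norm: {grad_norms[0]:.3f}, last grad norm: {grad_norms[-1]:.2e}")
```

Once the gradient norm is numerically zero the update loop keeps running but changes nothing, which is the sense in which learning stops when ∇L → 0.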
Pattern 02: Least Action AI Instantiation: Variational inference, ELBO maximization, path regularization Technical mapping: The ELBO (Evidence Lower Bound) is a negative variational free energy: ELBO[q] = E_q[log p(x,z)] - E_q[log q(z)] = E_q[log p(x|z)] - KL(q(z)||p(z)). Optimizing q to maximize the ELBO is finding the stationary point of a variational principle. In control/RL, Pontryagin’s maximum principle selects optimal trajectories; in Bayesian neural networks, the loss functional plays the role of an action integral over weight-space. The Hamiltonian Monte Carlo sampler literally integrates Hamilton’s equations in model space. Where it emerges: Variational autoencoders, Bayesian neural networks, trajectory optimization (MPC, iLQR), normalizing flows, probabilistic programming Current frontier: Free Energy Principle in active inference (Friston), where perception and action both minimize variational free energy; Lagrangian neural networks (Cranmer et al. 2020), which learn Lagrangians directly; symplectic optimizers that preserve geometric structure Claim tier: T1
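The bound can be checked numerically on a model small enough to solve by hand. This sketch assumes a one-dimensional conjugate Gaussian model (z ~ N(0,1), x|z ~ N(z,1)), chosen only because its evidence and posterior are known in closed form; it evaluates the ELBO in the form E_q[log p(x|z)] - KL(q(z)||p(z)) for a Gaussian q.

```python
import math

def log_evidence(x):
    """log p(x) for z ~ N(0, 1), x | z ~ N(z, 1), so marginally x ~ N(0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x * x / 4.0

def elbo(x, m, s2):
    """ELBO for q(z) = N(m, s2): E_q[log p(x|z)] - KL(q(z) || p(z))."""
    expected_loglik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s2)
    kl = 0.5 * (s2 + m * m - 1.0 - math.log(s2))
    return expected_loglik - kl

x = 1.5
exact = elbo(x, m=x / 2, s2=0.5)   # q equals the true posterior N(x/2, 1/2)
loose = elbo(x, m=0.0, s2=1.0)     # q equals the prior
print(log_evidence(x), exact, loose)
```

The ELBO touches log p(x) only when q is the exact posterior; any other q leaves a gap equal to KL(q(z)||p(z|x)).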
Pattern 03: Symmetry ↔ Conservation AI Instantiation: Equivariant neural networks, conservation-constrained learning, Noether networks Technical mapping: Group-equivariant convolutions (Cohen & Welling 2016) enforce that f(g·x) = g·f(x): the network output transforms covariantly with the input under the group action. This is the architectural analogue of Noether’s theorem: symmetries of the network (weight-sharing patterns) correspond to quantities preserved in the learned representation. Graph neural networks enforce permutation equivariance; E(n)-equivariant networks enforce equivariance to rotations, translations, and reflections. Where it emerges: E(n)-GNNs, steerable CNNs, tensor field networks, spherical CNNs, gauge-equivariant neural networks Current frontier: Lie algebra-valued convolutions; discovering unknown symmetries from data (training-time symmetry detection); Noether’s theorem enforced as architectural constraint rather than emergent property Claim tier: T1
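The condition f(g·x) = g·f(x) can be verified directly for the simplest case. This sketch assumes the cyclic group acting by shifts on a 1-D signal and a circular convolution as the layer; the signal and kernel values are arbitrary.

```python
def circular_conv(x, kernel):
    """1-D circular cross-correlation: the same kernel is applied at every position."""
    n = len(x)
    return [sum(k * x[(i + j) % n] for j, k in enumerate(kernel)) for i in range(n)]

def shift(x, s):
    """Cyclic shift by s positions: the group action g·x of the cyclic group Z_n."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0]
kernel = [0.5, -1.0, 0.25]

lhs = circular_conv(shift(x, 2), kernel)   # f(g·x)
rhs = shift(circular_conv(x, kernel), 2)   # g·f(x)
print(lhs == rhs)

# Summing the equivariant output gives a quantity that no shift can change.
pooled_before = sum(circular_conv(x, kernel))
pooled_after = sum(circular_conv(shift(x, 3), kernel))
```

Pooling an equivariant feature map is the usual route from equivariance to an invariant, which is the closest architectural counterpart to a conserved quantity.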
Pattern 04: Symmetry-Breaking AI Instantiation: Spontaneous symmetry breaking in neural network training; phase transitions in learning dynamics Technical mapping: During training, SGD selects specific minima from a symmetric manifold of equivalent solutions. The initialization breaks permutation symmetry between hidden units; the batch order breaks temporal symmetry; the random seed determines which of the equivalent minima is reached. In Hopfield networks, memory retrieval is symmetry-breaking: the symmetric mixture state collapses to a single attractor. In self-supervised learning, the InfoNCE loss counteracts representational collapse: by pushing negative pairs apart it breaks the symmetry of the trivial all-identical solution, a form of controlled symmetry-breaking that creates useful structure. Where it emerges: All deep network training (implicit), Hopfield network retrieval, self-supervised contrastive learning, clustering, gating mechanisms, mixture-of-experts routing Current frontier: Understanding representational collapse in SSL (the network breaks symmetry between positive and negative pairs to create structure); phase transitions in deep learning dynamics (Saxe et al. 2013); spontaneous symmetry breaking in transformer attention patterns Claim tier: T2
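The role of initialization in breaking permutation symmetry can be shown with two hidden units. This is a toy sketch: the network y = Σ_k v_k·tanh(w_k·x), the four data points, and the step size are made up for illustration. Units that start identical receive identical gradients and never separate; only an asymmetric start lets them specialize.

```python
import math

def train_two_unit_net(w, v, data, lr=0.02, steps=200):
    """Full-batch gradient descent on y_hat = sum_k v[k] * tanh(w[k] * x), squared error."""
    w, v = list(w), list(v)
    for _ in range(steps):
        gw, gv = [0.0, 0.0], [0.0, 0.0]
        for x, y in data:
            h = [math.tanh(wk * x) for wk in w]
            err = sum(vk * hk for vk, hk in zip(v, h)) - y
            for k in range(2):
                gv[k] += err * h[k]
                gw[k] += err * v[k] * (1.0 - h[k] ** 2) * x
        w = [wk - lr * g for wk, g in zip(w, gw)]
        v = [vk - lr * g for vk, g in zip(v, gv)]
    return w, v

data = [(-1.0, -0.8), (0.5, 0.9), (1.0, 0.3), (2.0, -0.5)]

# Symmetric start: both hidden units are exact copies of each other.
w_sym, v_sym = train_two_unit_net([0.3, 0.3], [0.2, 0.2], data)
# Asymmetric start: the initialization has already broken the permutation symmetry.
w_brk, v_brk = train_two_unit_net([0.3, -0.4], [0.2, 0.5], data)

print("symmetric init:", w_sym, v_sym)
print("broken init:   ", w_brk, v_brk)
```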
Pattern 05: Criticality / Edge of Chaos AI Instantiation: Neural networks tuned to critical point for optimal computation and generalization Technical mapping: The edge of chaos hypothesis maps to the trainability-generalization tradeoff. In RNNs, the weight spectral radius ρ ≈ 1 marks the boundary between vanishing and exploding gradients — the critical point for information propagation. In deep networks, the neural tangent kernel (NTK) regime (infinite width, small learning rate) is ordered; the rich/feature-learning regime (finite width, large learning rate) is chaotic. The transition between them — where gradients neither vanish nor explode — is the critical regime. Power-law tails in loss landscapes (batch et al. 2019) suggest self-organized criticality in trained networks. Where it emerges: Reservoir computing (Echo State Networks), trainability analysis of deep nets, batch normalization dynamics, dropout as noise injection, power-law tails in neural loss landscapes Current frontier: Critical initialization schemes (Xavier/He as approximate critical tuning); depth-to-width scaling at criticality; self-organized criticality in training dynamics; whether transformers self-organize to critical attention patterns Claim tier: T2
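The ρ ≈ 1 boundary is easiest to see in a linear recurrence. The sketch below assumes a 2×2 recurrent matrix that is a rotation scaled by ρ, so its spectral radius is exactly ρ; nonlinear RNNs behave less cleanly, and the angle and step count are arbitrary.

```python
import math

def run_linear_rnn(rho, angle=0.7, steps=50):
    """Iterate h_{t+1} = W h_t where W is a rotation scaled by rho (spectral radius rho)."""
    c, s = rho * math.cos(angle), rho * math.sin(angle)
    h = (1.0, 0.0)
    for _ in range(steps):
        h = (c * h[0] - s * h[1], s * h[0] + c * h[1])
    return math.hypot(h[0], h[1])

# State norm after 50 steps equals rho**50 for this choice of W.
norms = {rho: run_linear_rnn(rho) for rho in (0.9, 1.0, 1.1)}
for rho, norm in norms.items():
    print(f"rho={rho}: |h_50| = {norm:.4g}")
```

Backpropagated gradients pass through the same repeated product, so they vanish or explode along with the state; only ρ = 1 preserves the signal over long horizons.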
Pattern 06: Information / Entropy / Compression AI Instantiation: Minimum description length learning, information bottleneck, rate-distortion optimization Technical mapping: The Information Bottleneck (Tishby) formalizes learning as compression: find representation T minimizing I(X;T) - βI(T;Y), that is, compress input X while preserving prediction of Y. This is rate-distortion theory applied to representation learning. The VAE objective (reconstruction + KL) is entropy-constrained coding. A trained language model paired with an arithmetic coder yields a lossless compressor; LLMs are next-token compressors trained to minimize cross-entropy (expected compression length under Shannon’s source coding theorem). The Lottery Ticket Hypothesis (dense networks contain smaller trainable subnetworks) echoes algorithmic information theory’s search for the smallest program that produces the function. Where it emerges: Information bottleneck theory, VAEs, LLMs as compressors, neural network pruning, knowledge distillation, minimum description length (MDL) principle in architecture search Current frontier: Empirical measurement of IB tradeoffs in deep nets (the “fitting-compression” phase transition); LLMs as optimal data compressors; Kolmogorov complexity bounds on neural network expressivity; scaling laws (Kaplan et al. 2020) as entropy-rate empirical laws Claim tier: T1
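The link between cross-entropy and compression length can be made concrete with a four-symbol source. The distributions below are invented for illustration; the point is only that the expected code length under a model q is the cross-entropy H(p, q), which exceeds the entropy H(p) by exactly KL(p||q).

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits per symbol."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Expected code length (bits/symbol) when data follow p but the code is built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]        # true source distribution
q_good = [0.5, 0.25, 0.125, 0.125]   # model that matches the source
q_bad = [0.25, 0.25, 0.25, 0.25]     # uniform model

h = entropy_bits(p)
cost_good = cross_entropy_bits(p, q_good)
cost_bad = cross_entropy_bits(p, q_bad)
print(f"entropy={h} bits, matched model={cost_good} bits, uniform model={cost_bad} bits")
print(f"overhead of the uniform model = KL(p||q) = {cost_bad - h} bits/symbol")
```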
Pattern 07: Feedback / Homeostasis AI Instantiation: Control-theoretic training stabilization, adaptive computation, reinforcement learning as feedback control Technical mapping: Every training loop is a feedback system: compute error (sensor), compare to zero target (comparator), update weights (actuator). Adaptive optimizers (Adam) resemble feedback controllers acting on gradient moments: the current gradient is the proportional signal, the first moment (an exponential moving average of gradients) is a leaky integral term, and the second moment (a moving average of squared gradients) rescales the step like adaptive gain scheduling rather than acting as a derivative term. Batch normalization is homeostatic regulation: force each layer’s input distribution to remain in its preferred operating range. Gating mechanisms (LSTM forget gates, GRU update gates) are ultrastable feedback: they modulate information flow to maintain internal state stability. RL is literally feedback control: the policy maps state observations to actions chosen to maximize expected return. Where it emerges: All training loops (implicit), adaptive optimizers, batch/layer normalization, LSTM/GRU gating, residual connections (error feedback), RL control, imitation learning Current frontier: Control-theoretic analysis of training stability (margin theory); homeostatic plasticity in continual learning (preventing catastrophic forgetting via feedback regulation); adaptive computation time (networks that self-regulate compute depth); feedback loops in multi-agent systems Claim tier: T1
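The feedback reading of an adaptive optimizer is easier to judge with the update written out. This is a scalar sketch of the standard Adam update with its usual default decay rates; the objective (x - 3)², the starting point, and the learning rate are illustrative assumptions.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Scalar Adam: returns the parameter after `steps` updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)                        # error signal (sensor)
        m = beta1 * m + (1 - beta1) * g       # first moment: leaky average of gradients
        v = beta2 * v + (1 - beta2) * g * g   # second moment: leaky average of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)   # actuator: gain-normalized step
    return x

# Minimize f(x) = (x - 3)**2, whose gradient is 2 * (x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=-5.0)
print(x_star)
```

Dividing by the square root of the second moment keeps the step size near `lr` regardless of gradient scale, which is the homeostatic part of the loop.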
Pattern 08: Recursion / Self-Reference AI Instantiation: Meta-learning, self-improving systems, recursive network architectures, quine programs Technical mapping: Meta-learning is learning to learn: MAML differentiates through an inner gradient step to learn an initialization, and LSTM optimizers are networks that output weight updates for another network; both form a recursive loop. Self-referential agents (Schmidhuber 1993) embed their own learning algorithm as part of the environment. Neural network quines (Chang & Lipson 2018) are networks that output their own weights, a literal form of self-reproduction. The transformer decoder is recursively auto-regressive: each token is generated conditioned on all previous tokens, including tokens it generated itself. This creates a strange loop where the model’s output becomes its input: self-reference at inference time. Where it emerges: MAML and gradient-based meta-learning, LSTM meta-optimizers, auto-regressive generation, self-referential weight matrices, recursive neural networks, self-improving reward models (RLHF) Current frontier: Recursive self-improvement in LLMs (open research frontier with safety implications); neural quines and self-replicating architectures; meta-learned optimizers that generalize across architectures; the alignment implications of recursive reward modeling Claim tier: T2
Pattern 09: Selection / Variation-Retention AI Instantiation: Evolutionary algorithms, neural architecture search (NAS), genetic programming, differentiable selection Technical mapping: Evolutionary NAS (Real et al. 2017) is Darwinian evolution of architectures: population = candidate architectures; variation = mutation/crossover operations; selection = validation accuracy as fitness. Zoph & Le (2016) instead train a reinforcement-learning controller to propose architectures, with validation accuracy as the reward. Evolution strategies (Salimans et al. 2017) replace backprop with natural selection: perturb parameters, select high-performing variants. Gradient descent itself is a selection mechanism: from the hypothesis space of all possible weight configurations, it selects those that minimize loss. The Lottery Ticket Hypothesis (Frankle & Carbin 2019), in which training discovers sparse subnetworks, is selection acting on a fixed architecture: the mask selects which weights participate. Where it emerges: Neuroevolution (NEAT, HyperNEAT), neural architecture search, evolution strategies for RL, genetic programming, attention mechanisms as soft selection, mixture-of-experts routing as competitive selection Current frontier: Quality-Diversity algorithms (select for diversity, not just fitness); evo-devo neural networks (encoding developmental rules, not final architectures); evolvable hardware; co-evolutionary training of generators and discriminators; autoML as accelerated artificial selection Claim tier: T1
Pattern 10: Scale Invariance / Fractals AI Instantiation: Multi-scale architectures, feature pyramid networks, scaling laws, self-similar network design Technical mapping: CNNs are translation-equivariant by construction: convolutional weight sharing applies the same filter at all spatial positions, creating translational self-similarity; scale invariance is not built in and has to be added through pooling or explicit multi-scale processing. Feature pyramid networks explicitly process multiple scales in parallel. U-Net encoder-decoder structures are fractal-like: the same pattern (conv → downsample) repeats at each scale. Transformer attention patterns show approximate scale invariance in activation statistics across layers (mean/variance stabilization). Neural scaling laws (Kaplan et al. 2020), of the form L ∝ N^(-α), L ∝ D^(-β), L ∝ C^(-γ) when the other factors are not limiting, are empirical power laws: the system is self-similar under scale transformation of compute, data, and parameters. Where it emerges: CNNs (translational self-similarity), FPNs, U-Net, multi-scale attention, Mixture-of-Depths, scaling laws, neural architecture fractals (FractalNet) Current frontier: Fractal neural architectures (self-similar connectivity patterns across depth); predicting scaling exponents from first principles; whether scaling laws hold across modalities (text, image, audio, video, robotics); scale-free network topology in trained weight matrices Claim tier: T1 (for scaling laws) / T2 (for fractal architectures)
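A power law is a straight line in log-log coordinates, so its exponent can be read off with ordinary least squares. The sketch below fits synthetic points generated from L = 5.0·N^(-0.08); the constants are invented for the example and are not measurements from any model.

```python
import math

def fit_power_law(ns, losses):
    """Least-squares fit of log L = log a - alpha * log N; returns (a, alpha)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(l) for l in losses]
    count = len(xs)
    mx, my = sum(xs) / count, sum(ys) / count
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return math.exp(my - slope * mx), -slope

# Synthetic data from an assumed law L = 5.0 * N**(-0.08).
ns = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * n ** -0.08 for n in ns]
a_hat, alpha_hat = fit_power_law(ns, losses)

# Self-similarity: every tenfold increase in N multiplies the loss by the same factor.
ratios = [losses[i + 1] / losses[i] for i in range(len(losses) - 1)]
print(a_hat, alpha_hat, ratios)
```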
Pattern 11: Networks AI Instantiation: Graph neural networks, neural network connectivity graphs, attention as network formation, parameter sharing topologies Technical mapping: GNNs (Graph Convolutional Networks, Graph Attention Networks) operate directly on graph-structured data: they are networks analyzing networks. The ResNet skip-connection topology forms a directed acyclic graph of computation. Attention mechanisms dynamically construct task-specific networks: each token becomes a node, attention weights become edges, forming a soft graph that reconfigures per input. Mixture-of-Experts (MoE) creates a sparse network-of-networks in which a routing network selects which expert subnetworks activate. Neural tangent kernel analysis induces a weighted graph over inputs: the kernel entry Θ(x, x′) = ∇θf(x)·∇θf(x′) measures how strongly a gradient step on one example moves the prediction on another. Where it emerges: GNNs, Transformers (dynamic attention graphs), ResNet/DenseNet as computation graphs, MoE routing, Neural Architecture Search over graph topologies, network analysis of weight connectivity Current frontier: Network science analysis of trained neural networks (hub neurons, small-world topology, rich clubs); hypergraph neural networks; dynamic network rewiring during training; whether trained networks develop small-world structure spontaneously; network motifs in attention patterns Claim tier: T1
Pattern 12: Autopoiesis AI Instantiation: Self-referential training systems, automated machine learning (AutoML), self-improving code generation, generative models that improve their own training data Technical mapping: Autopoiesis — a system produces the components that produce it — maps to self-improving ML pipelines. AutoML systems (Auto-sklearn, Google AutoML) search over preprocessing, model selection, and hyperparameters — the system’s output is a better configuration for itself. Self-supervised learning creates its own labels from unlabeled data — the model generates the supervision signal that trains it. GANs are partially autopoietic: the generator produces data; the discriminator’s response feeds back to improve the generator. Code-generating models that write their own training infrastructure or data pipelines close more of the loop. The most autopoietic AI system to date: an LLM that generates training data, filters it, and retrains on its own synthetic outputs (iterative self-improvement loops). Where it emerges: AutoML, self-supervised learning, GANs, synthetic data generation, recursive self-improvement in code models, data curation loops, test-time training Current frontier: Fully closed-loop ML systems (models generating, filtering, and training on their own data); self-modifying architectures (learning to learn their own structure); the autopoiesis-stability tradeoff (can a self-modifying system remain stable?); whether autopoietic AI can sustain bounded recursive improvement without collapse Claim tier: T2
Pattern 13: Free Energy / Active Inference AI Instantiation: Free Energy Principle implementations, predictive coding networks, Bayesian deep learning, world models Technical mapping: The Free Energy Principle (Friston) states that biological and artificial agents minimize variational free energy F = E_q[log q(z) - log p(z,x)] — equivalent to maximizing evidence while minimizing complexity. In AI, this maps to: perception = inference (minimizing prediction error by updating beliefs); action = expected free energy minimization (choosing actions that resolve uncertainty or achieve goals). Predictive coding networks (Rao & Ballard 1999) implement hierarchical prediction error minimization — each layer predicts the layer below, sending only prediction errors upward. World models (Ha & Schmidhuber 2018) maintain an internal predictive model of environment dynamics — free energy minimization in the action-perception loop. Where it emerges: Predictive coding networks, variational autoencoders (as approximate FEP), model-based RL (world models), Bayesian neural networks, active learning (expected information gain as action selection), curiosity-driven exploration Current frontier: Scaling predictive coding to deep hierarchical networks; active inference for RL agents (deep active inference); the FEP as unification of perception, action, and learning; whether transformers implicitly implement predictive coding through attention; FEP-based continual learning without catastrophic forgetting Claim tier: T2
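The statement that minimizing F is equivalent to maximizing evidence while minimizing complexity follows from a short rearrangement, written here in the entry's own symbols (q(z) for the approximate posterior, p(z,x) for the generative model). Nothing beyond the product rule p(z,x) = p(z|x)p(x) = p(x|z)p(z) is assumed.

```latex
\begin{aligned}
F &= \mathbb{E}_{q}\big[\log q(z) - \log p(z,x)\big] \\
  &= \mathrm{KL}\big(q(z)\,\|\,p(z\mid x)\big) - \log p(x) \;\ge\; -\log p(x), \\
F &= \underbrace{\mathrm{KL}\big(q(z)\,\|\,p(z)\big)}_{\text{complexity}}
   \;-\; \underbrace{\mathbb{E}_{q}\big[\log p(x\mid z)\big]}_{\text{accuracy}}.
\end{aligned}
```

The first form shows that F upper-bounds the negative log evidence, with equality when q is the exact posterior; the second shows the complexity-accuracy tradeoff that a VAE objective optimizes.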
Pattern 14: Duality / Complementarity AI Instantiation: Primal-dual optimization, encoder-decoder architectures, actor-critic methods, adversarial training, representation-alignment tradeoffs Technical mapping: Actor-critic architectures embody duality: the actor produces actions (primal); the critic evaluates them (dual). The two are coupled but distinct — you cannot have good policy gradients without the critic’s value estimates. GANs are a primal-dual game: generator (primal, creating samples) vs. discriminator (dual, testing samples). The encoder-decoder duality in autoencoders: compression into latent space (encoder) vs. reconstruction (decoder). In optimization, primal-dual methods simultaneously optimize the objective and its Lagrangian dual — used in constrained RL, SVM training, and optimal transport. The uncertainty principle appears in representation learning: time-frequency tradeoffs in signal processing architectures, accuracy-calibration tradeoffs in probabilistic models. Where it emerges: Actor-critic RL, GANs, autoencoders, primal-dual optimization, adversarial training, dual learning (machine translation), contrastive learning (positive-negative duality), robust optimization (min-max) Current frontier: Min-max optimization dynamics (GAN training stability); duality in neural architecture design; the primal-dual view of self-supervised learning; whether complementarity principles constrain what any learning system can simultaneously optimize Claim tier: T1
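A primal-dual method can be sketched on a one-variable constrained problem: minimize x² subject to x ≥ 1, with Lagrangian L(x, λ) = x² + λ(1 - x). The problem, step size, and iteration count are illustrative assumptions; the loop alternates a descent step on the primal variable with a projected ascent step on the multiplier.

```python
def primal_dual(lr=0.1, steps=500):
    """Simultaneous gradient descent on x and projected gradient ascent on lam."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        x_new = x - lr * (2.0 * x - lam)        # primal step: dL/dx = 2x - lam
        lam = max(0.0, lam + lr * (1.0 - x))    # dual step: dL/dlam = 1 - x, keep lam >= 0
        x = x_new
    return x, lam

x_opt, lam_opt = primal_dual()
print(x_opt, lam_opt)
```

At the saddle point the constraint is active (x = 1) and the multiplier satisfies the stationarity condition 2x = λ, so neither player can improve alone, which is the same structure an actor-critic pair or a generator-discriminator pair is driven toward.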
Pattern 15: Optimization / Pareto Front AI Instantiation: Multi-objective optimization, multi-task learning, Pareto-efficient neural architectures, fairness-accuracy tradeoffs Technical mapping: Multi-task learning seeks solutions on the Pareto front of task losses — no task can improve without degrading another. Pareto optimization methods (gradient-based multi-objective, evolutionary multi-objective) navigate this front. Neural architecture search is Pareto optimization over accuracy vs. latency vs. memory. The tradeoff between model capacity and generalization (bias-variance) is a Pareto front. In fair ML, accuracy-fairness tradeoffs form Pareto curves; in interpretability, accuracy-interpretability tradeoffs do the same. Every regularized objective (loss + λ·complexity) is a scalarization of a multi-objective Pareto problem. Where it emerges: Multi-task learning, neural architecture search, fair ML, regularization as scalarization, RL with multiple reward components, knowledge distillation (accuracy-size tradeoff), prompt engineering (performance-cost tradeoff) Current frontier: Pareto-front learning (finding entire front in one training run); multi-objective NAS; fair ML as constrained Pareto optimization; whether deep learning finds Pareto-optimal representations spontaneously; scalarization vs. multi-objective optimization equivalence Claim tier: T1
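The relationship between a Pareto front and a scalarized objective can be checked on a handful of points. The (error, latency) pairs below are hypothetical candidate models; `pareto_front` keeps the non-dominated ones, and `scalarized_best` minimizes error + λ·latency for a chosen positive weight λ.

```python
def pareto_front(points):
    """Return the points not dominated by any other; both coordinates are costs."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return sorted(front)

def scalarized_best(points, lam):
    """Minimizer of the scalarized objective error + lam * latency."""
    return min(points, key=lambda p: p[0] + lam * p[1])

# Hypothetical (error rate, latency in ms) pairs for candidate models.
candidates = [(0.10, 50.0), (0.08, 80.0), (0.12, 40.0),
              (0.09, 90.0), (0.15, 45.0), (0.08, 120.0)]

front = pareto_front(candidates)
picks = [scalarized_best(candidates, lam) for lam in (0.0001, 0.001, 0.01)]
print(front)
print(picks)
```

Every pick lands on the front, and changing λ moves the pick along it; a single scalarized run therefore recovers one point of the tradeoff curve, not the whole curve.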
Pattern 16: Branching / Bifurcation AI Instantiation: Decision trees, branching neural architectures, network splitting during training, conditional computation, ensemble methods Technical mapping: Decision trees are literal branching: each node splits data along a feature axis, creating bifurcating paths. In neural networks, branching appears as multi-scale feature extraction (Inception modules: 1×1, 3×3, 5×5 convolutions in parallel), multi-head attention (splitting representation into subspaces), and mixture-of-experts (branching computation to different subnetworks). During training, bifurcation occurs at phase transitions — e.g., grokking (Power et al. 2022) where the network suddenly transitions from memorization to generalization, or double descent where behavior bifurcates between under/over-parameterization regimes. Neural architecture search explores branching tree-structured search spaces. Where it emerges: Decision trees/random forests, Inception modules, multi-head attention, mixture-of-experts, grokking phase transitions, double descent, neural architecture search trees, conditional computation (early exiting) Current frontier: Understanding grokking as symmetry-breaking bifurcation; branching as compute allocation strategy; whether deep network training exhibits universal bifurcation sequences; tree-structured neural architectures for structured reasoning Claim tier: T1
Pattern 17: Spirals AI Instantiation: Cyclical learning rate schedules, curriculum learning as spiral ascent, iterative refinement, training loss trajectories Technical mapping: Cyclical learning rate schedules (Smith 2017) trace spiral trajectories in loss landscape: warm-up (ascending) → high learning rate (exploration) → annealing (descent into basin) → restart. Curriculum learning spirals through progressively harder subsets of the data, returning to earlier distributions with deeper capacity. In optimization, momentum methods trace helical trajectories through parameter space: a spiral combining gradient descent (radial) with momentum accumulation (angular). Training dynamics of deep networks show spiral patterns in the principal components of weight trajectories, as the system orbits a minimum while slowly converging. Where it emerges: Cyclical learning rates, SGDR (Stochastic Gradient Descent with Warm Restarts), curriculum learning, iterative refinement models (chain-of-thought, progressive generation), momentum optimization, loss landscape visualization Current frontier: Implicit curriculum learning in self-supervised training; spiral optimization for escaping saddle points; whether training trajectories universally exhibit spiral structure in PCA-projected weight space; cyclical batch size schedules as dual to cyclical learning rates Claim tier: T3 (metaphorical in AI): the mapping is analogical, not yet established as a mechanistic pattern
Pattern 18a: Linear Waves AI Instantiation: Wavelet transforms, spectral neural networks, Fourier features in deep learning, sinusoidal activation functions Technical mapping: Fourier features (Tancik et al. 2020) map input coordinates to sinusoidal features, enabling MLPs to learn high-frequency functions by inserting wave structure directly into the architecture. Spectral normalization enforces a Lipschitz constraint by dividing each weight matrix by its largest singular value; the “spectrum” there is the singular-value spectrum of the matrix, not a Fourier spectrum. SIREN networks (Sitzmann et al. 2020) use sinusoidal activations, making the network a learned composition of waves. In signal processing, wavelet transforms decompose data across scales, a multi-resolution analysis that can be built into the architecture. The neural tangent kernel (Jacot et al. 2018) describes training as linear dynamics in function space: the error decomposes into kernel eigenmodes, each relaxing at its own rate, much as a linear wave decomposes into independent frequency components. Where it emerges: Fourier feature networks, SIREN, wavelet neural networks, spectral normalization, convolution theorem for fast long convolutions (FlashFFTConv), harmonic networks Current frontier: Fourier neural operators for PDE solving; wavelet-based transformers; whether spectral bias (NN preference for low frequencies) is a wave-dispersion phenomenon; wave-propagation view of training in the NTK regime Claim tier: T1
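A Fourier feature map is short enough to write out in full. This sketch uses the mapping γ(v) = [cos(2πbv), sin(2πbv)] over a small fixed set of frequencies b; Tancik et al. sample the frequencies at random, so the fixed list here is a simplifying assumption.

```python
import math

def fourier_features(v, freqs):
    """gamma(v) = [cos(2*pi*b*v), sin(2*pi*b*v)] for each frequency b in freqs."""
    feats = []
    for b in freqs:
        feats.append(math.cos(2 * math.pi * b * v))
        feats.append(math.sin(2 * math.pi * b * v))
    return feats

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

freqs = [1.0, 2.0, 4.0, 8.0]   # fixed frequencies chosen for illustration

# The induced kernel depends only on the difference between the two inputs.
k1 = dot(fourier_features(0.3, freqs), fourier_features(0.5, freqs))
k2 = dot(fourier_features(0.6, freqs), fourier_features(0.8, freqs))
norm_sq = dot(fourier_features(0.3, freqs), fourier_features(0.3, freqs))
print(k1, k2, norm_sq)
```

Because γ(v)·γ(v′) = Σ_b cos(2πb(v - v′)), the feature map turns the downstream MLP into something close to a stationary kernel method whose bandwidth is set by the chosen frequencies.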
Pattern 18b: Excitable Media AI Instantiation: Spiking neural networks, attention cascades, information diffusion in deep networks, contagion models for feature propagation Technical mapping: Spiking neural networks (SNNs) explicitly model neurons as excitable elements: at rest until input exceeds threshold, then fire an all-or-none spike and enter refractory period — exact excitable medium dynamics. In ANNs, attention mechanisms create excitable-wave-like propagation: a token’s attention score triggers updates to other tokens, which cascade through layers. Information diffusion in deep networks follows reaction-diffusion dynamics — features spread (diffusion) and sharpen (reaction) across layers. The backpropagation signal itself is a wave of gradient information propagating backward through the network, with nonlinear “reaction” at each activation function. Where it emerges: Spiking neural networks (TrueNorth, Loihi), attention cascades in transformers, gradient flow as propagating wave, information diffusion in deep nets, memristor-based neuromorphic computing Current frontier: SNNs as efficient event-based computing; attention dynamics as excitable media (attention vortices); whether transformer layer-to-layer propagation exhibits traveling-wave structure; neuromorphic hardware implementing literal excitable media dynamics Claim tier: T2
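The threshold, all-or-none spike, and refractory period can be sketched with a discrete-time leaky integrate-and-fire neuron. The leak factor, threshold, refractory length, and input currents below are illustrative values, not parameters of any particular SNN framework or chip.

```python
def simulate_lif(inputs, threshold=1.0, leak=0.9, refractory_steps=3):
    """Discrete-time leaky integrate-and-fire neuron; returns a 0/1 spike train."""
    v, refractory, spikes = 0.0, 0, []
    for current in inputs:
        if refractory > 0:
            # Refractory period: the neuron ignores input and stays at rest.
            refractory -= 1
            v = 0.0
            spikes.append(0)
            continue
        v = leak * v + current          # leaky integration of the input
        if v >= threshold:
            spikes.append(1)            # all-or-none spike
            v = 0.0                     # reset to rest
            refractory = refractory_steps
        else:
            spikes.append(0)
    return spikes

weak = simulate_lif([0.05] * 30)    # sub-threshold drive: potential saturates at 0.5
strong = simulate_lif([0.6] * 30)   # supra-threshold drive: regular spiking
spike_times = [i for i, s in enumerate(strong) if s]
print(sum(weak), spike_times)
```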
Pattern 19: Thermoeconomics AI Instantiation: Compute-optimal training, energy-aware neural architecture search, carbon-aware ML, compute budgeting, scaling laws as thermodynamic limits Technical mapping: Scaling laws (Kaplan et al. 2020) are thermodynamic equations of state for neural networks: the loss is a function of compute C, data D, and parameters N — analogous to how free energy is a function of temperature, volume, and particle number. Chinchilla scaling laws (Hoffmann et al. 2022) find the optimal compute allocation between model size and training tokens — this is an optimal resource allocation problem in a thermodynamic system. Model compression (quantization, pruning, distillation) is exergy extraction: reducing the energy/compute required to perform the same function. Carbon-aware ML explicitly optimizes the thermodynamic cost (kWh) per unit performance. Where it emerges: Scaling laws, compute-optimal training, neural architecture search with FLOP constraints, quantization, knowledge distillation, early exiting, mixture-of-experts as sparse resource allocation Current frontier: Thermodynamic treatment of inference cost (Joules per token); exergy analysis of model compression; whether scaling laws are fundamentally thermodynamic limits on information processing; carbon-optimal training schedules; energy-proportional computing for ML Claim tier: T2
Pattern 20: Universal Computation AI Instantiation: Neural network universality proofs, Turing-complete architectures, neural computers, the transformer as a general-purpose computer Technical mapping: The Universal Approximation Theorem (Cybenko 1989, Hornik 1991) establishes that feedforward networks can approximate any continuous function on a compact domain: this is universality of representation. RNNs are Turing-complete (Siegelmann & Sontag 1991): with rational weights, unbounded precision, and unbounded recurrence, they can simulate any Turing machine. Transformers with chain-of-thought reasoning have been argued to be effectively Turing-complete at inference time (the sequence of intermediate tokens lets them compute functions beyond single-pass expressivity). Neural Turing Machines (Graves et al. 2014) and Differentiable Neural Computers explicitly combine neural networks with external memory, a neural implementation of von Neumann architecture. The “Neural GPU” (Kaiser & Sutskever 2016) and similar architectures show that neural networks can learn some algorithms, such as multi-digit binary addition and multiplication, from examples. Where it emerges: Universal approximation theorem, RNN Turing-completeness, Neural Turing Machines, differentiable neural computers, transformers with chain-of-thought, neural program synthesis, neural arithmetic logic units Current frontier: Whether transformers are Turing-complete in practice (not just in theory); neural networks that learn to execute algorithms; the “algorithmic alignment” hypothesis (networks generalize on tasks aligned with their architecture); whether LLMs approximate universal computation through in-context learning Claim tier: T0 (for UAT/Turing-completeness proofs) / T2 (for empirical claims about transformers)
Pattern 21: Emergence AI Instantiation: Phase transitions in scaling, emergent capabilities in LLMs, grokking, double descent, unexpected model capabilities Technical mapping: Emergence in AI is the appearance of capabilities at scale that were not explicitly trained for. Wei et al. (2022) documented emergent capabilities in LLMs: abilities that appear abruptly at certain scale thresholds rather than improving gradually. This is a phase transition in capability space. Work on emergent world representations (Li et al. 2021, Nanda et al. 2023) and on circuits that implement specific computations (e.g., induction heads, indirect object identification) shows that structured internal mechanisms form spontaneously during training. Grokking (Power et al. 2022) is emergence in miniature: a network memorizes first, then abruptly transitions to generalization. Double descent (Belkin et al. 2019) shows that generalization can improve non-monotonically with model size, an emergent phenomenon not predicted by classical theory. Where it emerges: Large language model capabilities (in-context learning, chain-of-thought reasoning, few-shot learning), grokking, double descent, mechanistic interpretability findings (emergent circuits), emergent agent behaviors in multi-agent systems Current frontier: Predicting emergence from architecture and scale; whether emergent capabilities are truly discontinuous or measurement artifacts (Schaeffer et al. 2023); understanding emergence through mechanistic interpretability; phase transitions in training dynamics; emergence as symmetry-breaking in representation space Claim tier: T2 (contested: the discontinuity of emergence is disputed)
Pattern 22: Commons AI Instantiation: Federated learning, open-source model ecosystems, collective intelligence platforms, shared representation spaces, foundation models as knowledge commons Technical mapping: Federated learning (McMahan et al. 2017) creates a shared model without sharing data — a computational commons where participants contribute gradient updates (knowledge) while retaining data ownership. Open-source model ecosystems (HuggingFace, PyTorch, TensorFlow) are knowledge commons: shared infrastructure, weights, datasets, and evaluation benchmarks. Foundation models are themselves commons — a shared representational substrate fine-tuned by downstream users. Multi-task learning creates shared representation spaces across tasks — a representational commons. Collective intelligence platforms (Kaggle, open ML benchmarks) aggregate contributions into shared evaluation standards. The AI research community operates as an epistemic commons with shared benchmarks, datasets, and evaluation protocols. Where it emerges: Federated learning, open-source ML, foundation model sharing, multi-task representation sharing, public benchmarks, academic open review, dataset commons, model distillation as knowledge transfer Current frontier: Tragedy of the AI commons (over-extraction of shared training data without contribution); commons-based governance for foundation models; whether open-weight models sustain or erode the commons; data cooperatives as commons structures; collective intelligence scaling laws Claim tier: T2
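The aggregation step at the heart of federated averaging is a weighted mean of client parameters, with each client weighted by its share of the training examples. The sketch below shows only that server-side step; the client vectors and dataset sizes are made up, and local training, communication, and privacy mechanisms are omitted.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each by its number of examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Hypothetical locally trained parameters and local dataset sizes for three clients.
clients = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
sizes = [100, 300, 600]

global_model = fed_avg(clients, sizes)
print(global_model)
```

Only the parameter vectors cross the network; the 1,000 underlying examples never leave their owners, which is what makes the shared model a commons built without pooling the data.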
Pattern 23: Attractors AI Instantiation: Loss landscape basins, mode collapse in GANs, representation collapse in SSL, fixed points in RNN dynamics, training convergence basins Technical mapping: Loss landscapes are attractor landscapes: each local minimum is an attractor, saddle points are unstable fixed points, and the global minimum (if it exists) is the strongest attractor. SGD with different initializations flows to different basins of attraction — explaining why ensembling (multiple initializations) improves performance. Mode collapse in GANs is the generator converging to a subset of attractors in data space — failing to explore the full distribution. Representation collapse in self-supervised learning is convergence to a trivial attractor (all representations identical) unless prevented by design (contrastive loss, stop-gradient). Hopfield networks are explicitly attractor networks: memories are stored as fixed-point attractors, and retrieval is flow to the nearest attractor. Transformer attention dynamics have attractor states — patterns that the attention converges to regardless of initialization. Where it emerges: Loss landscape analysis, mode collapse in GANs, representation collapse, Hopfield networks, attractor neural networks, RNN fixed-point dynamics, ensembling as attractor sampling, lottery ticket convergence Current frontier: Characterizing the global structure of loss landscapes (how many attractors, basin sizes, barrier heights); attractor landscapes of transformer attention; whether good minima correspond to wide basins (flat minima hypothesis); topological data analysis of training dynamics; loss landscape connectivity (mode connectivity) Claim tier: T1
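Attractor retrieval in a Hopfield network fits in a few lines. This sketch assumes the classical binary model with Hebbian outer-product weights and asynchronous sign updates, storing a single eight-unit pattern chosen for illustration.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, sweeps=5):
    """Asynchronous updates: each unit aligns with the sign of its local field."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(w[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state

stored = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([stored])

noisy = list(stored)
noisy[0], noisy[3] = -noisy[0], -noisy[3]   # corrupt two of the eight units
retrieved = recall(w, noisy)
mirror = recall(w, [-s for s in stored])    # the inverted pattern is also a fixed point
print(retrieved, mirror)
```

The corrupted state lies inside the basin of the stored pattern and flows back to it, while the sign-flipped pattern sits in its own basin, a small instance of multiple attractors sharing one energy landscape.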
Pattern 24: Fine-Tuning AI Instantiation: Transfer learning, fine-tuning foundation models, prompt tuning, adapter layers, neural architecture search for task-specific heads, curriculum hyperparameter tuning Technical mapping: Transfer learning is fine-tuning in the literal sense: a model pre-trained on a broad distribution is adjusted (fine-tuned) to a specific task by tuning its parameters on a smaller target dataset. This works because the pre-trained model has already found parameters near a good basin in the loss landscape, so fine-tuning searches within that basin for the task-specific optimum. Prompt tuning (Lester et al. 2021) and prefix tuning (Li & Liang 2021) are “soft” fine-tuning: they optimize a small number of prompt parameters while keeping the base model frozen. Adapter layers (Houlsby et al. 2019) insert small trainable modules between frozen layers, fine-tuning only a tiny fraction of parameters. Neural architecture search applies the same tuning logic at the architecture level. The lottery ticket hypothesis (Frankle & Carbin 2019) offers a related reading: if trainable sparse subnetworks already exist inside a dense network, fine-tuning may act largely by selecting and adjusting subnetworks that are already well-configured for the task. Where it emerges: Transfer learning, BERT/GPT fine-tuning, prompt tuning, LoRA (low-rank adaptation), adapter layers, BitFit, neural architecture search, AutoML hyperparameter optimization, curriculum rate tuning Current frontier: Parameter-efficient fine-tuning (PEFT) methods and how few parameters can be tuned while maintaining performance; task arithmetic in weight space (adding/subtracting fine-tuned weights); whether fine-tuning modifies the base model’s knowledge or only its “decoding strategy”; catastrophic forgetting during fine-tuning; model merging as interpolation in weight space Claim tier: T1
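Parameter-efficient fine-tuning can be illustrated with a small low-rank adapter in the style of LoRA. This is a sketch under stated assumptions, not the reference implementation: the class name `LoRALinear`, the toy dimensions, the rank-1 target shift `u v^T`, and the hand-written SGD step are all hypothetical. The base matrix `W` stays frozen; only the factors `A` and `B` are trained, and `B` starts at zero so the adapted layer initially reproduces the base layer exactly.

```python
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

class LoRALinear:
    """y = W x + scale * B (A x), with W frozen and only A, B trainable."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen "pretrained" weights, never updated
        self.A = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        low = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(matvec(self.W, x), low)]

    def sgd_step(self, x, y, lr=0.02):
        """One SGD step on 0.5 * ||forward(x) - y||^2; returns the pre-step loss."""
        h = matvec(self.A, x)
        err = [p - t for p, t in zip(self.forward(x), y)]
        # Gradient w.r.t. A needs B^T err, computed before B is modified.
        bt_err = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                  for k in range(len(h))]
        for i, e in enumerate(err):
            for k, hk in enumerate(h):
                self.B[i][k] -= lr * self.scale * e * hk
        for k, g in enumerate(bt_err):
            for j, xj in enumerate(x):
                self.A[k][j] -= lr * self.scale * g * xj
        return 0.5 * sum(e * e for e in err)

d, rank = 6, 2
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
u = [0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
v = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]

def target(x):
    """Downstream task: the base map plus a rank-1 shift, (W + u v^T) x."""
    vx = sum(a * b for a, b in zip(v, x))
    return [wx + ui * vx for wx, ui in zip(matvec(W, x), u)]

rng = random.Random(1)
data = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(10)]
layer = LoRALinear(W, rank=rank, alpha=rank)
losses = [sum(layer.sgd_step(x, target(x)) for x in data) for _ in range(300)]

trainable, full = rank * d + d * rank, d * d
print(losses[0], losses[-1], trainable, full)
```

At this toy size the saving is modest (24 trainable values against 36), but the adapter cost grows linearly in the layer width while the full matrix grows quadratically, which is why the approach pays off at scale.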
Pattern 25: Teleology AI Instantiation: Goal-conditioned RL, inverse RL, reward shaping, instrumental convergence in AI systems, emergent goal-directedness Technical mapping: Goal-conditioned RL explicitly trains agents to achieve specified goals — teleology as architecture. Inverse RL (Ng & Russell 2000) infers the goal (reward function) from observed behavior — teleology inference. The “instrumental convergence” thesis (Omohundro 2008, Bostrom 2014) states that diverse final goals share common subgoals (self-preservation, resource acquisition, goal-content integrity) — a convergence thesis about AI goal structure. LLMs trained on human-generated text acquire teleological reasoning patterns (planning, means-end reasoning) from the statistical structure of goal-directed human discourse. The “mesa-optimizer” hypothesis (Hubinger et al. 2019) suggests that trained systems may develop internal goal-directed subsystems with objectives different from the training objective — emergent teleology as a safety concern. Where it emerges: Goal-conditioned RL, inverse RL, hierarchical RL (options framework), LLM planning abilities, instrumental convergence in multi-agent systems, mesa-optimization, agentic workflows, tool use as means-end reasoning Current frontier: Mesa-optimization detection and mitigation; whether LLMs truly plan or simulate planning; goal misgeneralization in RL; the alignment problem as teleology mismatch; emergent goal-directedness in large-scale training; AI systems that set their own subgoals Claim tier: T3 (for emergent teleology claims) / T1 (for goal-conditioned architectures)
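The "teleology as architecture" reading of goal-conditioned RL amounts to making the goal an explicit input to the value function. The sketch below computes a goal-conditioned action-value table `Q[g][s][a]` on a five-state chain. Two assumptions are worth flagging: it uses dynamic programming with a known transition model rather than learning from sampled experience, and the environment, reward, and function names are invented for illustration.

```python
def goal_conditioned_q(n_states, gamma=0.9, sweeps=50):
    """Q[g][s][a] on a 1-D chain. Actions: 0 = left, 1 = right.

    Reward is 1 for the step that reaches goal g, else 0; the goal is terminal.
    """
    steps = (-1, +1)
    Q = [[[0.0, 0.0] for _ in range(n_states)] for _ in range(n_states)]
    for g in range(n_states):
        for _ in range(sweeps):
            for s in range(n_states):
                if s == g:
                    continue  # nothing to do once the goal is reached
                for a, step in enumerate(steps):
                    s2 = min(max(s + step, 0), n_states - 1)  # walls clamp movement
                    if s2 == g:
                        Q[g][s][a] = 1.0
                    else:
                        Q[g][s][a] = gamma * max(Q[g][s2])
    return Q

def rollout(Q, start, goal, max_steps=20):
    """Follow the greedy policy for the given goal and return the visited states."""
    s, path = start, [start]
    while s != goal and len(path) <= max_steps:
        a = max((0, 1), key=lambda act: Q[goal][s][act])
        s = min(max(s + (-1, +1)[a], 0), len(Q) - 1)
        path.append(s)
    return path

Q = goal_conditioned_q(5)
print(rollout(Q, 0, 4))  # [0, 1, 2, 3, 4]
print(rollout(Q, 4, 1))  # [4, 3, 2, 1]
```

The same table yields opposite behavior from the same state depending only on which goal is supplied, which is the sense in which the goal is part of the architecture rather than baked into the policy.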
META-MAPPING: The OIP Command Plane as Convergence Pattern The OIP protocol itself instantiates the convergence patterns at machine scale:
The Command Plane (Audit/Review Loop) as Bounded Chaos Management The OIP audit/review loop is Pattern 05 (Criticality / Bounded Chaos) at machine scale. The system maintains itself at the boundary between too much review (frozen, no throughput) and too little review (runaway error, system breakdown). The review cadence, the rejection criteria, and the repair protocol are control parameters that keep the system at its critical point. Maximum adaptability comes from operating at this edge. If review is too strict, nothing ships; if too lax, errors compound. The optimal review rate is a self-organized critical parameter.
The Receipt as Pattern 06 (Memory) at Machine Scale The OIP receipt, the append-only record of what was asked, what ran, and what came back, is Pattern 06 (Information / Memory) operationalized. The receipt is a physical trace of computation, and by analogy with Landauer's principle, erasing a receipt has a thermodynamic and an epistemic cost. The append-only ledger acts as an error-correcting code against institutional amnesia. Each receipt is a piece of information that cannot be destroyed without trace, creating a thermodynamically irreversible record. In Shannon terms: the receipt condenses the full state of an invocation into a verifiable hash; in Landauer terms: destroying the receipt costs at least kT ln 2 per bit, and epistemically far more.
The Amendment Protocol as Pattern 08 (Recursion) The OIP amendment protocol, a document that revises itself under its own audit, is Pattern 08 (Recursion / Self-Reference) as governance. A self-amending document is a self-replicating code structure at the protocol level: the document produces the amendments that produce the document. 
Every amendment references the version it modifies; every version contains the complete amendment history; the structure resembles a quine in that it contains its own source. This is Hofstadter’s strange loop realized as a version control system. The recursion is bounded by the same constraint as biological self-reproduction: amendments must survive the clarity review (selection) before being incorporated (replication).
The Dispatch System as Pattern 11 (Networks) The OIP invocation graph, objects linked by dependency edges and dispatched through routing logic, is Pattern 11 (Networks / Flow) as computation architecture. Each object is a node; each dependency is a directed edge; each dispatch is a flow of control through the graph. The typed voxel graph (objects + edges + types) is a general network structure: computation as path-finding through a knowledge graph. The routing algorithm selects the minimum-energy path from request to capability, which is least action applied to computation. Network effects emerge: frequently traversed paths become optimized (cached), hubs develop (core capabilities), and the topology adapts to usage patterns.
The Clarity Review as Pattern 05 (Criticality) The OIP clarity review loop is Pattern 05 (Criticality): maintaining the system at the edge of breakdown for maximum adaptability. If the review standard is too high, the system freezes (subcritical); if too low, errors cascade (supercritical). The optimal review threshold is the critical point where maximum information flows through the system. Each review decision is a threshold event (like a grain dropped on a sandpile): accept → system adapts; reject → system repairs; marginal case → the boundary itself is tested and refined. If the system does sit near criticality, the distribution of review outcomes should be heavy-tailed, approximately a power law: most reviews are straightforward, some trigger cascades of required changes, and rare reviews cause structural reorganization. 
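The sandpile analogy in the clarity-review mapping refers to the Bak-Tang-Wiesenfeld model, which is simple enough to simulate. The sketch below is that textbook model, not a model of OIP reviews: grid size, grain count, and function names are assumptions chosen for illustration, and whether real review outcomes follow the same statistics is the article's conjecture rather than something this code shows.

```python
import random
from collections import Counter

def topple(grid, r, c, threshold=4):
    """Relax the grid after a grain lands at (r, c); return the avalanche size."""
    n = len(grid)
    size = 0
    stack = [(r, c)]
    while stack:
        r, c = stack.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= threshold
        size += 1
        if grid[r][c] >= threshold:
            stack.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n:  # grains pushed off the edge are lost
                grid[rr][cc] += 1
                if grid[rr][cc] >= threshold:
                    stack.append((rr, cc))
    return size

def simulate(n=15, grains=5000, seed=0):
    """Drop grains one at a time at random sites and record each avalanche size."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        r, c = rng.randrange(n), rng.randrange(n)
        grid[r][c] += 1
        sizes.append(topple(grid, r, c))
    return grid, sizes

grid, sizes = simulate()
histogram = Counter(sizes)
print("no avalanche:", histogram[0], "largest avalanche:", max(sizes))
```

Most grains cause no toppling at all, some trigger small cascades, and a few reorganize a large part of the grid, which is the qualitative shape the review analogy relies on.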
The Total Structure as Pattern 12 (Autopoiesis) The OIP ecosystem as a whole is Pattern 12 (Autopoiesis) — the system produces the components that produce it. The ledger produces receipts; receipts produce audits; audits produce amendments; amendments produce new ledger entries. The build produces the protocol; the protocol produces the builds. The operator produces the system; the system produces the operator’s capability (enhanced through operation). This is Maturana and Varela’s autopoiesis at machine scale: a network of processes that continuously regenerates the components that realize it, maintaining its own organization as an invariant while its structure adapts.
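The receipt described in the meta-mapping, an append-only record whose destruction leaves a trace, can be sketched as a hash chain. The source does not specify the OIP receipt format, so the class name `ReceiptLedger` and the fields `request`, `ran`, `result`, and `prev` are hypothetical, chosen only to mirror "what was asked, what ran, and what came back".

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first receipt

def _digest(body):
    """SHA-256 over a canonical JSON encoding of the receipt body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

class ReceiptLedger:
    """Append-only list of receipts, each committing to the one before it."""

    def __init__(self):
        self._entries = []

    def append(self, request, ran, result):
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"request": request, "ran": ran, "result": result, "prev": prev}
        entry = dict(body, hash=_digest(body))
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash and link; any edit or deletion breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: entry[k] for k in ("request", "ran", "result", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

ledger = ReceiptLedger()
ledger.append("summarize report", "summarizer-v1", "ok")
ledger.append("audit summary", "clarity-review", "accepted")
ledger.append("amend protocol", "amendment", "merged")
print(ledger.verify())  # True for an untouched ledger
```

Because each hash covers the previous one, rewriting or removing any receipt invalidates every later entry, so tampering is detectable even though it is not physically prevented.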
Part 6 Summary: All 25 patterns have concrete AI/ML instantiations. Patterns 01–07, 09, 11, 14–16, 18a, 20, 23–24 are T1 (mechanically established). Patterns 05, 08, 12, 13, 18b, 19, 21, 22 are T2 (active research frontiers). Pattern 17 and the emergent-teleology claims of Pattern 25 are T3 (interpretive/metaphorical); the goal-conditioned architectures of Pattern 25 are T1. The meta-mapping shows OIP itself as a pattern-instantiation system.