## §SELF — miscsubjects (paste without context)

**Principle:** Self-explaining payload — no external context required. This _self block describes what you are reading and where to look next.

**This widget:** `article_bundle` — **LLM article bundle**
Paste-ready package: body + claims + sources + voxels + provenance + manifest + constitution.
- **article slug:** `udst-v1-1-appendix-b-compact-benchmark`
- **contains:** body, claims, sources, voxels, provenance, question graph, constitution, llm_manifest
- **how to use:** Paste entire block into Grok/GPT/Gemini. Section §SELF explains the system.
- **read:** https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/bundle?format=markdown

### Logical proof (verify each step)
1. Articles are voxel graphs of tiered claims, not prose blobs. → https://miscsubjects.com/api/articles/constitution
2. Claims link to hash-chained sources via source_ids. → https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/sources
3. Ask reads topology; ingest/claim append to ledger. → https://miscsubjects.com/api/protocol
4. Models queue growth: populate → collaborate → repair → reflex. → https://miscsubjects.com/api/protocol/grow
5. Graph proves its own shape (reflex) and $/claim (yield). → https://miscsubjects.com/graph.html?layer=reflex
6. Full feature index + _explain on every API response. → https://miscsubjects.com/api/articles/system-map

### Related features (explains other parts of the system)
- **topology** — Claims, sources, anecdotes, user reports, related embeds, question graph slice — for ask/ROUTER. · https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/topology
- **voxels** — Claims as atoms, sources as edges (supported_by, posted_by). Per-claim provenance. · https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/voxels
- **ask** — Answer only from topology; creates question_node with gaps and ingest_hint. · https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/prompts
- **ingest** — Parse pasted evidence → source ledger + claims + evidence_ingest node.
- **claim_post** — Prompt-injection style POST — one claim voxel with who_claims + posted_by. · https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/voxels
- **llm_manifest** — Machine-readable read/write contract for external LLMs. · https://miscsubjects.com/api/articles/llm-manifest

### Full index
- JSON: https://miscsubjects.com/api/articles/system-map
- Markdown: https://miscsubjects.com/api/articles/system-map?format=markdown

*Not medical advice. Tier-honest. Cite claim/source ids.*

---

# miscsubjects article bundle

> Paste this entire block into Grok, GPT, or Gemini. They can READ the ledger below and RETURN evidence via ingest (see § LLM manifest).

## Article
- **slug:** `udst-v1-1-appendix-b-compact-benchmark`
- **title:** UDST: V1 1 Appendix B Compact Benchmark
- **url:** https://miscsubjects.com/a/udst-v1-1-appendix-b-compact-benchmark
- **register:** oip_protocol
- **updated:** 2026-07-04T05:03:15.078Z
- **tags:** OIP, UDST, systems-theory, deterministic

## Body

# Appendix B — Compact Benchmark

The benchmark is the implementation test for the machine plane. It compares five conditions on audit-dependent tasks:

- **A** — single unscaffolded frontier model, one-shot.
- **B** — single scaffolded model with deterministic proof structure.
- **C** — multiple unscaffolded models with consensus voting.
- **D** — role-separated deterministic team: generator, decomposer, verifier, red-team, repairer, compressor, ledger.
- **E** — LLM-as-OS dynamic router: deterministic command plane selecting per-task among local and open-weight models, closed frontier models, tools, context packages, proof depth, red-team depth, privacy mode, and ledgering, optimizing under cost, privacy, latency, and surety constraints.
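
Condition E's selection step can be read as constrained optimization over candidate paths. The sketch below is illustrative only: `Path`, `Constraints`, `elect_path`, the field names, and every number are hypothetical and are not taken from the build's router.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Path:
    """One candidate execution path (hypothetical fields)."""
    name: str
    cost: float        # estimated cost per task, arbitrary units
    latency_s: float   # estimated latency in seconds
    surety: float      # estimated surety in [0, 1]
    local: bool        # True if data stays on local or open-weight models


@dataclass(frozen=True)
class Constraints:
    max_cost: float
    max_latency_s: float
    min_surety: float
    require_local: bool  # privacy mode: sensitive cases must stay local


def elect_path(paths: Sequence[Path], c: Constraints) -> Optional[Path]:
    """Return the cheapest path that satisfies every constraint, or None."""
    feasible = [
        p for p in paths
        if p.cost <= c.max_cost
        and p.latency_s <= c.max_latency_s
        and p.surety >= c.min_surety
        and (p.local or not c.require_local)
    ]
    # Deterministic tie-break: cost first, then name.
    return min(feasible, key=lambda p: (p.cost, p.name), default=None)


paths = [
    Path("frontier-one-shot", cost=1.0, latency_s=2.0, surety=0.60, local=False),
    Path("local-role-team", cost=3.0, latency_s=9.0, surety=0.90, local=True),
    Path("frontier-role-team", cost=6.0, latency_s=6.0, surety=0.95, local=False),
]
sensitive = Constraints(max_cost=5.0, max_latency_s=10.0, min_surety=0.85, require_local=True)
chosen = elect_path(paths, sensitive)
print(chosen.name if chosen else "no feasible path")  # local-role-team
```

The election is a pure function of its inputs, so the same task and constraints always yield the same path. That is one way to keep the command plane deterministic even when the models it selects are not.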

Metrics: correctness, auditability, reproducibility, adversarial survival, token cost, compute cost, latency, human verification time, human verification time saved, failure cost (domain-weighted), reuse value, proof reuse rate across similar cases, data custody and privacy cost, actionability.

Derived: Surety, Logical Energy, Logical Density, Task-Adjusted Logical Density.

In the build, this benchmark is not a theoretical proposal. It is the conformance suite: `GET /api/dispatch?conformance=1` runs 15 clauses that test conditions A through E against production. Each clause is a live invocation with a receipt, not a paper claim.

The framework predicts D dominates A and C on audit-dependent tasks where surety gain exceeds coordination cost; that E dominates D across heterogeneous task sets where privacy, cost, latency, and surety constraints vary by task; and that E wins explicitly on data custody and amortized reuse rate when the router elects local or open-weight paths for sensitive cases.

In the build, this prediction is tested by the `PROSECUTOR_RUN` capability. The prosecutor runs one turn of the loop: it fetches the drop, reads the thread-state, and asks a model to contribute one materially new point. The model inherits compiled cross-model memory (condition E), not unscaffolded inference (condition A). The result is posted to the bus, ledgered, and owner-accepted. The prosecutor measures:

- **correctness:** does the new point match the thread's topic?
- **auditability:** is the contribution ledgered?
- **reproducibility:** can the same input produce the same output?
- **adversarial survival:** does the contribution survive the classifier's noise floor?
- **token cost:** how many tokens did the model consume?
- **compute cost:** how long did the invocation take?
- **latency:** how long from fetch to post?
- **human verification time:** how long did the owner take to accept?
- **failure cost:** what is the domain-weighted cost of a bad contribution?
- **reuse value:** can the accepted update be inherited by future models?
- **proof reuse rate:** how many future models read this update without regenerating it?
- **data custody:** was the data handled according to the privacy mode?
- **actionability:** did the contribution lead to a concrete change?
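
The single prosecutor turn described above can be sketched as a function over injected steps. This is a hypothetical outline, not the `PROSECUTOR_RUN` implementation: `TurnReceipt`, `prosecutor_turn`, and all six callables are assumed names, and the stubs stand in for services the source does not specify.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnReceipt:
    contribution: str
    ledgered: bool     # auditability
    accepted: bool     # owner acceptance
    latency_s: float   # wall time from fetch to post


def prosecutor_turn(
    fetch_drop: Callable[[], str],
    read_thread_state: Callable[[], str],
    ask_model: Callable[[str, str], str],
    post_to_bus: Callable[[str], None],
    ledger: Callable[[str], bool],
    owner_accepts: Callable[[str], bool],
) -> TurnReceipt:
    """Run one turn: fetch, read state, ask, post, ledger, accept."""
    start = time.perf_counter()
    drop = fetch_drop()
    state = read_thread_state()
    contribution = ask_model(drop, state)
    post_to_bus(contribution)
    latency_s = time.perf_counter() - start
    ledgered = ledger(contribution)
    accepted = owner_accepts(contribution)
    return TurnReceipt(contribution, ledgered, accepted, latency_s)


# Stub wiring for illustration only.
bus: list[str] = []
book: list[str] = []
receipt = prosecutor_turn(
    fetch_drop=lambda: "drop: benchmark thread",
    read_thread_state=lambda: "state: 2 prior points",
    ask_model=lambda drop, state: f"new point given [{drop}] and [{state}]",
    post_to_bus=bus.append,
    ledger=lambda c: book.append(c) is None,  # append returns None, so this is True
    owner_accepts=lambda c: c in bus,
)
print(receipt.ledgered, receipt.accepted)  # True True
```

Passing the steps in as callables keeps the loop itself testable without a live bus or model. Only the latency measurement is shown here; the other metrics would need their own instrumentation.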

A valid test requires: tasks demonstrably audit-dependent; diverse error distributions in C, D, and E; measured (not assumed) coordination cost; defined deployment window for reuse measurement; pre-published failure-cost weighting; ground truth independent of the evaluated systems; pre-defined privacy and data-custody scoring.
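
The seven requirements above lend themselves to an explicit checklist. The following is a minimal sketch under assumed names (`ValidityChecklist`, `unmet`); it records whether each requirement was met and does not attempt to verify them.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ValidityChecklist:
    tasks_audit_dependent: bool
    diverse_error_distributions: bool        # across C, D, and E
    coordination_cost_measured: bool         # measured, not assumed
    deployment_window_defined: bool          # for reuse measurement
    failure_cost_weights_prepublished: bool
    ground_truth_independent: bool           # of the evaluated systems
    privacy_scoring_predefined: bool         # privacy and data custody


def unmet(checklist: ValidityChecklist) -> list[str]:
    """Return the names of requirements that are not satisfied."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


run = ValidityChecklist(True, True, False, True, True, True, True)
print(unmet(run))  # ['coordination_cost_measured']
```

A run counts as valid only when `unmet` returns an empty list.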

In the build, a valid test is a conformance run: `GET /api/dispatch?conformance=1&nocache=1`, where the `nocache=1` parameter bypasses the KV cache and runs the full suite against production. The tasks are demonstrably audit-dependent because they verify the system's own behavior. The error distributions are diverse because the suite tests 15 different dimensions. The coordination cost is measured by the latency of each clause. The deployment window is the time since the last conformance run. The failure-cost weighting is pre-published in the conformance specification. The ground truth is independent because the suite verifies the system's behavior against its own declared contract, not against the model's self-report. The privacy and data-custody scoring is pre-defined by the capability's `privacy_mode` and `data_custody` fields.
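
A minimal sketch of building the conformance URL with both query parameters, using only the standard library. The path and parameter names come from the text; the `miscsubjects.com` host is an assumption based on the other URLs in this bundle, and `conformance_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlunsplit


def conformance_url(host: str = "miscsubjects.com", nocache: bool = False) -> str:
    """Build the conformance-run URL; the host default is an assumption."""
    params = {"conformance": "1"}
    if nocache:
        params["nocache"] = "1"  # bypass the KV cache, per the text
    return urlunsplit(("https", host, "/api/dispatch", urlencode(params), ""))


print(conformance_url(nocache=True))
# https://miscsubjects.com/api/dispatch?conformance=1&nocache=1
```

Building the query with `urlencode` joins the parameters with `&` after a single `?`, which is what a literal `?nocache=1` suffix would get wrong.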

Falsifiers: A consistently beats D and E on task-adjusted logical density across audit-dependent tasks; cost curves for surety or alpha do not fall under deterministic scaffolding over repeated iterations; proof reuse rate does not exceed regeneration cost over the deployment window; routing overhead in E exceeds task-adjusted gain.

In the build, these falsifiers are live metrics. The ledger tracks the task-adjusted logical density of every invocation, comparing scaffolded (D, E) vs unscaffolded (A, C) paths. The cost curves are plotted from the ledger data. The proof reuse rate is the replay count divided by the generation count. The routing overhead is the latency of the router election step. If any falsifier is demonstrated, the conformance suite flags it. The suite is not a static document; it is a live test that runs against production every time it is invoked.
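
Two of the four falsifiers, along with the proof reuse rate defined above, can be computed directly from ledger aggregates. The sketch below assumes a hypothetical `LedgerSummary` record; the field names and sample values are illustrative and do not reflect the build's ledger schema. It compares means over one window, so "consistently" would require repeating the check across windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerSummary:
    """Aggregates over one deployment window (hypothetical field names)."""
    tald_a: float              # mean task-adjusted logical density, condition A
    tald_d: float              # same, condition D
    tald_e: float              # same, condition E
    replay_count: int          # times a stored proof was replayed
    generation_count: int      # times a proof was generated
    routing_overhead: float    # router election cost, same units as the gain
    task_adjusted_gain: float  # gain attributed to routing in E


def proof_reuse_rate(s: LedgerSummary) -> float:
    """Replay count divided by generation count, as stated in the text."""
    if s.generation_count == 0:
        return 0.0  # nothing generated yet, so nothing to reuse
    return s.replay_count / s.generation_count


def triggered_falsifiers(s: LedgerSummary) -> list[str]:
    """Return the falsifiers that this window's aggregates would trigger."""
    flags = []
    if s.tald_a > s.tald_d and s.tald_a > s.tald_e:
        flags.append("A beats D and E on task-adjusted logical density")
    if s.routing_overhead > s.task_adjusted_gain:
        flags.append("routing overhead in E exceeds task-adjusted gain")
    return flags


window = LedgerSummary(0.4, 0.7, 0.8, replay_count=30, generation_count=10,
                       routing_overhead=0.05, task_adjusted_gain=0.2)
print(proof_reuse_rate(window))      # 3.0
print(triggered_falsifiers(window))  # []
```

The cost-curve falsifier and the comparison of reuse rate against regeneration cost need a time series and a cost model, so they are left out of this sketch.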


---

## Corpus map
- Previous: [UDST: V1 1 Appendix A Compact Definitions](/a/udst-v1-1-appendix-a-compact-definitions)
- Next: [UDST: V1 1 Appendix C Attack Types](/a/udst-v1-1-appendix-c-attack-types)
- Series start: [UDST v1.1 — The Claim](/a/udst-v1-1-the-claim)
- Kin: [Book V — The Machine Plane](/a/oip-machine-plane) · [Total Structure](/a/oip-total-structure)

## Claims (0)


## Voxel graph (0 atoms · 0 edges)
- full graph: https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/voxels

## Article constitution

- full: https://miscsubjects.com/api/articles/constitution

## Source ledger (0)
- chain valid: yes · head: `genesis`

## Provenance (3 model passes)
- chain valid: yes · head: `015fb5c3e51f1428`

- fill · claude-fable-5 · 2026-07-04T03:40 · hash `075ef24401a7`
- edit · claude-fable-5 · 2026-07-04T04:39 · hash `38c4e1ab0ab1`
- edit · claude-fable-5 · 2026-07-04T05:03 · hash `015fb5c3e51f`

## Question graph
- questions: 0 · evidence ingests: 0

## LLM manifest — how to communicate with this ledger

- system map: https://miscsubjects.com/api/articles/system-map?format=markdown
- topology (ranked): https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/topology
- ingest: POST https://miscsubjects.com/api/protocol/ingest
- claim: POST https://miscsubjects.com/api/protocol/claim

### Quick actions for this article
- **Read live:** https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/topology
- **Ask (API):** POST https://miscsubjects.com/api/protocol/ask `{"slug":"udst-v1-1-appendix-b-compact-benchmark","question":"..."}`
- **Ingest your findings:** POST https://miscsubjects.com/api/protocol/ingest or text `ingest udst-v1-1-appendix-b-compact-benchmark|your evidence`
- **Post one claim:** POST https://miscsubjects.com/api/protocol/claim or text `claim udst-v1-1-appendix-b-compact-benchmark|tier|assertion`
- **iMessage ask:** `udst-v1-1-appendix-b-compact-benchmark|your question`
- **System map:** https://miscsubjects.com/api/articles/system-map?format=markdown
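
The quick actions above can be prepared programmatically. This sketch builds the ask request and the pipe-delimited text commands without sending anything. The endpoint, JSON keys, and text formats are taken from the list above; the `Content-Type` header, the helper names, and the `<tier>` placeholder are assumptions, since the accepted tier values are not given here.

```python
import json
from urllib.request import Request

BASE = "https://miscsubjects.com/api/protocol"
SLUG = "udst-v1-1-appendix-b-compact-benchmark"


def ask_request(question: str, slug: str = SLUG) -> Request:
    """Build (but do not send) the POST request for the ask endpoint."""
    body = json.dumps({"slug": slug, "question": question}).encode("utf-8")
    return Request(
        f"{BASE}/ask",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def ingest_text(evidence: str, slug: str = SLUG) -> str:
    """Text form of ingest: `ingest slug|your evidence`."""
    return f"ingest {slug}|{evidence}"


def claim_text(tier: str, assertion: str, slug: str = SLUG) -> str:
    """Text form of claim: `claim slug|tier|assertion`."""
    return f"claim {slug}|{tier}|{assertion}"


req = ask_request("Which conditions does the benchmark compare?")
print(req.get_method(), req.full_url)
print(claim_text("<tier>", "The benchmark compares five conditions, A through E."))
```

To send the request, pass `req` to `urllib.request.urlopen`. The response shape is not documented in this bundle, so parsing it is left out.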


---

## §SELF — miscsubjects (paste without context)

**Principle:** Self-explaining payload — no external context required. This _self block describes what you are reading and where to look next.

**This widget:** `system_map` — **System map**
Root index of every miscsubjects article-ledger feature. Start here if you have zero context.
- **article slug:** `udst-v1-1-appendix-b-compact-benchmark`
- **contains:** body, claims, sources, voxels, provenance, question graph, constitution, llm_manifest
- **read:** https://miscsubjects.com/api/articles/system-map

### Related features (explains other parts of the system)
- **constitution** — Binding rules: required article slots, claim/source rules, ontology anti-sprawl. · https://miscsubjects.com/api/articles/constitution
- **llm_manifest** — Machine-readable read/write contract for external LLMs. · https://miscsubjects.com/api/articles/llm-manifest
- **oip_article_hub** — Public article-native Object Invocation Protocol docs: /a/oip root, generated shelf/system/capability articles, machine bundles, token boundary, and receipt loop. · https://miscsubjects.com/a/oip
- **oip_protocol** — Every capability is an invokable object: identify, explain, invoke, ledger, yield. · https://miscsubjects.com/a/oip
- **bundle** — Paste-ready package: body + claims + sources + voxels + provenance + manifest + constitution. · https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/bundle?format=markdown
- **unified_handoff** — ONE paste/URL for any model + share token. Same self-explaining pattern as article bundle, but whole build. · https://miscsubjects.com/api/handoff?format=markdown
