{"_self":{"principle":"Self-explaining payload — no external context required. This _self block describes what you are reading and where to look next.","widget":"article_topology","feature":"topology","name":"Article topology","what":"Claims, sources, anecdotes, user reports, related embeds, question graph slice — for ask/ROUTER.","contains":"claims, sources, anecdotes, question_graph slice","slug":"udst-v1-1-appendix-b-compact-benchmark","urls":{"read":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/topology"},"how_to_use":"Claims, sources, anecdotes, user reports, related embeds, question graph slice — for ask/ROUTER.","write":null,"imessage":null,"router_tag":null,"proof_chain":[{"step":1,"claim":"Articles are voxel graphs of tiered claims, not prose blobs.","verify":"https://miscsubjects.com/api/articles/constitution"},{"step":2,"claim":"Claims link to hash-chained sources via source_ids.","verify":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/sources"},{"step":3,"claim":"Ask reads topology; ingest/claim append to ledger.","verify":"https://miscsubjects.com/api/protocol"},{"step":4,"claim":"Models queue growth: populate → collaborate → repair → reflex.","verify":"https://miscsubjects.com/api/protocol/grow"},{"step":5,"claim":"Graph proves its own shape (reflex) and $/claim (yield).","verify":"https://miscsubjects.com/graph.html?layer=reflex"},{"step":6,"claim":"Full feature index + _explain on every API response.","verify":"https://miscsubjects.com/api/articles/system-map"}],"related_features":[{"id":"ask","name":"Ask protocol","what":"Answer only from topology; creates question_node with gaps and ingest_hint.","urls":{"read":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/prompts","write":"https://miscsubjects.com/api/protocol/ask"}},{"id":"graph_topology","name":"Cross-article graph","what":"Merged claims/sources across condition+stack slugs for one 
question.","urls":{"read":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/graph-topology?question=..."}},{"id":"question_graph","name":"Question graph","what":"Ask nodes (questions + gaps) and evidence_ingest nodes (pasted model output).","urls":{"read":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/question-graph","write":"https://miscsubjects.com/api/protocol/ask"}},{"id":"voxels","name":"Voxel graph","what":"Claims as atoms, sources as edges (supported_by, posted_by). Per-claim provenance.","urls":{"read":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/voxels","write":"https://miscsubjects.com/api/protocol/claim"}}],"system_map":"https://miscsubjects.com/api/articles/system-map","system_map_markdown":"https://miscsubjects.com/api/articles/system-map?format=markdown","not_medical_advice":true},"_explain":{"feature":"topology","name":"Article topology","what":"Claims, sources, anecdotes, user reports, related embeds, question graph slice — for ask/ROUTER.","why":"Every feature is auditable collective intelligence","how":"Claims, sources, anecdotes, user reports, related embeds, question graph slice — for ask/ROUTER.","model":null,"verifies":null,"urls":{"read":"https://miscsubjects.com/api/articles/udst-v1-1-appendix-b-compact-benchmark/topology"},"imessage":null,"router":null,"related":[{"id":"ask","what":"Answer only from topology; creates question_node with gaps and ingest_hint."},{"id":"graph_topology","what":"Merged claims/sources across condition+stack slugs for one question."},{"id":"question_graph","what":"Ask nodes (questions + gaps) and evidence_ingest nodes (pasted model output)."},{"id":"voxels","what":"Claims as atoms, sources as edges (supported_by, posted_by). 
Per-claim provenance."}],"not_medical_advice":true},"slug":"udst-v1-1-appendix-b-compact-benchmark","title":"UDST v1.1: Appendix B Compact Benchmark","register":"oip_protocol","tags":["OIP","UDST","systems-theory","deterministic"],"updated_at":"2026-07-04T05:03:15.078Z","body_excerpt":"# Appendix B — Compact Benchmark\n\nThe benchmark is the implementation test for the machine plane. It compares five conditions on audit-dependent tasks:\n\n- **A** — single unscaffolded frontier model, one-shot.\n- **B** — single scaffolded model with deterministic proof structure.\n- **C** — multiple unscaffolded models with consensus voting.\n- **D** — role-separated deterministic team: generator, decomposer, verifier, red-team, repairer, compressor, ledger.\n- **E** — LLM-as-OS dynamic router: deterministic command plane selecting per-task among local and open-weight models, closed frontier models, tools, context packages, proof depth, red-team depth, privacy mode, and ledgering, optimizing under cost, privacy, latency, and surety constraints.\n\nMetrics: correctness, auditability, reproducibility, adversarial survival, token cost, compute cost, latency, human verification time, human verification time saved, failure cost (domain-weighted), reuse value, proof reuse rate across similar cases, data custody and privacy cost, actionability.\n\nDerived: Surety, Logical Energy, Logical Density, Task-Adjusted Logical Density.\n\nIn the build, this benchmark is not a theoretical proposal. It is the conformance suite: `GET /api/dispatch?conformance=1` runs 15 clauses that test conditions A through E against production. 
Each clause is a live invocation with a receipt, not a paper claim.\n\nThe framework predicts D dominates A and C on audit-dependent tasks where surety gain exceeds coordination cost; that E dominates D across heterogeneous task sets where privacy, cost, latency, and surety constraints vary by task; and that E wins explicitly on data custody and amortized reuse rate when the router elects local or open-weight paths for sensitive cases.\n\nIn the build, this prediction is tested by the `PROSECUTOR_RUN` capability. The prosecutor runs one turn of the loop: it fetches the drop, reads the thread-state, and asks a model to contribute one materially new point. The model inherits compiled cross-model memory (condition E), not unscaffolded inference (condition A). The result is posted to the bus, ledgered, and owner-accepted. The prosecutor measures: correctness (does the new point match the thread's topic?), auditability (is the contribution ledgered?), reproducibility (can the same input produce the same output?), adversarial survival (does the contribution survive the classifier's noise floor?), token cost (how many tokens did the model consume?), compute cost (how long did the invocation take?), latency (how long from fetch to post?), human verification time (how long did the owner take to accept?), failure cost (what is the domain-weighted cost of a bad contribution?), reuse value (can the accepted update be inherited by future models?), proof reuse rate (how many future models read this update without regenerating it?), data custody (was the data handled according to the privacy mode?), and actionability (did the contribution lead to a concrete change?).\n\nA valid test requires: tasks demonstrably audit-dependent; diverse error distributions in C, D, and E; measured (not assumed) coordination cost; defined deployment window for reuse measurement; pre-published failure-cost weighting; ground truth independent of the evaluated systems; pre-defined privacy and 
data-custody scoring.\n\nIn the build, a valid test is a conformance run: `GET /api/dispatch?conformance=1&nocache=1`, where `nocache=1` bypasses the KV cache so that the full suite runs against production. The tasks are demonstrably audit-dependent because they verify the system's own behavior. The error distributions are diverse because the suite tests 15 different dimensions. The coordination cost is measured by the latency of each clause. The deployment window is the time since the last conformance run. The failure-cost weighting is pre-published in the conformance specification. The ground truth is independent because the suite verifies the system's behavior against its own declared contract, not","ranking":"safety-first (interaction_risk/limitations), then quote-gated effective_weight","claims":[],"sources":[],"anecdotal_sources":[],"scientific_sources":[],"user_reports":[],"related_articles":[],"question_graph":{"slug":"udst-v1-1-appendix-b-compact-benchmark","questions":[],"evidence":[],"edges":[],"counts":{"questions":0,"evidence":0,"edges":0}},"honesty":{"active_claims":0,"retracted_claims":0,"cut_claims":0,"challenges":0,"scrub_events":0,"note":"Retracted/cut claims stay on ledger but are excluded from ask unless ?include_inactive=1"},"counts":{"claims":0,"claims_total":0,"sources":0,"anecdotal":0,"scientific":0,"user_reports":0,"questions":0,"evidence_ingests":0}}