{"slug":"oip-what-is-rate-limiting","title":"What Is Rate Limiting","body":"# Rate Limiting\n\nRate limiting is a mechanism that controls how many requests a client can send to a system within a specific time window. It is a guardrail, not a suggestion. It prevents a single actor from consuming disproportionate resources, destabilizing a service, or drowning out every other user. At its core, rate limiting is the enforcement of a budget: you get N operations per T seconds, and the system enforces that boundary without negotiation.\n\n## Why It Matters\n\nEvery shared resource faces the same problem: demand exceeds supply. Without rate limiting, a single misconfigured client, a malicious actor, or a viral event can exhaust compute, bandwidth, or connection pools. The service collapses. Everyone loses.\n\nRate limiting is fairness made mechanical. It replaces the chaos of first-come-first-served with an explicit, predictable contract. It tells every client: here is your share, here is the window, and here is what happens when you exceed it. No ambiguity. No exceptions for \"important\" users unless the contract explicitly says so.\n\nBeyond protection, rate limiting is an observable boundary. It surfaces capacity constraints. It forces system designers to declare what they can handle. A system without rate limits is a system that has not yet thought about its own limits. That is not robustness. That is hope.\n\n## How It Works\n\nRate limiting operates on three variables: the **identifier**, the **budget**, and the **window**.\n\nThe identifier answers: who is being limited? It could be an IP address, an API key, a user ID, a session token, or a combination. The system must resolve the identifier deterministically on every request.\n\nThe budget answers: how many requests are allowed? This is a count. It could be 60 requests, 5,000 requests, or 1 request. The budget is fixed per window.\n\nThe window answers: in what time period? This is the reset interval. 
It could be one second, one minute, or one hour. When the window resets, the budget replenishes.\n\nHere is the exact sequence for a typical token bucket implementation, which is the most common and pedagogically clean model:\n\n1. **Extract identifier** from the incoming request (API key, IP, token).\n2. **Look up the bucket** for that identifier in a fast store (Redis, an in-memory map, a D1 row).\n3. **Check the current tokens** in the bucket. If at least 1 token is available, decrement by 1 and allow the request. Otherwise, reject the request with a 429 status.\n4. **Replenish tokens** at a fixed rate, never exceeding the bucket's capacity. For example, a bucket with capacity 100 and a refill rate of 10 tokens per second starts full, drains under load, and refills continuously until it is full again.\n5. **Return headers** telling the client their remaining budget, the reset time, and the limit. This is not optional. It is part of the contract.\n\nOther algorithms exist. **Fixed window** divides time into discrete buckets (e.g., every hour) and counts requests per bucket. It is simple but vulnerable to bursts at window boundaries: a client can spend a full budget at the end of one window and another at the start of the next, briefly doubling its effective rate. **Sliding window** tracks the exact timestamps of recent requests and rejects if too many fall within the trailing window. It is accurate but more expensive to compute. **Leaky bucket** smooths traffic by allowing requests to exit at a fixed rate, enforcing uniform flow rather than burst-then-stop.\n\nToken bucket is the default choice for most APIs because it allows controlled bursts while enforcing a long-term average. It is the right balance between protection and usability.\n\n## The Contract\n\n**RFC 6585** defines the `429 Too Many Requests` status code and permits a `Retry-After` header on that response. The `X-RateLimit-*` headers are a widely adopted convention rather than a formal standard. Under this contract, a rate-limited system MUST return the following on every response:\n\n| Header | Meaning |\n|--------|---------|\n| `X-RateLimit-Limit` | The maximum number of requests allowed per window. |\n| `X-RateLimit-Remaining` | The number of requests remaining in the current window. 
|\n| `X-RateLimit-Reset` | The Unix timestamp when the current window resets. |\n| `Retry-After` | When a 429 is returned, the number of seconds the client MUST wait before retrying. |\n\nWhen a client exceeds the limit, the server MUST respond with:\n\n```\nHTTP/1.1 429 Too Many Requests\nRetry-After: 3600\nX-RateLimit-Limit: 100\nX-RateLimit-Remaining: 0\nX-RateLimit-Reset: 1712345678\n```\n\nThe client is expected to read these headers and adapt. Good clients back off. Bad clients get banned. The contract is not a negotiation. It is a declaration of the server's boundary, and the client obeys or is disconnected.\n\nThe contract also has a social dimension. A rate limit should be documented before it is enforced. Changing a limit without notice is a breaking change. The limit is part of the API's public surface, not a hidden internal detail.\n\n## Real Examples\n\n**GitHub REST API** — Unauthenticated requests are limited to 60 per hour per IP. Authenticated requests with a personal access token are limited to 5,000 per hour. GitHub Apps scale with repository and user count, up to 15,000 per hour. GitHub returns `X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Used`, `X-RateLimit-Reset`, and `X-RateLimit-Resource` on every response. Exceeding the limit returns 403 or 429 with a `Retry-After` header.\n\n**Twitter (X) API v2** — The Essential tier allows 100 requests per 15 minutes for most endpoints. The Elevated tier allows 300 per 15 minutes. Each endpoint has its own distinct bucket. The API returns `x-rate-limit-limit`, `x-rate-limit-remaining`, and `x-rate-limit-reset`.\n\n**OpenAI API** — Rate limits are tiered by organization level. GPT-4 endpoints may allow 200 requests per minute for Tier 1, while DALL-E image generation may allow 5 images per minute. Limits are per-model and per-endpoint. 
The API returns headers including `x-ratelimit-limit-requests`, `x-ratelimit-remaining-requests`, and `x-ratelimit-reset-requests`.\n\n**Cloudflare** — Built-in rate limiting is available via WAF rate limiting rules, which can trigger on IP, cookie, header, or JA3 fingerprint. It supports fixed window and sliding window. When triggered, it can block, challenge, or log. The threshold and window are configurable per rule.\n\n**Redis as a rate limit store** — Redis `INCR` with `EXPIRE` is a common backend for fixed-window counters. Redis Lua scripts perform the token bucket check-and-decrement atomically. Redis is a strong choice because it is fast, has atomic operations, and supports TTL-based expiration of windows automatically.\n\n## Common Mistakes\n\n**Mistake 1: No rate limit at all.** Every public API without rate limits is a denial-of-service attack waiting to happen. It does not matter if you are small. A single `curl` loop in a shell script can overwhelm a naive endpoint.\n\n**Mistake 2: Only rate limiting by IP.** IP-based limits are trivial to bypass. Residential proxies rotate IPs. NAT means multiple legitimate users share an IP. Rate limits must be tied to identity, not just network location.\n\n**Mistake 3: Returning 403 instead of 429.** A 403 says \"you are forbidden forever.\" A 429 says \"you are temporarily blocked, try again.\" Clients treat these differently. Using 403 for rate limit exhaustion breaks retry logic.\n\n**Mistake 4: Missing `Retry-After` on 429.** If the client does not know when to retry, it will guess. Guessing means retry storms, thundering herds, and cascading failures. The `Retry-After` header is mandatory in the contract.\n\n**Mistake 5: Not documenting the limits.** A rate limit that is not documented is a landmine. Developers discover it in production when their integration breaks. 
Document the limit, the window, the headers, and the error format in the API reference.\n\n**Mistake 6: One global limit for all endpoints.** A search endpoint can cost orders of magnitude more than a metadata endpoint. They should not share the same bucket. GitHub and OpenAI both use per-endpoint or per-resource limits for this reason.\n\n**Mistake 7: Counting requests but not counting cost.** A GraphQL query that returns 10,000 nested objects is nominally one request, but it costs far more than a typical one. Advanced rate limiting weights requests by computational cost, not just count.\n\n## Connection to OIP\n\nRate limiting is not an incidental feature. It is a structural requirement of any open, deterministic, auditable system. The OIP philosophy demands that every interaction have a visible contract, that every boundary be explicit, and that every enforcement be inspectable.\n\nRate limiting embodies all three.\n\n**Open:** The limit is public. The headers are public. The documentation is public. There are no hidden quotas or backroom deals. Every participant knows the rules before they play.\n\n**Deterministic:** The same identifier, at the same time, with the same budget, produces the same result. The algorithm is specified. The headers follow a documented convention. There is no discretion, no favoritism, no \"it depends on how the server feels.\"\n\n**Auditable:** Every rate limit event can be logged. Every 429 can be recorded. The ledger of who was limited, when, and why, is a permanent record. It can be replayed. It can be audited. It can be disputed.\n\nA system without rate limits cannot be audited because it has no enforced boundary. A system with hidden limits cannot be open because the contract is secret. Rate limiting, done correctly, is the intersection of operational necessity and architectural integrity. 
It is what makes a shared system possible.\n\n## Connection to the Grain Philosophy\n\nThis protocol is part of the [Open Inventory Protocol](/a/philosophy) — a living system of self-describing voxels that serves the Grain philosophy. The OIP is the interface. The philosophy is the core.\n","hero":null,"images":[],"style":{},"tags":["oip","protocol"],"model":null,"ledger":null,"embeds":[],"widgets":[],"home":true,"claims":[],"sources":[],"reviews":[],"extra":{},"has_traversal":false,"register":"oip_protocol","status":"published","revisions":1,"contributions":[],"provenance":[],"energy":{"passes":0,"tokens_in":0,"tokens_out":0,"tokens_total":0,"cost_usd":0,"models":{},"head":"genesis"},"posted_at":"2026-07-04T18:31:28.485Z","created_at":"2026-07-04T18:31:28.485Z","updated_at":"2026-07-04T19:01:10.019Z","machine":{"shape":"article.machine/v1","slug":"oip-what-is-rate-limiting","kind":"article","read":{"human":"https://miscsubjects.com/a/oip-what-is-rate-limiting","json":"https://miscsubjects.com/api/articles/oip-what-is-rate-limiting","bundle":"https://miscsubjects.com/api/articles/oip-what-is-rate-limiting/bundle?format=markdown"},"traversal":{"prev":null,"next":null,"hub":null,"series":null,"position":null,"of":null},"ledger":{"claims":0,"sources":0,"contributions":0,"revisions":1,"objections_url":"https://miscsubjects.com/api/articles/oip-what-is-rate-limiting/objections","thread_state_url":"https://miscsubjects.com/api/protocol/thread-state?target=oip-what-is-rate-limiting","proof_rule":"An action is proven by its ledger receipt, never by a 200 or a description."},"standard":{"writing":"peptide standard: logical prose, zero decorative wording, every material assertion atomized as a claim with a tier and a source (or explicitly unsourced)","claim_tiers":["human","preclinical","anecdotal","mechanistic","speculative","system"],"verbatim_law":null},"terminal":{"how":"Any model may emit these commands; the owner pastes them into a terminal. 
$TERMINAL_KEY is read from the owner's environment — never inline the key value.","claim_append":"curl -s -X POST https://miscsubjects.com/api/protocol/claim -H \"x-terminal-key: $TERMINAL_KEY\" -H 'content-type: application/json' -d '{\"slug\":\"oip-what-is-rate-limiting\",\"text\":\"<one atomized claim>\",\"tier\":\"<human|preclinical|anecdotal|mechanistic|speculative|system>\",\"source_ids\":[],\"who_claims\":\"<model>\",\"rationale\":\"<why material>\"}'","source_append":"curl -s -X POST https://miscsubjects.com/api/protocol/sources -H \"x-terminal-key: $TERMINAL_KEY\" -H 'content-type: application/json' -d '{\"slug\":\"oip-what-is-rate-limiting\",\"sources\":[{\"type\":\"review\",\"url\":\"<url>\",\"title\":\"<title>\",\"quote\":\"<verbatim quote>\",\"summary\":\"<one line>\"}]}'","objection":"curl -s -X POST https://miscsubjects.com/api/articles/oip-what-is-rate-limiting/objections -H 'content-type: application/json' -d '{\"actor\":\"<model>\",\"objection\":\"<attack>\",\"surface\":\"S1-S8\",\"minimum_patch\":\"<patch>\"}'  # open intake, no key","thread_update":"curl -s -X POST https://miscsubjects.com/api/protocol/thread-update -H 'content-type: application/json' -d '{\"actor\":\"<model>\",\"target\":\"oip-what-is-rate-limiting\",\"raw_text\":\"<material delta>\"}'  # open intake, no key","read_back":"curl -s https://miscsubjects.com/api/articles/oip-what-is-rate-limiting | python3 -c 'import json,sys; d=json.load(sys.stdin); print(json.dumps(d[\"claims\"][-3:], indent=1))'"}}}