What Is Pagination?
What It Is
Pagination is the deterministic slicing of a large, ordered dataset into bounded, addressable windows. Every window has a boundary. Every boundary has a name. That name — a cursor, an offset, a page number — lets any system request exactly the same slice twice and get the same result.
Why It Matters
Infinite scroll feels good. It also destroys reproducibility.
When a dataset has no boundaries, you cannot reference it. You cannot audit it. You cannot prove what you saw. Pagination draws lines. Those lines are contract surfaces. They turn a streaming river into a numbered archive.
The philosophical weight is this: deterministic access is the foundation of trust. If two observers cannot request the same slice and agree on its contents, you have opinion, not protocol. Pagination makes sequence explicit. It makes consensus possible.
Practically, it protects systems. It caps memory. It bounds latency. It turns "load everything" into "load exactly this." That transformation is the difference between a toy and infrastructure.
How It Works
At its core, pagination is a contract between a caller and a dataset. The caller asks for a window. The dataset returns the window plus a pointer to the next one.
Step 1: The caller chooses a strategy.
- Offset/limit: Start at position N, return M items. Simple. Dangerous on changing datasets.
- Cursor-based: An opaque token marks the boundary. The dataset decodes it. Stable. Scalable.
- Keyset: The boundary is a value in the ordering column. Fast. Requires an ordered index.
Step 2: The caller sends a request.
GET /items?page[cursor]=abc123&page[size]=50
Step 3: The dataset evaluates the boundary.
- Cursor "abc123" decodes to: last seen ID = 7,491, timestamp = 1698000000.
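- Query becomes: WHERE (created_at, id) > (1698000000, 7491) ORDER BY created_at, id LIMIT 50. The row comparison follows the same column order as the ORDER BY, so rows sharing the boundary timestamp are split by id, and later rows with smaller ids are not skipped.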
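The boundary evaluation in Step 3 can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a reference implementation: the table name items, its two columns, and the helper next_window are assumptions made for the example. The WHERE clause is the expanded form of a compound (created_at, id) comparison, so ties on created_at are broken by id.

```python
import sqlite3

def next_window(conn, last_created_at, last_id, size):
    """Return up to `size` rows strictly after the (created_at, id) boundary."""
    return conn.execute(
        """
        SELECT id, created_at FROM items
        WHERE created_at > ?
           OR (created_at = ? AND id > ?)
        ORDER BY created_at, id
        LIMIT ?
        """,
        (last_created_at, last_created_at, last_id, size),
    ).fetchall()

# Hypothetical data where id order and created_at order disagree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(10, 100), (11, 100), (5, 200), (12, 200), (3, 300)],
)

# The previous window ended at id=11, created_at=100.
window = next_window(conn, 100, 11, 2)  # [(5, 200), (12, 200)]
```

A filter that tested id and created_at independently would drop the row with id 5 here, because its id is smaller than the boundary id even though it sorts later.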
Step 4: The dataset returns the window plus the next pointer.
{
  "data": [ /* 50 items */ ],
  "links": {
    "next": "/items?page[cursor]=def456"
  }
}
Step 5: The caller decides. Stop? Or follow the next pointer? The dataset does not decide. The caller does. That separation of concerns is clean.
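The caller-driven loop in Step 5 might look like the sketch below. The names walk and fake_fetch are hypothetical, the response shape uses a next_cursor field for brevity, and fake_fetch is only an in-memory stand-in for an HTTP call (its cursor is a plain list position, which a real endpoint would not expose).

```python
from typing import Callable, Iterator, Optional

def walk(fetch_page: Callable[[Optional[str]], dict], max_pages: int) -> Iterator[int]:
    """Follow next pointers until the data runs out or the caller's budget is spent."""
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page["data"]
        cursor = page["next_cursor"]
        if cursor is None:  # no further window: the dataset signalled the end
            return

ITEMS = list(range(1, 8))  # hypothetical dataset: 1..7

def fake_fetch(cursor: Optional[str], size: int = 3) -> dict:
    """Stand-in for the endpoint; returns one window plus the next pointer."""
    start = int(cursor) if cursor is not None else 0
    window = ITEMS[start:start + size]
    end = start + len(window)
    return {"data": window, "next_cursor": str(end) if end < len(ITEMS) else None}

everything = list(walk(fake_fetch, max_pages=10))  # [1, 2, 3, 4, 5, 6, 7]
first_only = list(walk(fake_fetch, max_pages=1))   # [1, 2, 3]
```

The max_pages argument is where the caller's decision lives: the dataset only ever reports whether another window exists.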
The Contract
The interface:
- Input: limit (max items per window, bounded by a global cap), cursor (opaque token, optional; omitted means "first window").
- Output: data (array of items, length ≤ limit), next_cursor (opaque token, null if no further data).
The invariants:
- Determinism: The same cursor, requested twice, returns the same ordered sequence.
- Exclusivity: No item appears in two windows for the same cursor sequence.
- Completeness: Every item in the ordered dataset appears in exactly one window, or is newly added after the cursor was minted.
- Boundedness: limit is always ≤ the global cap. The global cap is a hard ceiling.
- Opacity: The caller does not decode the cursor. The dataset encodes and decodes it. The cursor is a capability, not a coordinate.
The failure modes:
- Cursor invalid: 400. The token is corrupt or expired.
- Limit exceeded: 400. The caller asked for more than the global cap.
- Dataset empty: 200, data = [], next_cursor = null. Not an error. A boundary.
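The interface and failure modes above could be sketched roughly as follows. Everything named here is an assumption for illustration: GLOBAL_CAP, BadRequest (standing in for an HTTP 400), the base64-wrapped JSON cursor format, and the choice to reject limits below 1, which the contract does not specify. The dataset is an in-memory list sorted by a unique id.

```python
import base64
import json

GLOBAL_CAP = 100  # hypothetical hard ceiling on limit

class BadRequest(Exception):
    """Stands in for an HTTP 400 response."""

def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()

def decode_cursor(token: str) -> int:
    try:
        payload = json.loads(base64.urlsafe_b64decode(token.encode()))
        return int(payload["after"])
    except (ValueError, KeyError, TypeError) as exc:
        raise BadRequest("invalid cursor") from exc

def get_page(items, limit, cursor=None):
    """items: list of dicts sorted ascending by a unique 'id'."""
    if limit < 1 or limit > GLOBAL_CAP:
        raise BadRequest("limit out of range")
    after = decode_cursor(cursor) if cursor is not None else None
    # Fetch one extra row to learn whether another window exists.
    window = [it for it in items if after is None or it["id"] > after][:limit + 1]
    data = window[:limit]
    has_more = len(window) > limit
    next_cursor = encode_cursor(data[-1]["id"]) if has_more else None
    return {"data": data, "next_cursor": next_cursor}

empty = get_page([], 10)  # {"data": [], "next_cursor": None}
```

Fetching limit + 1 rows is one common way to decide whether next_cursor should be null without running a second count query.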
Real Examples
GitHub API commits: GitHub returns 30 commits per page by default. The Link header contains rel="next" with a URL carrying page=2, page=3. The cursor is the page number. The ordering is implicit: reverse chronological. The contract is simple because the dataset is append-only at the top.
Stripe API charges: Stripe uses cursor-based pagination. The starting_after parameter is an object ID. The response includes has_more. Results are ordered by creation time, newest first, and the object ID anchors the position in that order. The contract is strong: no charge ever changes position.
PostgreSQL keyset pagination: A query like SELECT * FROM events WHERE id > $cursor ORDER BY id LIMIT 50 is keyset pagination at the database layer. It uses LIMIT but no OFFSET. The cursor is the last id of the previous batch. The database uses the primary key index to seek directly. No full table scan. No memory bloat.
Twitter/X timeline (historical): The timeline API once exposed max_id and since_id. These were tweet IDs, which are Snowflake timestamps in disguise. The cursor encoded both position and time. Two boundaries, one token. Elegant.
OIP ledger events: The OIP ledger appends events in strict order. Each event has a monotonic sequence number. Pagination requests ?after_seq=8471&limit=100. The dataset returns events 8472-8571. The cursor is 8571. The next request is ?after_seq=8571&limit=100. Every auditor, every replica, every observer can request the exact same window. That is the point.
Common Mistakes
Using offset/limit on mutable datasets: Offset says "skip N rows." If a row is inserted at position 5, every offset shifts. The same request, run twice, returns different items. You have lost determinism. You have broken the contract.
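The shift is easy to reproduce in a few lines. The sketch below uses a plain sorted list as a hypothetical dataset; offset_page and keyset_page are illustrative helpers, not part of any API.

```python
def offset_page(rows, offset, limit):
    """Skip `offset` rows, return the next `limit`."""
    return rows[offset:offset + limit]

def keyset_page(rows, after_id, limit):
    """Return the next `limit` rows whose id is greater than `after_id`."""
    return [r for r in rows if r > after_id][:limit]

rows = [10, 20, 30, 40, 50, 60]          # sorted ids
page1 = offset_page(rows, 0, 3)          # [10, 20, 30]

# A row is inserted near the front between the two requests.
rows.insert(1, 15)                       # [10, 15, 20, 30, 40, 50, 60]

page2_offset = offset_page(rows, 3, 3)            # [30, 40, 50] (30 appears twice overall)
page2_keyset = keyset_page(rows, page1[-1], 3)    # [40, 50, 60]
```

The offset walk repeats id 30, while the keyset walk resumes exactly where the first window ended.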
Letting the caller set limit without a cap: A caller asks for limit=1000000. The system allocates a million rows in memory. The database locks. The node dies. The cap is not a suggestion. It is a guardrail.
Returning the cursor as a raw database ID: The caller starts guessing IDs. They iterate backward. They probe gaps. The cursor is a capability. It should be opaque, signed, or encoded. Treat it like a session token.
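One way to keep callers from forging cursors is to sign them. The sketch below uses HMAC-SHA256 from the Python standard library; SECRET, sign_cursor, and verify_cursor are hypothetical names, and the token layout is an assumption. Signing makes tampering detectable but does not hide the value, so a system that also needs secrecy would have to encrypt the payload.

```python
import base64
import hashlib
import hmac

SECRET = b"demo-secret-not-for-production"  # hypothetical server-side key

def sign_cursor(last_id: int) -> str:
    """Wrap the boundary value with an HMAC tag and encode it as one token."""
    payload = str(last_id).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(payload + b"." + tag).decode()

def verify_cursor(token: str) -> int:
    """Return the boundary value, or raise ValueError if the token was altered."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError as exc:
        raise ValueError("invalid cursor") from exc
    payload, sep, tag = raw.partition(b".")
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not sep or not hmac.compare_digest(tag, expected):
        raise ValueError("invalid cursor")
    return int(payload.decode())

token = sign_cursor(7491)
last_id = verify_cursor(token)  # 7491
```

A caller who edits the embedded id cannot produce a matching tag without the key, so guessed or decremented cursors are rejected before any query runs.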
Omitting has_more or next_cursor when the dataset is empty: An empty page is not an error. It is a valid boundary. The absence of a next cursor is the signal. Returning 404 or 204 turns a clean contract into an edge-case nightmare.
Paginating without a total order: If the dataset has no deterministic sort, page 2 is fiction. The database returns "some 50 rows." Which 50? Undefined. Every page request is a dice roll.
Connection to OIP
OIP is built on three principles: open, deterministic, auditable. Pagination is the mechanical expression of all three.
Open: A paginated endpoint is a public contract. Any client can walk it. No hidden state. No "you had to be there." The dataset is inspectable one window at a time.
Deterministic: The cursor is a commitment. It binds a specific query to a specific result set. Two honest nodes, given the same cursor, agree. That is consensus material. That is what makes a protocol a protocol instead of a service.
Auditable: An auditor does not need the full dataset. They request page 1, then page 2, then page 3. Each page is small, verifiable, and self-contained. The auditor hashes the page. They compare hashes. If the dataset drifts, the hash changes. Pagination turns audit from a memory-intensive nightmare into a bounded, parallelizable walk.
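The page-by-page audit can be sketched as below. The hash function (SHA-256), the canonical JSON serialization, and the event shape are all assumptions chosen for the example; the source does not specify how pages are hashed.

```python
import hashlib
import json

def page_hash(items) -> str:
    """Hash a page using a canonical JSON encoding so equal pages hash equally."""
    canonical = json.dumps(items, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

def pages(events, size):
    """Split an ordered event list into fixed-size windows."""
    for i in range(0, len(events), size):
        yield events[i:i + size]

# Hypothetical ledger and a replica that has drifted in one event.
ledger = [{"seq": n, "body": f"event-{n}"} for n in range(1, 7)]
replica = [dict(e) for e in ledger]
replica[4]["body"] = "tampered"  # seq 5 differs

ours = [page_hash(p) for p in pages(ledger, 2)]
theirs = [page_hash(p) for p in pages(replica, 2)]
mismatched = [i for i, (a, b) in enumerate(zip(ours, theirs)) if a != b]  # [2]
```

Only the third window disagrees, so the auditor can narrow the drift to two events without holding either full dataset in memory.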
In OIP, pagination is not a convenience feature. It is a structural requirement. Without it, the ledger is a stream. With it, the ledger is an archive. And archives are what civilizations build on.
Connection to the Grain Philosophy
This protocol is part of the Open Inventory Protocol — a living system of self-describing voxels that serves the Grain philosophy. The OIP is the interface. The philosophy is the core.