II | The Oldest Problem
Borges imagined a library containing every possible book. Every arrangement of letters that could be printed on four hundred and ten pages, every true statement and every false one, every work of genius buried in an infinity of nonsense. The librarians who wandered its hexagonal galleries understood that the library was complete. Somewhere on its shelves sat a faithful catalog of every other book. Somewhere, the account of your own death. Somewhere, its refutation.
The library was total and therefore useless. Having all information turned out to be the same as having none, because the problem was never the knowledge. The problem was finding it.
This is the oldest problem in computing, and it predates computing by several thousand years. How do you retrieve what you need from what is known? Every information system humanity has built, from oral traditions to relational databases to search engines, is an attempt at an answer. And every answer encodes assumptions about who is asking, what they need, and how much noise they can tolerate.
The first essay in this series argued that the web was built for human eyes, and that this assumption is embedded so deeply in its architecture that we barely notice it. This essay is about what sits beneath that architecture: the retrieval layer. How a civilization stores and retrieves its knowledge determines what it can think, what it can build, and how fast it can learn. Information systems shape knowledge. Knowledge shapes progress. And retrieval is the bottleneck on the whole chain. Right now, that bottleneck is tightening, because the primary users of the web are no longer people scanning pages. They are AI systems that need to search the full breadth of human knowledge, verify what they find, and reason over it at scale. The demands on retrieval just changed by orders of magnitude. The infrastructure has not kept pace.
Every civilizational advance compresses. Writing compresses speech into symbols. Mathematics compresses patterns into notation. Code compresses process into executable instructions. Science compresses observations into laws. Each compression enables the next generation to build higher, standing on the compressed knowledge of everyone who came before. You do not need to rediscover calculus to use it. The compression already happened. You operate at a higher level of abstraction, and that abstraction is itself a kind of delegated computation: work that was once done by individuals, now embedded in the tool.
But compression is lossy. The tool carries the knowledge but not the full context of how it was made. The equation captures the pattern but not every instance that led to its discovery. You give up completeness to gain portability. You lose information to create usability. And the things you lose, the texture, the edge cases, the contradictions, become the seeds of the next wave of discovery, because someone eventually notices what the compression left out.
Retrieval is the inverse operation. It is decompression. When you go to a library, a database, the web, you are trying to recover what was compressed into the store. The quality of the retrieval determines how much signal survives the round trip. Good retrieval reconstructs meaning from compressed representations. Bad retrieval produces artifacts, noise, confident nonsense.
Before digital technology, retrieval was a human function. The elder who remembered which story applied to which situation. The librarian who could navigate the catalog. Knowledge existed in stores, and the retrieval mechanism was a person who understood both the store and the question well enough to bridge them.
The card catalog was the first attempt to make this bridge systematic. Melvil Dewey's decimal classification, published in 1876, compressed the infinite variety of human thought into ten numbered categories. A book about the economics of Renaissance art could live in only one place on the shelf, even if it belonged in three. The loss was the price of navigability.
Databases formalized a different kind of retrieval. Edgar Codd's relational model gave retrieval a grammar: SELECT, FROM, WHERE, JOIN. For the first time, retrieval was not about browsing or following threads but about specifying exactly what you needed and letting the system find it. But the store had to be structured in advance, and the user had to know the schema. The architecture assumed a known, bounded, well-organized world.
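Codd's grammar can be made concrete in a few lines. A minimal sketch using SQLite (the table names and rows are invented for illustration): the schema must be declared before the question can be asked, which is exactly the assumption of a known, bounded, well-organized world.

```python
import sqlite3

# The store is structured in advance; the user must know the schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Borges');
    INSERT INTO books VALUES (1, 'The Library of Babel', 1);
""")

# Retrieval as specification: say exactly what you need, let the system find it.
row = db.execute("""
    SELECT books.title FROM books
    JOIN authors ON books.author_id = authors.id
    WHERE authors.name = 'Borges'
""").fetchone()
print(row[0])  # The Library of Babel
```

The power and the limit are the same thing: SELECT, FROM, WHERE, JOIN retrieve with perfect precision, but only over what the schema anticipated.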
The unstructured world required different approaches, and each one deepened the theory of what "relevance" means. Boolean search found exact matches but had no concept of meaning. Vector space models made retrieval approximate for the first time, representing documents as points in high-dimensional space so that relevance became proximity rather than identity. Probabilistic models reframed retrieval as a question of likelihood. Neural retrieval learned representations that captured semantic similarity, so that a search for "how to treat high blood pressure" could surface a page about "hypertension management strategies" even if the pages shared no words. Each paradigm shift expanded what "finding" meant: existence, then precision, then similarity, then probability, then meaning.
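The vector-space idea fits in a short sketch: documents and queries become term-count vectors, and relevance becomes cosine proximity rather than identity. This is a bare bag-of-words model with invented documents; real systems add IDF weighting or learned embeddings. It also exposes the limitation the paragraph describes: the hypertension document shares no terms with the query and scores zero, the gap that neural retrieval later closed.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector (no IDF weighting, for brevity)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Relevance as proximity: cosine of the angle between two term vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "treating high blood pressure with diet and exercise",
    "hypertension management strategies for adults",
    "history of the printing press in europe",
]
query = vectorize("high blood pressure treatment")
ranked = sorted(docs, key=lambda d: cosine(query, vectorize(d)), reverse=True)
print(ranked[0])  # the lexically closest document ranks first;
# the semantically closest one ("hypertension...") scores zero.
```
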
Google's PageRank cut across all of this by reading the web's link structure as a distributed signal about what mattered. You did not need domain-specific retrieval if you could read the collective judgment embedded in billions of individual linking decisions. General-purpose approaches that read emergent signals in open, interconnected systems tend to subsume specialized ones, because the open system contains more information than any curated slice. PageRank could not have worked on a fragmented web. The signal was in the connections.
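PageRank itself is a short fixed-point computation: a page's score depends recursively on the scores of the pages linking to it, and the scores are found by iterating until they stabilize. A toy sketch over an invented three-page graph; the damping factor and iteration count are conventional choices, not tuned values.

```python
def pagerank(links, damping=0.85, iters=50):
    """Power iteration over a link graph: rank flows along outgoing links."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# Hypothetical graph: pages a and b link to c; c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # "c" — the most linked-to page wins
```

The signal really is in the connections: nothing in the computation inspects page content, only who points to whom.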
Each of these paradigm shifts raised the bar on what retrieval had to do. But the operation stayed the same: a system searched a store and returned what it found. A human read the results.
The most recent shift changes the operation entirely.
Every prior retrieval paradigm, from card catalogs to neural search, preserved the original material. The index was a map to documents that still existed in full fidelity. You could always go back to the source. A language model's parameters are a different kind of store. During training, billions of documents are compressed into numerical weights, and the originals are, from the model's perspective, gone. What remains are statistical patterns derived from them. The model can generate fluent, contextually appropriate responses from these patterns, and this is genuinely powerful. But the knowledge has been transformed in the compression. The store itself is lossy in a way that no traditional index is.
Anyone building production AI systems already knows that models need search. Retrieval-augmented generation is standard practice, not a research novelty. The deeper issue is that the demands on retrieval just changed by orders of magnitude, and the infrastructure has not caught up. The job used to be: find relevant pages for a person to read. The job now is: retrieve the specific evidence needed to ground a complex synthesis, with source verification, confidence quantification, and attribution, at machine scale, millions of times per day. The difficulty of the retrieval problem went up. The importance of getting it right went up. And the existing search infrastructure was built for a fundamentally different job.
The reason is that models can reason but they cannot remember. A model's parametric memory, the knowledge compressed into its weights during training, is powerful in specific ways and limited in specific ways, and the limitations matter enormously. Parametric memory is bounded: a model holds what fits in its weights and no more. It is frozen at training time: the world changes, the model's knowledge does not. It is lossy in a deep sense: the compression discards details and, more importantly, the boundaries between what was well-attested and what was rare or ambiguous. And parametric memory has no confidence signal. There is no internal flag distinguishing "this pattern appeared thousands of times in training data" from "this is an interpolation across sparse examples that might be nonsense." The model generates both with equal fluency.
This is hallucination, and it is a compression artifact, not a deficiency that better engineering will fix. The same way a JPEG produces visual distortions when compressed too aggressively, a language model produces knowledge distortions when it generates from patterns compressed beyond the resolution needed for accuracy. You cannot retrieve from a lossy store and expect lossless output.
Models are getting better at detecting some of their own errors. The best current work on uncertainty estimation can identify cases where the model is inconsistently wrong, generating different answers to the same question depending on random variation. But these methods explicitly cannot detect systematic errors, cases where the model confidently gives the same wrong answer because the error is baked into the training data. Self-verification is the same lossy store auditing itself. When the mistake is in the weights, no amount of internal checking will surface it. Only external ground truth catches errors that the model has no reason to doubt. This is architectural.
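The sampling-based uncertainty methods the paragraph describes can be sketched in a few lines, with invented stand-in functions in place of a real model. The sketch also shows their blind spot: a systematically wrong model agrees with itself perfectly and sails through the check.

```python
import random
from collections import Counter

def consistency_check(sample_fn, question, n=10):
    """Sample the model n times; low agreement flags *inconsistent* errors.
    A systematically wrong model agrees with itself and passes unflagged."""
    answers = [sample_fn(question) for _ in range(n)]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / n  # majority answer plus an agreement score in [0, 1]

# Hypothetical stand-ins: a stochastic model that is sometimes wrong,
# and one whose error is baked in (the Titanic sank in 1912, not 1915).
flaky = lambda q: random.choice(["1912", "1912", "1915"])
confident_wrong = lambda q: "1915"

answer, agreement = consistency_check(confident_wrong, "When did the Titanic sink?")
print(agreement)  # 1.0 — full self-agreement, yet the answer is false
```

The agreement score catches the flaky model some of the time and the confidently wrong model never. Only external ground truth catches the second kind.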
So: the models supply reasoning capacity that improves every quarter. What they cannot supply is reliable memory. They need something external to ground their reasoning in, something that is continuously updated, source-preserving, auditable, and correctable. They need the open web.
The open web, understood correctly, is not an archive. It is humanity's living memory: a continuously updated, collectively authored, non-parametric store of what we know right now. New research appears. Prices change. Events get reported. Corrections get issued. The parametric model's weights are a snapshot of the past. The web is a stream of the present. And unlike parametric memory, the web preserves sources. You can trace a claim to its origin, verify it against the source, check when it was published and by whom. When non-parametric memory is wrong, you can find the error, evaluate it, replace it. The store is auditable in a way that weights are not.
Borges understood the duality between these two kinds of memory from another angle. In "Funes the Memorious," he imagined a man who remembered everything, every leaf on every tree, every word of every conversation, every sensation of every moment. Funes had total recall and zero intelligence, because thinking requires lossy compression. You have to forget the details to perceive the pattern. "To think," Borges wrote, "is to forget differences, to generalize, to abstract." Large language models are useful precisely because they compress. The compression is what gives them the ability to reason, generalize, and synthesize. But compression without ground truth becomes drift. A system that can think but cannot verify what it remembers is Funes in reverse: brilliant cognition, unreliable memory.
Intelligence requires both. Compute without reliable storage produces confident nonsense. Storage without compute stays inert, a library nobody reads. The question is what connects them.
This is what I mean by the computable web: the infrastructure that makes the open web legible to AI at scale. Not the web as humans browse it, with pages rendered for eyes and ranked by clicks. The web as machines read it, with structured data, source attribution, confidence scoring, and verifiable provenance, searchable at a speed and volume no human user could approach.
It is the bridge between compute and storage. Without it, models are limited to their parametric memory, frozen and lossy and unverifiable. With it, models can ground their reasoning in the full breadth of continuously updated human knowledge. What limits what AI can do is not the models' reasoning capacity. It is the retrieval infrastructure that connects them to ground truth.
The empirical evidence is stark. Retrieval quality sets the ceiling on output quality. Recent studies show that the minimum retrieval recall needed for augmented systems to outperform standalone models ranges from 20% to 100% depending on the task, and in several benchmarks, bad retrieval actively degrades performance below what the model achieves on its own. A system that retrieves the wrong documents produces worse answers than if it had retrieved nothing at all. Models will keep getting better at reasoning. If the retrieval infrastructure does not improve in step, the reasoning has nothing reliable to work with.
When retrieval works at this level, something new becomes possible. Compute applied to the web's store can generate knowledge that did not previously exist in any single source. A researcher investigating a rare drug interaction no longer reads fifty papers and synthesizes a conclusion over weeks. An AI system retrieves the relevant pharmacological literature, cross-references contraindications across sources, identifies where the evidence is strong and where it is thin, and produces a structured synthesis with citations. The synthesis is new. The sources are traceable. The knowledge base grows.
Think about how AI agents are actually being built. An agent is more than a language model. It is a cognitive architecture: a reasoning engine coupled with memory systems, retrieval tools, and the ability to act. A growing body of research now frames agent memory using the same categories cognitive scientists use for human memory, and the parallel is structurally precise, not just metaphorical. The model provides reasoning and generalization, the way working memory allows a person to hold and manipulate ideas. But working memory is small. What makes human cognition powerful is the ability to retrieve from long-term memory, to pull the right experience or fact into working memory at the right moment. The quality of that retrieval determines the quality of the thought.
AI agents have the same architecture at a different scale. The model is working memory. The agent's local context, its files, conversation history, and accumulated state, functions as episodic memory: the record of what has happened in this particular session or task, analogous to how a person remembers specific experiences and encounters. But the web is the long-term semantic memory, the vast, collectively authored, continuously updated store of everything humanity knows. Search is the retrieval mechanism that bridges working memory to long-term memory. And just as in human cognition, what limits the quality of thought is not the intelligence of the thinker. It is the quality of what they can recall. A brilliant thinker with access to the wrong information, or no information, produces nothing.
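The three-layer memory architecture can be sketched minimally. Everything here is illustrative: the class names are invented, and the crude keyword lookup stands in for a real search API over the web.

```python
class Agent:
    """Toy cognitive architecture: the model plays working memory, local
    state plays episodic memory, and an external store stands in for the
    web's long-term semantic memory. Retrieval bridges the layers."""

    def __init__(self, semantic_store):
        self.episodic = []               # what happened in this session
        self.semantic = semantic_store   # shared long-term store (the web)

    def retrieve(self, query):
        # Stand-in for search: pull the right fact into working memory.
        hits = [doc for doc in self.semantic if query.lower() in doc.lower()]
        self.episodic.append(("retrieved", query, hits))  # leave a trace
        return hits

    def answer(self, question):
        evidence = self.retrieve(question)
        # Reason over retrieved evidence, not over frozen weights alone.
        return evidence[0] if evidence else "no grounded answer"

web = ["hypertension responds to ACE inhibitors", "pagerank reads link structure"]
agent = Agent(web)
print(agent.answer("pagerank"))  # grounded in the store, traced in episodic memory
```

The quality of `retrieve` is the whole game: swap in a better bridge and the same "thinker" produces better thought.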
This is where the continual learning loop becomes visible. Agents retrieve from the web, reason over what they retrieve, and produce outputs, analyses, syntheses, structured knowledge, that flow back into the shared store. The web gets richer. The next agent's retrieval draws from a richer substrate. The next synthesis builds on the last. No central planner designs which connections get made. Millions of agents, operating on behalf of millions of users, query the shared store, apply reasoning, produce outputs that become inputs for the next wave of queries. Edwin Hutchins described something structurally similar in studying how ship navigation teams work: cognition distributed across people, tools, and environments, where the products of earlier cognitive events transform the nature of later ones. The computable web is this principle at civilizational scale: the web as shared long-term memory, AI agents as distributed cognition, and the retrieval layer as the mechanism that connects them.
Patterns that span literatures no single person reads become discoverable. Research questions that would take a human team months to formulate from scattered evidence become addressable in hours. The rate at which humanity's collective knowledge compounds accelerates, because the loop between storage, retrieval, reasoning, and new storage is now running continuously, at machine speed, across every domain simultaneously.
But if the substrate degrades, the outputs degrade too, and they degrade with confidence. Bad information synthesized at scale produces plausible analysis that is wrong. And if those wrong outputs flow back into the knowledge store, they contaminate the very substrate that future retrievals draw from. The recursive loop that makes the system powerful is the same loop that makes it fragile. Borges imagined this in "Tlön, Uqbar, Orbis Tertius": a fictional world described in such thorough detail that it begins to displace reality. When fabrication is persistent enough and coherent enough, it does not just deceive. It displaces.
What prevents the loop from amplifying noise rather than knowledge? Architecture. Model collapse happens when generated data replaces original data. A retrieval-grounded system does not replace. It layers. The original sources persist alongside whatever synthesis is derived from them. Attribution creates traceability. When the loop produces errors, the source material remains as a check, and errors can be traced backward to the point where synthesis diverged from evidence.
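The layering-not-replacing architecture can be sketched as a store where every synthesis carries attribution links back to its sources, so errors remain traceable to the point where synthesis diverged from evidence. All identifiers here are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One item in a store that layers rather than replaces: syntheses
    carry pointers back to the sources they were derived from."""
    id: str
    text: str
    derived_from: list = field(default_factory=list)  # attribution links

store = {
    "src1": Entry("src1", "original finding A"),
    "src2": Entry("src2", "original finding B"),
}
# A synthesis is added *alongside* its sources, never in their place.
store["syn1"] = Entry("syn1", "A and B together imply C", ["src1", "src2"])

def trace(entry_id):
    """Walk attribution links back to the original, underived sources."""
    entry = store[entry_id]
    if not entry.derived_from:
        return [entry_id]
    return [s for parent in entry.derived_from for s in trace(parent)]

print(trace("syn1"))  # ['src1', 'src2'] — the check against the evidence survives
```
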
This depends on the store remaining comprehensive, auditable, and open.
The architectures we build for information flow shape what can be known, and what can be known shapes what can be built. This is not a metaphor. Jeremiah Dittmar's research on the printing press found that European cities that adopted the technology in the 1400s grew 60% faster than otherwise similar cities that did not. The printing press was, from the outset, a for-profit enterprise. Its economic impact came not from charity but from dramatically lowering the cost of disseminating ideas, enabling new forms of combination and exchange. Five centuries later, McKinsey Global Institute found the internet accounted for 21% of GDP growth in mature economies over a five-year period, with an increase in internet maturity correlating to $500 in real per capita GDP, an effect that took the Industrial Revolution fifty years to achieve. In both cases, the pattern is the same: a new information architecture enabled new forms of knowledge production, which accelerated economic growth. The mechanism is not access alone. It is the creation of better markets for ideas.
The computable web will reshape knowledge just as fundamentally. But it is an unsolved engineering problem, not a philosophical position. What are its protocols? How does attribution flow? What does its equivalent of HTTP look like? These are design questions, and the design is not yet settled. What is settled is that whatever architecture we build will determine what kinds of knowledge humanity can generate, how fast it can learn, and who benefits from the learning.
The current drift toward enclosure deserves scrutiny, not because profit is the problem, but because the architecture of access determines whether the market for knowledge works or breaks. The economic dynamism of both the printing press and the internet came from broadening access, not restricting it. The question is not whether knowledge infrastructure should be commercial. Of course it should. The question is whether the commercial architecture creates markets that expand the knowledge base or fragment it.
If the best information retreats behind exclusive licensing deals, the substrate fragments. And fragmented substrates do not produce marginally worse synthesis. They break the recursive loop. The cross-domain connections that drive the most consequential insights, the ones that link a finding in materials science to a problem in medicine, are precisely the connections that disappear when the substrate is carved into proprietary slices.
The obvious counterargument: the open web also produced SEO spam, misinformation, and content farms. If openness created those problems, why would broader access solve them? Because the computable web has a property the browsable web never did: traceable provenance. When retrieval carries attribution, when sources are verifiable and confidence is quantifiable, access becomes auditable in a way the link-graph web was not. The problem with the open web was never broad access. It was access without accountability. Attribution changes the equation.
Knowledge generates value only when it moves. What Elinor Ostrom observed studying physical commons applies with greater force to knowledge: the instinct to wall off and protect can destroy the generative capacity that made the resource valuable. Not because the instinct is irrational for any single actor, but because the aggregate effect kills the market. Every piece of knowledge walled off is absent from every synthesis, every cross-reference, every recursive loop that would have incorporated it. The loss compounds. You cannot know in advance which connections matter, and exclusive access destroys them before they form. This is not an argument against commerce. It is an argument that badly designed markets destroy value that well-designed ones create.
This is not an argument against compensating those who create knowledge. They should be compensated, generously and systematically, and the infrastructure that enables compensation is itself a commercial opportunity. It is an argument that the architecture of compensation matters. Exclusive licensing deals that fragment access are structurally different from broad access with traceable attribution and compensation flowing to sources. One architecture breaks the recursive loop. The other preserves it and creates a market where knowledge producers are paid in proportion to the value their knowledge generates downstream. The difference determines whether the computable web enables a functioning knowledge economy or a set of private gardens that compound advantage for whoever negotiated the best licensing portfolio. The next essay in this series will make the case for what that market architecture should look like.
Borges wrote "The Library of Babel" in 1941, before anyone had built a search engine or a database or a web. He understood that the problem of knowledge is not storage but retrieval, that totality without navigation is chaos. We have been working on this problem across every medium and every technology, and each solution has reshaped what humanity can know and how fast it can learn.
For the first time, we have reasoning systems capable of operating on the full breadth of human knowledge, but only if they can retrieve from it reliably. The web is the largest knowledge store ever assembled. AI provides the compute to reason over it at a scale no human institution could approach. Search is the connection between them, and the quality of that connection determines whether the continual learning loop compounds insight or compounds error. Information systems shape knowledge. Knowledge shapes progress. Retrieval is what connects them, and it always has been. What has changed is the stakes. Borges gave us the failure modes: Babel, where totality without retrieval produces chaos; Funes, where memory without compression produces paralysis; Tlön, where coherent fabrication displaces reality. The computable web risks all three if built poorly. The oldest problem in computing is still the one that matters most. We are only beginning to build the retrieval infrastructure this moment demands.