The Compression Paradox
A few years ago, I sat on the stone wall of an old monastery near Tetecala, Mexico. I had spent weeks disconnected from the frenetic pace of modern life—no digital devices, no internet—in an effort to restore my mind and body to a more natural state. As I gazed upon the lush green hills rising above the sugarcane fields, I felt the inadequacy of language. The word "mountain" seemed insufficient to convey the interplay of hues and shadows on the hillside, the warmth of the air, the intricate patterns of leaves, the shimmering waves of light over the valley, a feeling of ineffable awe.
In that moment, I realized how language falls short in capturing the fullness of such experiences. To understand and communicate our perceptions, we must compress them—reducing their vast complexity into simpler, more manageable representations like words, symbols, or models. This process allows us to share ideas and build knowledge, but in compressing our experiences, we inevitably lose some of their richness and nuance, creating gaps in our understanding. Yet I came to see the essential power of this distillation: imperfect as it is, it is what makes shared understanding possible at all.
Paradoxically, it’s within these very gaps—the losses incurred through compression—that new intuitions arise. Recognizing what our compressed representations fail to capture sparks fresh insights. Intuition is our immediate, holistic grasp of reality, rich in nuance and often resistant to precise articulation. Reason refers to the structured and deliberate manipulation of symbols and concepts to derive insight, involving encoding our intuitions into communicable forms through language, formulas, or other artifacts. Language, as a symbolic system used for communication, serves as both an encoding and decoding mechanism, translating thoughts into words and interpreting words back into thoughts.
This cyclical interplay between compression through reasoning and the emergence of new intuitions is what I call the compression paradox. By simplifying our rich, complex intuitions into concise words and symbols, we inevitably create gaps in understanding. Yet it’s within these very gaps that new insights emerge, propelling a continuous cycle of compression and discovery that drives human knowledge and progress. This paradox underlies all forms of understanding—from language and science to artificial intelligence—forming a fundamental engine of human advancement.
From this perspective, human progress isn’t just about accumulating knowledge—it’s about developing increasingly sophisticated ways to compress and communicate our understanding of the world, even as we recognize that something is always lost in the process.[1] By examining how we navigate this paradox across different domains, we can better understand both the power and limitations of human cognition, and perhaps glimpse where it might lead us next.
The Cognitive Foundations of Compression
To understand how compression shapes human cognition, we need to examine how our minds process and make sense of the world. At its core, the brain functions as a sophisticated compression engine, distilling the immense flow of sensory data into manageable and meaningful patterns.
Perception itself begins with this act of compression. Our sensory systems do not capture every detail in our environment; instead, they filter and prioritize information based on relevance and significance. When we look at a face, we don't register every pore or strand of hair. Our visual system extracts key features—the arrangement of eyes, the curve of a smile—that allow us to recognize individuals and interpret their emotions.
This selective processing extends to higher cognitive functions. As we accumulate experiences, we form mental models—compressed representations of reality—that help us navigate new situations without analyzing every detail from scratch. These models enable us to predict outcomes, make decisions, and learn efficiently.
Here, the interplay between intuition and reason becomes crucial. Intuition represents our immediate, holistic understanding of a situation—a rich, often subconscious grasp that arises without deliberate analysis. It's the gut feeling that guides us before we can articulate why, drawing upon the vast reservoir of our compressed experiences.
Reason involves taking these intuitive insights and encoding them into explicit, communicable forms. It's the process of translating a nebulous sense into structured thoughts, using language and symbols to articulate ideas. This act of compression allows us to examine our intuitions critically, share them with others, and build upon them collaboratively.
Yet in compressing intuition into reason, some nuance is inevitably lost. The fullness of the original experience cannot be entirely captured in words or formulas. Paradoxically, it is within these losses—the gaps created by compression—that new intuitions emerge. These gaps highlight what is missing or unexplained, sparking curiosity and driving further inquiry.
Consider how this dynamic plays out in scientific discovery. A researcher encounters a phenomenon that doesn't fit existing theories—a gap in understanding. This discrepancy sparks an intuition about a possible explanation. Through reasoning, the scientist formulates a hypothesis, compressing the intuition into a testable statement. Experimentation then provides new data, which may confirm or challenge the hypothesis, leading to further insights and refinements. The process is iterative, a continuous interplay between intuition and reason, compression and expansion.
This cycle illustrates the essence of the compression paradox: by compressing our experiences into structured forms, we enable understanding and progress, yet we also create gaps that fuel new intuitions. It's a self-sustaining engine that drives human cognition forward, constantly pushing the boundaries of what we know.
Language: The Primordial Compression Tool
If our brains are natural engines of compression, then language is the most powerful tool we’ve crafted to harness this capacity. Language allows us to encode the vast, multidimensional landscape of our thoughts and experiences into linear sequences of sounds or symbols. By compressing complex ideas and emotions into words, we make them communicable, enabling shared understanding and collaboration.
This process of encoding—transforming rich, intuitive experiences into language—inevitably involves loss. The intricate nuances of our thoughts and feelings often resist full capture in words. When we speak or write, we compress our internal world into a form that others can decode, interpreting the symbols based on their own experiences and perspectives. This decoding process adds another layer of variability, as each listener or reader reconstructs the message within their own cognitive framework.
Language, therefore, is both a bridge and a filter. It enables efficient communication by distilling complexity into manageable units, but it also introduces gaps where meaning can be lost or transformed. These gaps are not merely limitations; they are fertile ground for new interpretations and innovations. Misunderstandings can lead to novel ideas, and the ambiguity inherent in language can spark creativity.
Consider poetry and literature. Writers often play with the limitations of language, using metaphor, symbolism, and ambiguity to evoke emotions and intuitions that ordinary language cannot fully encode. A poem doesn’t just convey information; it invites the reader into an experience, relying on the interplay between what is said and what is left unsaid. This artistic use of language embraces the losses of compression, transforming them into spaces for personal interpretation and new insights.
Moreover, language serves purposes beyond the transmission of information. It builds social bonds, expresses emotions, and participates in cultural rituals. Wittgenstein’s concept of language games highlights how the meaning of words arises from their use within specific contexts. Language is not a static code but a dynamic system shaped by the interactions of its users.
This dynamic is mirrored in the realm of artificial intelligence, particularly in large language models like GPT-4. These models engage in processes of encoding and decoding akin to human language use. They encode vast amounts of textual data into compressed internal representations, known as embeddings, capturing patterns and relationships between words and concepts. When generating text, they decode these representations to produce coherent and contextually appropriate responses.
AI language models rely on compression to function effectively. They distill immense datasets into manageable forms, enabling them to predict and generate language. This compression involves loss—nuances and context can be missed—but it also allows for efficiency and scalability. The scaling laws observed in AI research show that as models increase in size and complexity, their performance improves, reflecting deeper compression and better predictive capabilities.
The parallels between human language processing and AI models highlight the fundamental role of compression in cognition. Both systems depend on encoding and decoding to interpret and generate language, navigating the trade-offs between efficiency and expressiveness. The losses inherent in compression can lead to errors or misunderstandings, but they also open avenues for new intuitions and advancements.
In our daily communication, we constantly negotiate these trade-offs. We choose our words carefully, aiming to convey our thoughts as precisely as possible while recognizing that some loss is inevitable. This awareness drives us to develop more nuanced language, invent new terms, and refine our expressions—continuously evolving our means of encoding and decoding.
Language, then, is not merely a tool for transmitting pre-formed ideas but a participatory process that shapes thought itself. The act of encoding our intuitions into language forces us to clarify and structure them, while the act of decoding allows others to reconstruct and reinterpret them, often in ways we did not anticipate. This cyclical interplay enriches our collective understanding and propels the evolution of culture and knowledge.
The compression paradox manifests vividly in language: by compressing our rich, intuitive experiences into words, we enable communication and progress, yet we also create gaps that inspire new thoughts and innovations. Embracing this paradox allows us to appreciate the multifaceted role of language—as a means of connection, a catalyst for creativity, and a driving force in the ongoing dance between intuition and reason.
The Scientific Method: Compression in Action
If language is our primordial tool for compression, then science represents the systematic refinement of this process—compression elevated to an art form. At its core, the scientific method embodies the cyclical interplay between intuition and reason, encoding our rich intuitions into structured theories that others can decode, while the inherent losses in this compression drive the emergence of new intuitions.
Scientific inquiry often begins with an intuition—a sudden insight or a nagging sense that something in the existing framework doesn’t quite fit. This intuition is steeped in complexity and nuance, drawing from a deep well of personal experience and observation. To share and test this intuition, a scientist must compress it into a hypothesis, encoding it into the precise language of mathematics or experimental design. This act of compression involves deliberate simplification, focusing on essential elements while discarding extraneous details.
Consider Isaac Newton’s formulation of classical mechanics. Newton intuited that the motions of celestial bodies and objects on Earth could be described by the same principles. He compressed this profound intuition into three laws of motion and the law of universal gravitation—simple equations that could be communicated, tested, and built upon. These laws served as a highly efficient encoding of physical phenomena, enabling others to decode and apply them across a vast range of contexts.
Yet this compression was inherently lossy. Newtonian mechanics could not account for anomalies observed at very large scales or at velocities approaching the speed of light. The gaps left by these losses became the seeds of new intuitions. Centuries later, Albert Einstein grappled with these inconsistencies. Through thought experiments—intuitive explorations of riding alongside a beam of light—he developed the theories of special and general relativity. Einstein compressed his revolutionary intuitions into the elegant equations E = mc² and the field equations of general relativity, providing a more comprehensive encoding of gravitational phenomena.
This progression illustrates the generative cycle of the compression paradox. Each new theory encodes intuitions into reasoned frameworks, enabling others to decode and apply them. The losses inherent in this compression—manifested as unexplained phenomena or anomalies—create gaps that inspire fresh intuitions. These intuitions, in turn, drive the development of new theories, perpetuating the cycle.
Thomas Kuhn described these transformative periods as paradigm shifts. When the accumulation of anomalies challenges the prevailing framework, the scientific community undergoes a fundamental change in perspective. This shift is not merely an incremental improvement but a reconfiguration of the underlying encoding and decoding processes. The old language and symbols are replaced or reinterpreted to accommodate the new paradigm, reflecting a deeper compression that captures more of the complexity of reality.
The balance between simplicity and accuracy in scientific theories underscores the selective nature of compression. Scientists must decide what details to omit, aiming for models that are both manageable and sufficiently representative of the phenomena. This selective compression is a deliberate trade-off: too much detail renders a model unwieldy; too little renders it inaccurate. The art of scientific modeling lies in finding the optimal balance, knowing that some loss is inevitable and that these losses may point the way to future discoveries.
In modern physics, the pursuit of a unified theory exemplifies this quest for an optimal compression. Quantum mechanics and general relativity are both highly successful in their domains—quantum mechanics at the subatomic level, general relativity at the cosmic scale—but they are mathematically incompatible. The gap between them represents a significant loss in our collective compression of physical laws. Efforts to bridge this gap, such as string theory or loop quantum gravity, are driven by the intuition that a deeper, more comprehensive encoding is possible.
This pattern is not confined to physics. In biology, the discovery of DNA's double helix structure compressed the complex mechanisms of heredity into a clear, communicable form. In economics, simplified models attempt to capture market behaviors, though they often fail to account for irrational or emergent phenomena given the inherent complexity of economic systems. In each case, the act of compression enables progress while the losses highlight areas ripe for new intuitions and evolved understandings.
The scientific method itself can be seen as a cycle of encoding and decoding. Scientists observe phenomena and develop intuitions about underlying patterns or principles. These intuitions are then compressed into hypotheses—clear, testable statements that others can understand and evaluate. Experiments are designed to test these hypotheses, generating data that others can decode to assess the validity of the original intuition. The results lead to refinements of the hypotheses, further compression, or the development of new intuitions in light of discrepancies. This iterative process underscores the dynamic interplay between intuition and reason, with each cycle enhancing our collective understanding while revealing new gaps and questions.
The compression paradox thus lies at the heart of scientific progress. By compressing complex intuitions into reasoned theories, we make sense of the world and communicate our findings. Yet, the very act of compression introduces losses that become catalysts for further inquiry. It is a self-perpetuating cycle—a dance between what we can explain and what remains elusive—driving us toward ever deeper levels of understanding.
The Evolution of Compression in Computing and AI
The story of compression doesn’t end with human cognition and scientific theories. In fact, it continues to unfold dramatically in the realms of computing and artificial intelligence. The history of computing can be seen as a journey toward ever more sophisticated forms of compression, mirroring the evolution of human language and thought.
At its core, a computer is a machine that processes information by executing a series of instructions known as a program. These instructions manipulate data, and at the most fundamental level, all information in a computer is reduced to binary code—strings of ones and zeros representing instructions and data. This is compression in its most basic form, distilling complex information into a simple binary distinction.
In the earliest days of computing, this process was literal and labor-intensive. Consider the human "computers" of the early 20th century—people, often women, whose job was to perform complex mathematical calculations by hand. They were essential in fields like astronomy and ballistics, where vast amounts of data needed processing. The ENIAC (Electronic Numerical Integrator and Computer), one of the first general-purpose electronic computers, was designed to automate this work. Yet even ENIAC required extensive human intervention to program and operate, with instructions input through physical switches and cables.
As computing evolved, layers of abstraction were added, each compressing and simplifying the underlying complexity. The development of assembly language was a crucial step, providing a slightly more human-readable form of machine instructions. But the real leap came with the creation of high-level programming languages like Fortran, C, and later Python and Java. These languages allow programmers to express complex algorithms and data structures in a more abstract, compressed form. A few lines of Python code can represent operations that would require hundreds of lines of machine code.
This evolution toward higher levels of abstraction in programming mirrors the development of increasingly sophisticated compressions in human language and thought. Just as natural language enables us to communicate complex ideas efficiently, high-level programming languages allow us to express complex computations effectively. And just as our natural languages continue to evolve, so too do programming languages, with each new generation offering more powerful tools for compressing and manipulating information.
Data compression is another fundamental aspect of modern computing. From the earliest days of digital communication, the need to efficiently store and transmit large amounts of data has driven the development of compression algorithms and standards. Lossless compression techniques, such as Huffman coding and Lempel-Ziv compression, allow data to be compressed and decompressed without any loss of information. Lossy compression methods, like JPEG for images and MP3 for audio, achieve higher compression ratios by discarding less perceptually important details. The trade-off between compression ratio and fidelity mirrors the balance between efficiency and expressiveness in communication and cognition. In many applications, the goal is to find the optimal balance between file size and quality, compressing data as much as possible while maintaining acceptable perceptual fidelity.
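The lossless side of this trade-off is easy to demonstrate. Python's standard-library `zlib` module implements DEFLATE, which combines Lempel-Ziv matching with Huffman coding—two of the techniques mentioned above. A minimal sketch:

```python
import zlib

# Highly repetitive data compresses extremely well.
text = b"the quick brown fox jumps over the lazy dog " * 100

compressed = zlib.compress(text, level=9)

# Lossless: decompression recovers the original bytes exactly.
assert zlib.decompress(compressed) == text

ratio = len(compressed) / len(text)
print(f"compressed to {ratio:.1%} of original size")
```

Lossy formats like JPEG and MP3 give up this exact-recovery guarantee in exchange for far smaller outputs, discarding detail the human eye or ear is unlikely to miss.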
At the heart of computing lies the concept of abstraction—the process of reducing complex systems to simpler, more manageable representations. Abstraction is essentially a form of compression, capturing the essential features of a system while discarding irrelevant details. The hierarchy of abstraction in computing, from physical hardware to high-level software, reflects a series of compression steps, each building on the layers below to enable more powerful and efficient computation.
But perhaps the most dramatic leap in computational compression has come with the advent of artificial intelligence, particularly in the form of large language models like GPT-4. These models represent a significant advancement in our ability to compress and manipulate linguistic information.
The connection between intelligence and compression is not a new idea. It's formalized in concepts like the Hutter Prize, established by Marcus Hutter in 2006. The prize rewards the creation of programs that can most effectively compress a large corpus of human knowledge—in this case, a snapshot of Wikipedia. The premise is profound: the ability to compress information efficiently is fundamentally linked to understanding and intelligence. To compress Wikipedia effectively, a program must capture the essential meaning and structure of the text, discarding irrelevant details while preserving key information. In essence, compression requires a deep understanding of the content.
This idea is further supported by the scaling hypothesis in AI, suggesting that many aspects of intelligence emerge simply from training larger models on more data. As models scale up in size and are trained on larger datasets, they demonstrate increasingly sophisticated capabilities, often in unexpected areas. This scaling effect manifests as increasingly powerful compression: larger models capture and represent more nuanced patterns in the data.
At their core, language models like GPT-4 are massive compression engines. They take vast amounts of textual data and distill it into dense, informative representations known as embeddings. These embeddings reduce the high-dimensional, unstructured data of language into a lower-dimensional, structured form. Each word, phrase, or concept is mapped to a point in this embedding space, with semantically similar entities clustered together.
During training, the model learns these embeddings through a process of self-supervised learning. It tries to predict words based on their context, iteratively adjusting its internal representations to minimize prediction error. This process is guided by a loss function, measuring how well the model's predictions match the actual data. Minimizing this loss equates to maximizing the model's ability to compress the training data effectively. In this sense, the quality of a model's compression is a measure of its intelligence.
What's remarkable is not just the ability of these models to compress information, but their capacity to use this compressed knowledge flexibly and creatively. Given a prompt, a model like GPT-4 can generate coherent, contextually appropriate text on a wide range of topics. It can perform tasks like translation, summarization, and even basic reasoning, all drawing on its compressed representation of language. This demonstrates a fundamental principle: the better you can compress, the better you can predict; and being able to predict well is key to acting intelligently.
Attention mechanisms allow these models to focus on the most relevant parts of their input, weighting some tokens in the context window more heavily than others. This mimics the way human brains—likely lossy compressors themselves—attend to salient information in our environment. Research suggests that more efficient use of the context window, with attention concentrated on the most relevant regions, improves models' reasoning capabilities.
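The core of the attention mechanism is compact enough to sketch. Below is the single-query, scaled dot-product form—illustrative only, since production implementations are batched, multi-headed matrix operations:

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score the query against every key, scaled by sqrt(dimension).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns scores into a weighting over positions:
    # "where to focus" within the context.
    weights = softmax(scores)
    # Output is the weight-averaged blend of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key best, so the output
# leans toward the first value vector.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out = attention([1.0, 0.0], keys, values)
```

The softmax is itself a kind of soft selection: rather than discarding less relevant positions outright, it down-weights them, a graded form of the lossy filtering described above.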
As the model compresses training data into more efficient, informative representations, it effectively learns the structure and meaning of language. It recognizes patterns, analogies, and relationships, generalizing from specific instances to broad concepts. In this way, AI models push the boundaries of what’s possible in terms of compression and knowledge representation, offering new insights into the nature of intelligence.
Moreover, these models become increasingly powerful reasoning engines when combined with other tools in a cognitive system architecture. By integrating programs that can write and execute code, retrieve information, build conceptual graphs, or employ sophisticated search algorithms, AI systems begin to mimic the way humans use tools as extensions of their minds. This resonates with Andy Clark's extended mind theory, positing that the human mind extends beyond the brain to encompass external tools and resources. When grappling with an intuition or idea, a person might reach for a notepad or computer to record, manipulate, and refine their thoughts.
In a sense, these AI models replicate, at an accelerated pace, the intuition-reason cycle that drives human cognitive progress. The training process, where the model learns to predict patterns in vast amounts of text, parallels the accumulation of intuitive knowledge in humans. The model's ability to generate text or perform specific tasks is akin to reasoning, applying this compressed knowledge to particular problems or contexts.
This is not to say that current AI models are without limitations. Like all forms of compression, they involve loss. The statistical patterns they learn may reflect biases present in their training data, and they can generate plausible-sounding but factually incorrect information. While they manipulate symbols with remarkable fluency, it’s unclear whether they truly understand the meaning behind these symbols as humans do.
Yet these limitations drive further progress. The shortcomings of current models spur researchers to develop new architectures and training methods, pushing toward AI systems that reason more reliably, incorporate broader forms of knowledge, and perhaps even develop something akin to genuine understanding.
In this way, the development of AI follows the same pattern we’ve seen in human cognitive evolution and scientific progress: each advance in our ability to compress and manipulate information reveals new frontiers of complexity, compelling us to seek ever more sophisticated forms of compression.
Climbing the Ladders of Abstraction
The compression paradox offers a compelling lens through which to view human progress. At its core, cognition and intelligence can be seen as processes of compression—distilling the vast complexity of the world into manageable, meaningful representations. Yet paradoxically, the more we compress our knowledge, the more acutely we become aware of the gaps in our understanding.
This cycle of compression and discovery is the engine that drives us forward. As we distill our intuitions into rational frameworks, we build scaffolding that allows us to explore the unknown. Each new discovery, sparked by an intuition about a gap in our knowledge, opens up further questions, propelling us into an endless cycle of inquiry and understanding.
The evolution of knowledge throughout human history is a generative process where new concepts emerge from the compression of intuition into reason. These concepts become the building blocks for future knowledge, highlighting a fundamental aspect of our ignorance: we’re not just unaware of future events; we lack the very conceptual frameworks needed to comprehend future knowledge. We’re limited by the boundaries of our current understanding.
This phenomenon, sometimes referred to as "ignoration," describes our condition of not knowing what we don’t know. It’s a humbling reminder of the vast terrain of knowledge that lies beyond our grasp. Simultaneously, our historical knowledge is itself a form of lossy compression. As we attempt to reconstruct the past, entire "worlds" disappear, compressed into the limited frameworks of our historical narratives. We’re left with a paradox: our compressed understandings are necessary for making sense of history, yet they are inevitably incomplete.
The advent of artificial intelligence introduces a new dimension to this compression paradox. As we develop increasingly sophisticated AI systems, we’re essentially creating external models of our own cognitive processes. This offers a unique opportunity to study the nature of intelligence and consciousness from the outside, potentially leading to new insights into the fundamental nature of mind.
Moreover, these AI systems are accelerating the cycle between intuition and reason that drives human progress. By delegating certain cognitive workloads to machines—something we’ve been doing for decades—we enable ourselves to climb higher on the ladder of abstraction, developing new concepts and forms of knowledge that were previously out of reach. The next leap in AI reasoning capabilities may come from ideas like Hofstadter's parallel terraced scan, dynamically allocating computational resources based on the complexity of the problem, and by providing these systems with richer sets of tools to extend their cognitive processes.
This evolution in AI and cognitive science pushes us toward a deeper understanding of the entanglement between intuition and reason. Knowledge emerges from the gap between these two modes of thought, through the process of compressing complexity into language, abstractions, and symbols. This entangled process is the essence of human progress, driven by a quest for what we might call a "lossless compression" of the universe's complexity—an ultimate understanding that loses no nuance or detail.
Yet even as we strive for this ideal, we must recognize that some degree of lossiness is inherent and necessary in our conscious experience. Imagine a lossless history—a stream containing every detail of every moment. Meaning and narrative, so core to human understanding, would become incomprehensible amidst the overwhelming flood of information.
The very nature of consciousness might be understood as a compression process, distilling the overwhelming complexity of sensory inputs and internal states into a coherent, unified experience. Our inability to fully articulate or explain our conscious experience could be seen as a manifestation of this inherent lossiness. It’s both the source of human wonder and a reminder of the limitations we face in translating our experiences to others.
As we continue to develop more powerful tools for compression and knowledge creation, we find ourselves approaching new frontiers of understanding at an accelerating pace. Yet, true to the nature of the compression paradox, each advance reveals new depths of complexity. The bitter lesson of AI research teaches us that embracing scaled, lossy compression—accepting what is inevitably lost—can lead to breakthroughs that expand our capabilities. By navigating the tension between what we can compress and what remains elusive, we engage in the endless dance between intuition and reason that propels human progress.
This perspective aligns with thinkers like Stephen Wolfram, whose concept of the Ruliad—the abstract space of all possible computations—provides a framework for understanding the emergence of complexity and intelligence from simple rules and representations. In Wolfram's view, the universe itself can be seen as a vast computation, an unfolding of the Ruliad according to basic rules and initial conditions. Observers within this computation, including us, function as compression algorithms, distilling the overwhelming complexity into manageable, meaningful representations.
The parallels to cognition are clear: our minds are observers within the vast computation of reality, constantly compressing sensory data into mental models, concepts, and narratives. The quality of our compressions determines the quality of our understanding—our ability to predict and navigate the world around us.
However, Wolfram's theory also highlights the fundamental limits of any compression scheme. No finite observer, no matter how sophisticated, can fully capture the infinite complexity of the Ruliad. There will always be aspects of reality that escape our compressions, patterns and structures that lie beyond our horizon of understanding.
In navigating this paradox, we must strive to balance our quest for efficient representations with an appreciation for the irreducible complexity of human experience. The future of human progress lies not in simply accumulating knowledge but in the dynamic interplay between intuition and reason—between the uncompressed richness of raw experience and the compressed efficiency of symbolic thought.
There is a unity in the tension between the ineffable fullness of experience and our attempts to capture and communicate it through compressed representations. It's in navigating this gap—in the endless dance between intuition and reason—that the true story of human progress unfolds. A story that continues to evolve, promising to be more fascinating and profound than we can presently imagine.
I am grateful to Linus Lee, Ben Cmejla, and Andrew Steuer for their generous time and thoughtful comments on earlier drafts of this essay.
[1] This is one reason I resist the common framing of AGI as a singular point in time; I see it instead as an evolving pursuit of progress that gives us increasingly powerful tools of compression to manipulate knowledge in new ways and to deepen our understanding of the world. The faster and more effective those tools become, the more new questions will arise.