Anti-Parsimony: a lesson from AI

If Occam’s razor, the doctrine that the simpler theory is more likely to be correct, proves to be frequently wrong, then what are the implications for other theories of knowledge in philosophy?

In part one of this essay, I gave an extensive example of how a tenet of pseudoscience can be simpler than a tenet of actual science, making the pseudoscience seem to trump real science according to Occam’s razor. I also suggested two ways in which Occam’s razor can prove inadequate. One is that context counts, but always privileging the simplest theory has the effect of ignoring elaborated context. The second, also discussed in part one, is that always choosing the simplest explanation can overlook the contribution of intermediaries, which can be complex rather than simple.

Here in part two, I will introduce a third general way in which Occam’s razor can fail, and provide more examples of its failure to illustrate how pervasive, rather than exceptional, those failures are. Then I will get on with the main business of this second part of the essay, which is to examine how those failures impact other theories of knowledge.

third example

The third way in which Occam’s razor can fail is through the arrival of new information. In daily life, an example is how the simplest explanation for how our parked car became dented is that the neighbor’s teenager swiped it while driving wildly. But then new information arrives: the teenager has been out of town the entire time. The new information contradicts the simplest explanation.

In science, an example is the invention of new technology that creates new information, such as how the invention of the microscope led to the germ theory of disease. Puerperal fever, which follows childbirth, was once thought to be caused by childbirth itself (that was the easiest assumption). But it is now known to be an infection, typically caused by streptococcus.

Other examples of how Occam’s razor fails: bloodletting, with its theory of keeping one’s humors balanced, is simpler than describing all of physiology and how multiple factors fit together to show bloodletting to be calamitous. Darwinism is more complicated than Lamarckism. A theory that some chemical reactions happen only in the presence of a catalyst, which is not itself consumed, is more complex than a theory that a reaction occurs only among the reactants. The theory that the moon formed from a giant impact with the early earth is more complicated than the theory that no such thing ever happened. And believing that the continents slide along on tectonic plates is more complex than the simple assertion that they do not.

In general, science often works by citing mechanisms, yet almost by definition a mechanism, consisting of multiple steps, is more complicated than a bare, simple statement.

So given the frequent failures of Occam’s razor, let’s look at what that means for various other theories of knowledge espoused in philosophy.

parsimony

A slightly more sophisticated way of stating Occam’s razor is with the principle of “parsimony.” It is most strongly associated with the philosopher Karl Popper (he of falsifiability fame). By itself, parsimony merely means that a theory rests on few underlying theoretical considerations such as postulates, assumptions, or dependence on other theories. More usually, however, it is paired with the ability to explain many phenomena. So altogether, parsimony is when a simple statement can explain many observations.

And that is good, of course. It is tidy. But is it necessary in order for a theory to be considered a science? Or put another way, does having parsimony make a theory “more scientific,” as many contend?

Parsimony (not necessarily the word, but the sentiment) has been popularized, for instance, by the influential physicist Sabine Hossenfelder, whose online essays explaining physics deserve their strong following and popularity. But when it comes to the philosophy of science, she is all Popper, which opens her to criticism.

Hossenfelder contends not only that a theory exhibiting parsimony is a better theory, but that parsimony is in fact the demarcation between science and pseudoscience. (Compare that to thinking that science is about experimentation and about what can be demonstrated.)

Writes Hossenfelder on the subject of demarcating science from pseudoscience,

“If you have a model that requires many assumptions to explain few observations, and if you hold on to that model even though there is a simpler explanation, then that is unscientific…. The scientific explanations are the simple ones, the ones that explain lots of observations with few assumptions.”

As for Popper himself, he links simplicity with true science by arguing that simpler theories are more easily falsifiable than complex ones.

But are they? What is the justification for these views?

A few more counterexamples are in order, especially from physics, and then I will get into the discussion of why parsimony, like Occam, fails.

So compare the Bohr model of the atom (with electrons circling the nucleus like planets around a sun) and the quantum mechanical model (with electrons appearing in clouds of probabilities or, more exactly, in the square roots of probabilities). It is hard to contend that the quantum model is simpler, and yet it accurately explains many more observations.

Or consider relativity theory. With its descriptions of time dilation and length contraction (so that time moves at a different speed at the top of a skyscraper than at the bottom), it is hardly simpler than Newton’s absolute description of time and space. Yet, again, Einstein’s complex theory can explain many more observations than can Newton’s simpler one.
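The skyscraper claim is not just rhetoric; it is measurable. As a rough illustration (my own numbers, not from the essay), the weak-field approximation says a clock raised by height h runs fast by a fraction of about g·h/c², so we can estimate how much a clock at the top of an assumed 400-meter building gains per year:

```python
# Sketch of gravitational time dilation in the weak-field approximation.
# The building height is an illustrative assumption, not a specific case
# discussed in the essay.
g = 9.81                  # m/s^2, surface gravity
c = 299_792_458           # m/s, speed of light
h = 400.0                 # m, assumed skyscraper height

fractional_rate = g * h / c**2             # fractional difference in clock rates
seconds_per_year = 365.25 * 24 * 3600
gain = fractional_rate * seconds_per_year  # seconds gained per year at the top
print(f"The top clock gains about {gain * 1e6:.2f} microseconds per year")
```

The difference is about a microsecond per year: tiny, but real, and exactly the kind of prediction that Newton’s simpler absolute time cannot produce.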

So why should it be that parsimony fails to hold up?

a lesson from AI

We can get a few hints from other sources, besides the failures of Occam’s razor.

One is by considering the supercomputer in Douglas Adams’ The Hitchhiker’s Guide to the Galaxy. It is asked to use its superpowers to find the answer to life, the Universe, and everything. So it churns and churns, and after years of computing it finally comes up with the answer.

The answer is “42.”

And sure, that is simple, all right. But it is humorous because it is utterly useless. It gives us not even a hint. To be useful, we would need to know what went into deciding that the answer is 42. It is a distillation of what considerations? We need to know the context, not just the number. (That is why science students learn to reproduce entire derivations, not just the results of the derivations.)

And so it is with parsimony in comparing theories. To be useful, a theory must let us know what considerations are being brought together in coming to an answer, even if that means being more complicated than just privileging the simplest formulation.

And it turns out that this notion that answers lose their meaning when they are oversimplified (a principle that we might call “anti-parsimony”) has itself been put to the test by experiments in artificial intelligence (AI).

There are those who believe (especially Platonists) that, just from knowing the pure essence of a thing, we should be able to deduce the rest of what we need to know to make use of it. Just from knowing a fact, a basic equation, or an ultimate truth, we should be able to figure out anything else having to do with it.

And that belief is what has been (perhaps unintentionally) put to the test.

The early researchers in AI simply assumed (in an unexamined fashion) that parsimony was correct, so they further assumed that, in order to enable a computer in a robot to navigate the world successfully, all they had to do was equip that computer with some basic equations and let it deduce the rest of the details from those equations.

So imagine saying to a robot’s computer, “Here you go, Computer. F = ma. E = mc2.  Now go to the next intersection and decide whether to turn left or right.”

Did it work?

Of course not. But at the time, there really was a lot of surprise at this discovery, because the AI researchers really had assumed that parsimony was the correct way of the world.

So some of the AI researchers decided to try giving the robot’s computer a few more particular bits of information, since it was obvious (here is another way of stating anti-parsimony) that the particulars could not be derived from knowing the simple general statements alone. It does not work as in logic, where knowing a generality or a category lets you deduce the particulars.

Here is an example of the problem. The researchers could not tell a computer that two people were brothers and expect it automatically to know that, among other things, the brothers would age at the same rate; the computer could not make its deductions without being explicitly told this extra information. And there were always at least some particulars like that which could not simply be deduced from knowing only a general statement. And that was preventing success.
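The brothers example can be made concrete with a toy sketch (my own construction, not the actual research systems): a fact base where nothing follows from “brother” unless the connecting rule is itself explicitly supplied.

```python
# Toy fact base: conclusions appear only if explicitly encoded as rules.
# The names and the rule format are hypothetical, purely for illustration.
facts = {("brother", "tom", "tim")}
rules = []  # each rule: (premise_predicate, conclusion_predicate)

def derive(facts, rules):
    """Apply each rule to every matching fact and return the enlarged set."""
    derived = set(facts)
    for premise, conclusion in rules:
        for (pred, a, b) in list(derived):
            if pred == premise:
                derived.add((conclusion, a, b))
    return derived

# Without the extra "common sense" rule, the system knows nothing about aging:
assert ("same_aging_rate", "tom", "tim") not in derive(facts, rules)

# Only after the particular is explicitly added does it follow:
rules.append(("brother", "same_aging_rate"))
assert ("same_aging_rate", "tom", "tim") in derive(facts, rules)
```

The point of the sketch is that the second assertion holds only because a human spelled out the rule; the generality “brother” by itself licenses nothing.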

After they had fed the computer about 10 million of these little particular facts, still without success, they decided that it might take 100 million for any kind of success, and they gave up.

The AI researchers decided to call it “common sense”: something people have, but computers do not, and which enables people’s minds to understand simple statements. It is common sense that provides the necessary context, derived from knowing how myriad details fit together, and that gives meaning and utility to an isolated simple statement.

So when Taubes (from part one) argues that science is wrong to take a “multifaceted” approach to knowledge and that it should instead, as per Occam’s razor, take a parsimonious view, he is essentially saying that all of these extra little bits of knowledge, which the AI researchers found are needed for “common sense,” should be cut out. But they cannot be ignored.

And it demonstrates how knowledge is about recognizing how things fit together, not just about what can be deduced from a few simple basics.

I should add that this is consistent with a theme I have been developing in these posts, which is that the world is made of energy, and energy makes arrangements of itself. And because the world is arranged, knowledge is about how things fit together.

In other words, there is a physical basis for the observed failures of parsimony.

Also, this discovery of what constitutes common sense in AI can inform our understanding of how knowledge is organized in humans. When we consider the situation at hand, we do so by seeing it in terms of how it fits with vastly many other situations. This fitting together is different from seeing it organized hierarchically with simple things collected into categories and generalities (although that can happen, too).

This story of the AI research can be found in Michio Kaku’s book Visions (1997), pp. 60-98.

Platonism

Without belaboring the point, it becomes clear that the above experiments also constitute one more argument against understanding science in a Platonic fashion. Platonism is the belief that the world is made of reflected pure essences which “are what they are” regardless of anything else. But these experiments reveal a world where context definitely counts in making a thing what it is. So the world does not seem to be about reflected pure essences and about what can be deduced from knowing them.

falsifiability

As for Popper’s contention that simple theories are more easily falsifiable than are complex theories, and so are more scientific, that is, again, best addressed with the counterexamples. Relativity and quantum theories are not simple, and yet they are very capable of generating specific predictions that can be submitted to experiment. For instance, it is common to calculate a simple result, then find that it does not agree with observation, and then realize that we have to “correct for relativity.”

Complex theories are very easily tested to see if they work out.

Perhaps a better question is whether falsifiability as a doctrine is itself compatible with common sense as the AI people define it. It is uncontroversial that, to be scientific, a thesis must be capable of being submitted to experiment to see whether it works empirically. But the way Popper formulated this principle (stating it in terms of true and false, complex and simple) opens up a lot of unnecessary, and largely unexamined, extra issues.

Why not just say, for instance, that a scientific thesis must be “demonstrable,” rather than “falsifiable”?

But that is a different big subject for another discussion.

demarcation

Finally, when it comes to the issue of how to discern science from pseudoscience, it seems best not to automatically assume that the simplest theory is the most scientific. There are too many counterexamples, experiments, and arguments against that for it to be a helpful criterion.

Yet please do not misunderstand me. I am not saying that science cannot make simple-appearing statements. But look at E = mc2. The reason it is such a meaningful equation is that we already know what E, m, and c stand for. And we understand them in terms of already knowing what velocity is, and velocity in terms of meters and seconds. Science is a vast arrangement of knowing one thing in terms of others.

So it is in light of what we already know that E = mc2 is such an insightful equation. We can start to re-understand what we already know in terms of it.
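As a small worked instance (with an arbitrary illustrative mass, not a figure from the essay), the arithmetic of the equation is trivial; what makes the result meaningful is everything we already know that went into it, such as the measured value of c and what joules are:

```python
# The equation's content depends on prior knowledge: the measured speed of
# light, SI units, and what "energy" means. The mass is an arbitrary example.
c = 299_792_458      # m/s, the measured speed of light
m = 0.001            # kg, one gram of mass
E = m * c**2         # joules of rest energy
print(f"E = {E:.3e} J")
```

On its own, the number that comes out is as mute as “42”; it is the surrounding context of units and measurements that turns it into a statement about the world.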

But the reason it is science is not that the equation has only a few elements.

And what the AI researchers discovered about the nature of knowledge is that what we already know is an essential component of using the equation. Just from seeing the equation alone, we could not deduce the rest of what it stands for (if we did not already know).

So in answering why parsimony often fails, it is because it cuts out (often deliberately, as per Taubes) the very context that is essential for human understanding.