Chapter 5. Science

Let’s go back to Dartmoor, to the Sherlock Holmes story Silver Blaze. Remember when Inspector Gregory showed Holmes and Watson the scene of the murder? Holmes found a small but telling piece of evidence that the police had overlooked. He then told Gregory that he would need a little more time to look around. Holmes wasn’t being entirely candid. He needed more time, but not to look at the murder site. He needed more time to find the missing horse.

I said previously that Holmes and Watson search for the horse using the “scientific method.” What did I mean? Look around you. I’m confident that almost everything you see is a product of science and its brother-in-arms, technology. In other words, a product of the “scientific method.” You rely on it every day. But can you explain it? Can you apply it yourself to your own problems? Science is the ultimate hack, the hack that can generate every other hack. How does it do that?

The usual explanation of the scientific method is that it consists of hypothesis and experiment. We see this clearly in the Sherlock Holmes story. The ground near the murder site was quite dry and hard. Gregory had searched for tracks but found none. Which way had the horse gone? Holmes applies his imagination to the problem. “See the value of imagination,” says Holmes. “It is the one quality which Gregory lacks.” Holmes looks at the rolling brown landscape and he imagines the horse running off on its own. Which direction seems most plausible? In the scientific method this imaginative step is the hypothesis. It’s a hunch, an unproved speculation. The next step is to prove it true or false. We need an experiment.

Holmes and Watson set off across the moor, following the path of their imaginary horse. If the hypothesis is correct, they might find evidence to confirm it. They walk away from the murder site, far beyond the ground searched by the police. They come to a depression in the moor. Today the ground is baked hard by the summer sunshine, but on the night of the murder rain was streaming down. This ground would have been soft and muddy. They cast around and soon find hoof-prints. Holmes checks one of the hoof-prints against one of Silver Blaze’s horseshoes, given to him earlier by Gregory. The horseshoe matches. The experiment has proved the hypothesis. “We imagined what might have happened,” says Holmes, “acted upon the supposition, and find ourselves justified.” A triumph for the scientific method.

Hypothesis and experiment. Is that it? Where exactly is the hack? Is it in Holmes’ special powers of imagination? But then how can we learn to do that for ourselves? What if we are not as special as Holmes? Or maybe the key is looking at things? But that seems rather mundane, doesn’t it? Surely there’s more to science than lucky guesses and looking to see if the guesses are right? Well, there is more to science than that, and I’m going to explain it to you, but first it’s important to understand that even scientists often don’t realise how science works. They can be good at doing science without knowing how it works or why it works. Instead they often believe in a distorted image, a caricature of science. There are two especially popular distorted images of science. As historian of science Simon Schaffer explains:

One image is that scientists are absolutely special people; that they’re much more moral and much more virtuous and much much cleverer and that they do things nothing like what anybody else does. On the other hand, there’s an equally powerful public image of science, which is that science is organised commonsense, that it’s just cookery raised to a fairly sophisticated art. Those are the two dominant public images of science in our culture, and neither of them is right. note 50

Schaffer and his colleagues in the “Science Studies” movement came to this conclusion after studying scientists in their natural habitat. They wanted to know how science really worked, what scientists really did in their day-to-day lives. So they took the methods invented to study tribes in New Guinea and used them to study scientists in research labs. They went back and looked again at historic discoveries. What did scientists actually say and write beforehand? What really happened?

Outsiders studying science had previously just asked scientists what they did and assumed that their reminiscences were accurate. They hadn’t actually been to look, to see what really happened in practice in laboratories and field stations. When they did look, they discovered that the reality was quite different from the two views of science described by Schaffer, views very popular with the scientists themselves.

You might imagine that scientists would be pleased to know how science really worked. But no. Instead, for most of the 1990s, the supporters of those two popular images of science attacked the historians and anthropologists who were studying them. This feud became known as “the science wars.” As Schaffer recalls with a lingering bitterness:

It was a very unpleasant period indeed. Several of my friends either lost their jobs or were threatened with losing their jobs, on the grounds that they were the enemies of the sciences. note 51

The main thought-crime of the historians and anthropologists was to suggest that science was “socially constructed.” This really touched a nerve. The self-appointed defenders of science said that something was either true or it was socially constructed. It couldn’t be both. Science was all about objective truth. People who suggested that there was a social aspect to science must therefore be postmodernist wreckers. They must be denying the reality of objective truth. Schaffer and his colleagues were denounced as “constructivists.” This was especially ironic because they were self-consciously trying to be scientific in their investigation of science, trying to draw conclusions based on real-world observation.

Schaffer’s particular sin was to have written, with Steven Shapin, a book about science in seventeenth-century England. In Leviathan and the Air-Pump they investigated a long-forgotten controversy, an argument which started in 1660 between Robert Boyle and Thomas Hobbes. Most likely, the argument was forgotten because Thomas Hobbes, political theorist par excellence, was on the wrong side, the losing side. He argued against Boyle’s experimental method, the method which became the cornerstone of modern science. However, by putting the argument in its historical context, Shapin and Schaffer demonstrated that Hobbes’ concerns were entirely valid at the time. And not only that, Hobbes’ old objections raised disturbing questions about the two popular modern-day images of science. note 52

Here is the story of Boyle and Hobbes in a nutshell. Boyle was a wealthy aristocrat, a gentleman with a keen interest in science. He is today regarded as the “father of modern chemistry.” In 1660, he published a book describing experiments with an “air-pump,” a device for removing the air from a container, leaving a vacuum. At the time there were only a handful of air-pumps in the whole of Europe. Boyle’s air-pump had been constructed with help from his talented friend Robert Hooke, and in fact Boyle often left the operation of the temperamental air-pump in Hooke’s reliable hands.

The air-pump was central to Boyle’s new idea of how to plumb the mysteries of nature. Before that time, natural philosophers had tried to understand nature by looking at what usually happened when things were left in their natural state. Boyle had the novel idea of interfering with nature using a specially constructed engine, to see what happened when nature was pushed into an extraordinary state. Schaffer says that Boyle’s genius was to realise that “if you want to know about the properties of something, it’s a really good idea to get rid of it, and then see what difference that makes.” The idea seems obvious to us now, but it was brand new then. note 53

Boyle put all kinds of things in the air-pump. He found that removing air had the effect of extinguishing flames, and also of extinguishing the life of animals. This seems to us rather cruel and pointless, but Boyle was trying to resolve a current scientific mystery. Thirty years before, William Harvey had published his findings about the circulation of the blood in animals. Harvey demonstrated that the blood was pumped, again and again, around the body and through the surface of the lungs. But why? Something happened in the lungs and Boyle was attempting to find out what.

But Boyle was attempting to do more than this. He put forward his new method of “experimental philosophy” as the best way to find out the truth and to resolve disputes without conflict. This sounds reasonable enough, but at the time these were big claims. In Europe, the appallingly destructive Thirty Years War had ended only a decade or so earlier. The English had shockingly executed their own king, Charles I, at about the same time, the culmination of the English Civil War. This was followed in England by a humourless military dictatorship under Oliver Cromwell, who even banned carols and mince pies at Christmas. Only in 1660 was England restored to a version of the previous order, when Charles II was restored to the throne. In a seventeenth-century version of “truth and reconciliation,” the new king gave an amnesty to all participants in the Civil War, with the understandable exception of the few men who had signed his father’s death warrant.

In this context, Boyle’s claim that “experimental philosophy” could resolve disputes without conflict was a big one. How was this supposed to work? Boyle’s method was to perform his experiments in front of respectable witnesses, in a sense in public. When they saw what happened, they would agree on what they had seen and Boyle would write it up in a style that allowed his readers to imagine in their mind’s eye that they had also been “in the room” and seen the same thing. The respectable witnesses who really were in the room would then back up Boyle’s testimony in writing.

In this way, Boyle’s readers could be assured that the experimental observations were authentic. They could be confident that, if they accepted Boyle’s invitation to repeat an experiment themselves, they would see the same thing. By following this process, Boyle considered that the observations he made were established afterwards as “a matter of fact.” Even though in practice almost nobody could repeat his experiments, due to expense and technical difficulty, nevertheless they could rely on these “matters of fact” and use them to deduce the general rules of the natural world. Not only that, said Boyle, but the same method could be used to settle wider disputes, because it was a general method for establishing undeniable truth and allowing people to reason about the world without resorting to violence. Hobbes thought this was complete nonsense.

Born in the year of the Spanish Armada, Thomas Hobbes was in his seventies in 1660. He had seen the disasters which had fallen upon Europe in his lifetime, the consequences of disputes resolved with a lot of violence. He was as keen as Boyle to find a way of resolving disputes without violence. Today he is best known for his theories of government, which still underpin western political thought. In his own lifetime he was renowned for his wider scholarship in the sciences and arts. He also had a reputation as a bit of a heretic, perhaps even an atheist. (Despite this, the new king Charles II had a soft spot for Hobbes, who had once been his maths tutor, and gave Hobbes a pension.)

Hobbes had two objections to Boyle’s “experimental philosophy.” Firstly, he said that Boyle was wrong to say people would agree just because they were shown something. Hobbes said that in his own experience, when people really had something at stake then they wouldn’t agree. Merely showing them something wasn’t enough to make them change their minds. Boyle’s method could not compel agreement, so Boyle was wrong to claim that it would guarantee agreement.

Secondly, Hobbes said Boyle was wrong to claim that an experiment done on one occasion in one place could ever prove a general rule. Even if the witnesses did agree about what they saw, one experiment only proved what happened there, not everywhere. Experiments, even repeated experiments, could not compel agreement about a general rule, because a dissenter could say “Show me it happening everywhere. I saw it here and there, but not everywhere. That doesn’t prove the general rule.” Again, said Hobbes, Boyle was exaggerating the power of his method when he claimed that it could establish undeniable truth.

Hobbes was not making these arguments fresh for himself, but rather echoing two thousand years of philosophical objections to observational science. Hobbes was steeped in classical learning. (For example, he made the first translation into English of the History of the Peloponnesian War by Thucydides.) He knew all the old arguments, and he was just wheeling them out against a new opponent.

The classical, Platonic, view of reality was that there were two worlds. The visible world was more or less an illusion, fickle, dynamic, in flux, but behind this there stood a logical, non-material unchanging reality. For example, a particular oak tree might change from day to day, from season to season, never quite the same. But behind it there was an unchanging ideal oak tree, always the same. In this classical view, the route to knowledge must necessarily turn away from the deceptive physical world and instead focus on the logical reality behind the scenes. Hobbes thought that pure logical argument was the only reliable path to true knowledge, as demonstrated for example in the classical mathematics textbooks of Euclid.

It’s difficult for us to send our minds back to the time of Boyle and Hobbes. We can see that Hobbes and the classical philosophers made an enormous hidden assumption that human intellect runs parallel to the hidden logical reality, without any physical link between them. They assumed that general rules can be simply recalled to mind, because the mind is itself part of the ideal, behind-the-scenes reality. We find that hard to believe. We take Boyle’s experimental method for granted. It seems obviously the right way to establish the truth.

Shapin and Schaffer tried to tell the story even-handedly, and for doing so they were vilified as being “anti-science.” Although we don’t agree with Hobbes’ classical philosophy, his objections are very substantial, and we can’t discount them just because science had such triumphs in the following centuries. Hobbes’ objections do undermine the foundations of science, so we can’t just brush them off. We must have answers, and when we do have answers we will understand how science, the ultimate hack, really works.

Let’s start with Hobbes’ first objection. Why should witnesses agree? And even if they do agree with each other, why should we agree with them? Boyle was eager to emphasise that his experiments were done in public, not furtively in secret like an old-time alchemist. Hobbes pointedly asked whether anyone could come and watch, knowing full well that the answer was “no.” They were not actually done in public, but rather in front of a public, constituted from Boyle’s friends in the Royal Society.

The Royal Society, the oldest scientific society in the world, was established under the patronage of the new king Charles II. It was self-consciously modelled after the fictional national college of science in The New Atlantis, a science-fiction book written 40 years earlier by former Lord Chancellor Francis Bacon. In the book and in reality, this college had a wider political role, pursuing scientific discovery not just for its own sake but also for the benefit of society. But such noble aims did not change the fact that it was in practice a gentlemen’s club. The Royal Society has been described as open to the public in the same sense that the Ritz Hotel is open to the public. You were quite welcome, provided you looked respectable and had enough money.

So, if you couldn’t go and watch for yourself, you would have to rely on the accounts of these gentlemen. But as we have already seen, and Hobbes would be keen to remind us, people make very unreliable witnesses. They are easily fooled and even when they see the same thing, they don’t recall the same thing afterwards. Why believe them?

The answer is reputation, or in other words, our moral value of authority/respect. Robert Boyle and his witnesses “in the room” could be believed because they were gentlemen. Why would Boyle, son of the Earl of Cork, lie about his air-pump? And why would his gentlemen friends support him if he lied? They had reputations to preserve. Hobbes found this answer very unsatisfactory, and never accepted it as a reliable foundation for establishing truth. Despite that, it really is our modern answer to Hobbes’ first objection. Truth in science is “socially constructed” in the sense that scientists have constructed a communal system for reputation management. This system is very much founded on the moral value of authority/respect.

Most people think that scientists are unusually sceptical, that they take nothing for granted, that they insist on everything being proved to them. Nothing could be further from the truth. When the anthropologists went to the research labs and studied scientists at work they discovered that the key feature of science was the tremendous amount of trust. Scientists use equipment made by other people, and they trust that it is calibrated correctly. They use materials made by other people, and trust that they are formulated correctly. They use data gathered by other people and trust that they have been collected correctly. This tremendous amount of trust is for the most part warranted, because of the largely unnoticed social system for reputation management. Scientists have developed a social process for calibrating not just their laboratory equipment but the trustworthiness of other scientists, and then relying on that trust to support work over decades of time and thousands of miles of space. Science works because scientists continually re-calibrate their trust for each other.

How does this work in practice? Scientists working in the same field of course know each other, gossip about each other. They also publish their work in journals and at conferences, just as the early members of the Royal Society published their findings and theories. They have reputations to build and to preserve. They adopt a style of writing that takes care to acknowledge previous work, to establish that they know and respect those who trod their path before them. Mistakes are embarrassing, so people try hard to remove sources of error from their experiments. Rather than rely on personal observation, they prefer to build experimental engines that make measurements automatically. Scientists are determined at all costs to avoid the perception of fraud. Mistakes are understandable, but outright lying is a mortal sin. So, the truths established by science are objective, but this system of reputation management means that they are still “socially constructed.”

This social system of reputation management works excellently most of the time. Scientists make much faster progress when they don’t have to check everything for themselves. Instead they can plug into the network of trust and build on a pretty solid foundation. If you want to work effectively in a scientific field, you also need to plug into this network, or at least be aware that it exists. As a hacker, of course, your own reputation in this network will be close to zero. However, as an outsider, you can still use the scientific network of trust. When you read someone’s work, ask yourself first of all, what’s their reputation? Is it flaky? Reliable? Reactionary? Who are their friends? Their rivals?

Hobbes was right to be worried whether the witnesses to an experiment were disinterested observers, whether they would act like gentlemen. Scientists are not special people, they are not morally superior. As a society, we rely on their network of trust just as much as the scientists themselves, and the price of science is constant vigilance. Because of the network of trust, the results of science are mostly trustworthy. But not always. Do we really trust experiments run by drug companies? No, not entirely, not when they can decide what to publish and what to withhold. Do we really trust the claims of GM crop manufacturers? No, not entirely, not when they can veto the publication of unfavourable trials. Often nowadays we see medical scientists selling their reputations for money when they put their names to biased articles ghost-written by drug companies. Sometimes we even see scientists attempting to crush weaker rivals through the network of trust, undermining their reputations rather than disproving their theories. note 54

Since you are more likely to be on the outside of a scientific network of trust than on the inside, what can you do to establish the truth for yourself if you are dubious about the scientists’ conclusions? If you can, the best thing would be to understand the technical arguments and establish for yourself whether their data justify their conclusions. We will come back to this approach later, when we address Hobbes’ second objection. However, the problem with understanding the technical arguments fully is that it might take a very long time. It might take years. Are there any techniques to triangulate on the truth from a distance, by using information from the network of trust?

Legal scholar Eric Johnson has made some useful suggestions in his article The Black Hole Case: The Injunction Against the End of the World. This article deals with legal and scientific arguments about the safety of the Large Hadron Collider (LHC) at CERN on the Swiss-French border. This is one of the largest scientific programmes in history, with a staff of around 10,000 scientists and a construction budget of at least 10 billion euros. It is an amazing scientific engine, a very distant descendant of Boyle’s air-pump. The snag is that it has a small chance of destroying the whole world. How small depends on the precise details of the particle physics theories which the LHC is itself designed to investigate. Only the insiders at CERN are in a position to really know what the risks are. note 55

They said that it was entirely safe, but outsiders challenged this claim and attempted to get court injunctions to stop the LHC being operated. There was a lot at stake. On one hand, the whole world; on the other hand, the future of all the physicists at CERN. While it was not likely that the LHC would destroy the world, it was certain that if the LHC was stopped, all those physicists would be out of a job. This led to some rather intemperate remarks. For example, physicist Brian Cox said “Anyone who thinks the LHC will destroy the world is a twat.” I hardly have to point out that this is not a scientific argument, but a reputational attack.

Johnson’s article uses the LHC as an example, but he is really more interested in exploring how courts might deal with similar “end of the world” cases in the future. In that future case, whether it be nanotechnology, asteroid mining or whatever, how could a court break a log-jam of self-interested experts and how could it deal with arguments that were too technical to properly understand? Johnson suggests that we look for the following four defects:

Defective theoretical groundings. Sometimes, scientists think they know the answer, but their answer is wrong. A theory which seems obviously true in one era may look hopelessly naïve in another. We should be especially dubious when insiders change their argument again and again in the face of contrary evidence, so that they can maintain their original conclusions. In the case of the LHC, the claims about its safety were suspect precisely because insiders were forced to change their argument several times as flaws were exposed.
Faulty technical work. When scientists need to draw extraordinarily reliable conclusions, they need to rely on extraordinarily careful work by other scientists. However, the network of trust might fail — those other scientists might not have taken extraordinary care in their work, because they didn’t expect that much would depend on it. The other work might contain small-scale errors, such as miscalculations, inaccurate observations or flawed assumptions. As outsiders, we might still be able to check, or estimate, whether enough care was taken in subsidiary work. As outsiders we can also consider how many assumptions were needed to draw the conclusions: more assumptions means more room for error. The safety arguments for the LHC not only made a lot of assumptions, but depended on observations of a handful of stars made by other scientists. It is unlikely that the scientists observing those stars took extraordinary care with their observations, since they could not know that the fate of the world might depend on them.
Credulity and neglect. We have already seen many factors that can lead to groups of people making innocent and unintentional mistakes. For example, people see what they expect to see, and suffer from confirmation bias. “Reputational cascades” endow conclusions with more and more confidence each time they are passed on. Moral arguments can override logical thinking. Ingroup/loyalty in particular leads to “groupthink,” where group members value consensus over truth. As outsiders we can see some of these effects more clearly than insiders. Hostility to outsiders, such as Brian Cox’s remark about critics of the LHC, is a bad sign.
Bias and influence. The final defect is that people can deliberately make errors, because of self-interest and ambition. As outsiders we can try to use the network of trust in reverse, to find conflicts of interest. We might not have to look far. The defenders of the LHC had numerous conflicts of interest. Insiders may also be prone to hopeless resignation: they may conclude that they are powerless to change things, despite being convinced that things are wrong. Or they might decide that if they succeed it will be just as bad for them — the whistle-blower is usually punished by those who have lost out, even when they do succeed in changing things. We cannot, as outsiders, rely on the absence of whistle-blowers to prove that all is well on the inside.

Science has been so successful, has expanded so much, that other institutions now use science’s cloak of respectability to further their own aims. This dishonesty is not lost on the public, and perhaps explains the current public disillusionment with science as a whole. It looks to the public as if some of these “gentleman” scientists have sold out. Which they have. But this disillusionment is short-sighted — we need science now more than ever, and rather than turning our backs on it, we need to revitalise the network of trust that makes science work. We need to disentangle money and truth. We need to make coercion, bribery and bias more obvious — libel lawsuits and copyright injunctions have no place in the conduct of science. But perhaps most of all we need to understand the limits of science. We need to understand that scientists are not the high priests of truth and that the results of science are always uncertain and provisional. Which brings us back to Hobbes’ second objection.

Hobbes said that no single experiment or even a series of experiments could establish the truth of a general rule. You can prove a fact true using an experiment, but it’s harder, usually impossible, to prove a general rule using an experiment. For example, Holmes conjectured that the horse Silver Blaze was at a particular place nearby on Dartmoor. Holmes and Watson then conducted an experiment. They found hoof-prints and then they found the horse. They could prove their conjecture because it wasn’t a general rule, it was a hypothetical fact. They just had to demonstrate that it was actually a true fact.

While they were doing this, Inspector Gregory remained unaware of what they had found, and still believed in the general rule “Silver Blaze is absent from every place on Dartmoor.” Gregory had looked in several places on the moor around the murder site. He hadn’t found hoof-prints and he hadn’t found the horse, so he concluded that Silver Blaze was not anywhere on the moor. We can clearly see in this case that he was wrong. The essence of Hobbes’ objection is that all arguments of this kind are wrong. Unless you look everywhere, and establish the facts about everywhere by observation, then you haven’t proved a general rule which claims that something is true everywhere.

The technique of looking everywhere is more practical these days than it was in Hobbes’ time, and we do sometimes use it in practice. Computer hackers call this technique “exhaustive search.” You need a problem where a computer can do all the experiments for you, and do them in a sufficiently short time. Before computers, most forms of exhaustive search were impractical. Even now, most general rules are not candidates for exhaustive search, so the only way to prove a new general rule using logical argument is to derive it from an existing general rule which you already know to be true. Hobbes was right.
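To make exhaustive search concrete, here is a small sketch in Python. The rule it tests is my own choice for illustration, not something from Hobbes or Holmes: the claim that n*n + n + 41 is always prime. The helper names is_prime and first_counterexample are made up for this example.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def first_counterexample(rule, cases):
    """Try the rule on every case; return the first case that breaks it, or None."""
    for case in cases:
        if not rule(case):
            return case
    return None


def rule(n: int) -> bool:
    # Illustrative general rule (an assumption of this sketch):
    # n*n + n + 41 is prime for every n we care about.
    return is_prime(n * n + n + 41)


# Looking everywhere in a small, finite world: the cases 0 to 39.
print(first_counterexample(rule, range(40)))
# Widening the world by a single case: 0 to 40.
print(first_counterexample(rule, range(41)))
```

On the forty cases from 0 to 39 the search looks everywhere and finds no exception, so within that small world the rule is proved. Add one more case and the rule collapses, because 40*40 + 40 + 41 is 1681, which is 41 times 41. That is Hobbes’ objection in miniature: a rule checked here and there says nothing certain about the places you have not checked.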

This fatal flaw in the experimental method didn’t kill science in the seventeenth century. Science was inventing useful new general rules. People not only understood more about the world, they were making money. Hobbes’ objections were forgotten. Science continued for several hundred years, like one of those cartoon characters running off the edge of a cliff, not noticing that they are supported only by air. Finally, in the twentieth century, the embarrassment grew too much, and some attempts were made to underpin the enterprise with more respectable logical foundations.

The best known attempt from the mid-twentieth century was made by philosopher Karl Popper. Popper’s idea of falsifiability is still considered the acid-test of whether a theory is properly scientific. Popper said that to count as scientific, a theory had to be “falsifiable.” By this, he meant that it wasn’t important that a theory couldn’t be proved by an experiment, provided that in principle it could be disproved if it was wrong. So Inspector Gregory’s theory that “Silver Blaze is absent from every place on Dartmoor” is perfectly scientific. It’s just that it’s wrong. Holmes falsifies the theory by finding the horse.
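The lopsidedness that Popper relied on can be sketched in a few lines of Python. The search records below are invented for illustration (the story does not list them), and the names gregory_rule and status are hypothetical.

```python
def gregory_rule(observation) -> bool:
    """'Silver Blaze is absent from every place on Dartmoor.'

    An observation is a pair (place, horse_found). The rule survives an
    observation only if no horse was found at that place.
    """
    _place, horse_found = observation
    return not horse_found


def status(rule, observations) -> str:
    """A rule is never reported as proved, only as falsified or not yet falsified."""
    for observation in observations:
        if not rule(observation):
            return "falsified at " + observation[0]
    return "not yet falsified"


# Invented search records: the place labels are placeholders, not from the story.
gregory_search = [("place A", False), ("place B", False), ("place C", False)]
holmes_search = gregory_search + [("place D", True)]

print(status(gregory_rule, gregory_search))
print(status(gregory_rule, holmes_search))
```

However many empty places Gregory adds to his list, the best the rule can earn is “not yet falsified.” A single sighting of the horse is enough to finish it off.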

Popper’s idea has a sort of Darwinian flavour to it. People create many theories to explain the world. People attempt to disprove the theories by experiment. Most theories are falsified. A few remain. The remaining theories are accepted provisionally. It’s a sort of “survival of the most plausible,” but no surviving theory is actually proved true. Falsifiability has been a popular idea, but it has some subtle problems. When experiment goes against theory, it is seldom clear exactly which part of a theory is wrong. Popper thought that it was better to drop the complex parts of a theory and keep the simpler, more general parts. This corresponds to the medieval idea of Occam’s Razor, the principle that simple, general explanations are to be preferred over complex, specific explanations. The best theory is a simple theory that explains a lot. But why? Popper could not bring himself to say that simple, general theories are more likely. Popper agreed with Hobbes that although disproved theories are false, the other surviving theories are no more true than they were on the day we first thought of them. How could a theory be more likely and yet no more true? There seems to be a paradox here.

Happily, we now know the answer. There is no paradox. But the answer wasn’t accepted easily by scientists. In a curious parallel with the “science wars” fought by historians and anthropologists, the mathematicians and physicists who invented the new logical foundations for science had to fight to have their ideas accepted. The fight was at times very acrimonious. Although the key work had been done in the 1940s and 1950s by Harold Jeffreys, Richard Cox, Claude Shannon and George Pólya, the fight ground on through to the 1990s.

The solution was to redefine what we mean by the terms “true” and “prove.” This was the step which many scientists found unpalatable. A sequence of confirming experiments does not make a theory any more true using standard logical arguments. But suppose you were a bookmaker, taking bets on the outcome of the next experiment. What odds would you offer? As each experiment failed to disprove the theory, I think you would be happy to offer longer and longer odds on the next one disproving it. In the rather strange sense of changing the odds for bets on the theory, confirmatory experiments do make the theory more and more true. For a long time, this approach, called Bayesian reasoning, was regarded with deep suspicion. In recent years, the idea of using betting odds for reasoning rather than the true and false of classical logic has finally become respectable. note 56

Bayesian reasoning contains the earlier classical logic as a special case, the case where we know that something is absolutely true or absolutely false. Bayesian reasoning can also work with facts and general rules which are not completely certain. Given the odds for each fact and general rule, Bayesian reasoning can combine them and calculate the odds for our conclusion. That conclusion is still uncertain, but we are able to put a precise number to that uncertainty. For example, we might establish that the odds of a particular conclusion being wrong are 1 in 1000. Depending on what is at stake, we might decide that we have effectively “proved” that conclusion and that it is “true.” True enough, that is, for us to make a decision, true enough to place a bet, but not true in a sense that would really satisfy Hobbes or Popper.

Thick textbooks have been written about Bayesian reasoning, but don’t worry. I’ll show you how Bayesian reasoning works in only a few pages. Let’s start with a problem that Hobbes or Boyle might have met in a London coffee-house in 1660, the problem of how much to charge for marine insurance. Let’s say the owner of a ship comes to you and wants to buy insurance for their next voyage. Most ships return safely, but there are some losses from storms and piracy. The owner essentially wants to place a bet that their own ship will be lost, as a hedge against that disaster. How do you set the odds?

Let’s use the standard hacker’s technique of first solving a problem which is related but simpler. The insurance problem is a bit like being a bookmaker in a gambling game where you can’t quite see the operation of the game. For example, suppose you have a friend who is playing a gambling game on the other side of the coffee house. We’ll let him call out to you the results of each game. When your friend wins, he calls out “one,” when he loses he calls out “nought.” Here is what he calls out in sixty games:

010011001101011000110100001100011001111111001000111011111110

Now, say someone comes to you and wants to place a side-bet on your friend. What odds should you set? What’s the chance that he will win his next game? The data look quite random. (There are a couple of runs of seven ones in a row, but this is exactly the sort of thing you expect from time to time in truly random data.) Out of 60 games, he won 33. So, your best estimate for his chance of winning is 33 in 60. You can’t expect to have exactly the right odds, and you would need a lot more data to come up with a better estimate. To see what difference more data would make, let’s say that we collect the same data every day for a year, and each day put an “x” against the number of times out of 60 our friend won on that day. Here is a year’s worth of data:

15
16
17
18	x
19	x
20	x
21	xx
22	xx
23	xxxxx xx
24	xxxxx xxxxx xx
25	xxxxx xxxxx xxxxx
26	xxxxx xxxxx xxxxx xxxxx
27	xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxx
28	xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
29	xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx x
30	xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx x
31	xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx x
32	xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx
33	xxxxx xxxxx xxxxx xxxxx xxxxx
34	xxxxx xxxxx xxxxx xxxxx xxxxx xx
35	xxxxx xxxxx xxxxx xx
36	xxxxx xxxxx
37	x
38	xxxxx xxx
39	xxxxx
40	xx
41
42	x
43	xx
44
45

If you turn the page sideways you get a sort of mountain shape, rising out of a very flat plain. (To save space I left out the lines for fewer than 15 wins and more than 45 wins. They are all empty.) With this much data we can see that the true odds must be close to 30 in 60, which we might equivalently call 1 in 2, or a probability of ½ or a 50 percent chance of winning. The mountain shape, which we call a probability distribution, shows that there is a lot of variation around the average of 30 wins each day. To turn a profit we would have to set odds that were slightly more in our favour than 1 in 2, and even then we would risk losing money in the short run if our friend had a winning or a losing streak.
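The tally and the year-long chart can be reproduced with a short sketch. It is only an illustration: the function names, the random seed and the assumption that every game is an independent fair bet are mine, not part of the coffee-house story.

```python
import random
from collections import Counter

# The sixty results called out across the coffee-house (1 = win, 0 = loss).
CALLS = "010011001101011000110100001100011001111111001000111011111110"


def tally(calls):
    """Return (wins, games) for a string of '1' and '0' results."""
    return calls.count("1"), len(calls)


def simulate_year(days=365, games_per_day=60, p_win=0.5, seed=1660):
    """Count how many days produced each number of wins.

    Assumption: every game is an independent bet won with probability p_win.
    """
    rng = random.Random(seed)
    wins_per_day = [
        sum(rng.random() < p_win for _ in range(games_per_day))
        for _ in range(days)
    ]
    return Counter(wins_per_day)


def print_chart(histogram, low=15, high=45):
    """Print one row of x marks per number of wins, like the chart in the text."""
    for wins in range(low, high + 1):
        print(f"{wins}\t{'x' * histogram[wins]}")


if __name__ == "__main__":
    wins, games = tally(CALLS)
    print(f"{wins} wins in {games} games")
    print_chart(simulate_year())
```

The first line printed is "33 wins in 60 games". The simulated chart will not match the one in the text mark for mark, but it should show the same mountain centred near 30.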

Many probability distributions have this same shape, mountains rising sharply out of a very flat plain. This shape is so ordinary that it is called a normal distribution. Extreme events, like 60 wins in a row, are so unlikely that we can ignore them. Collecting data every day, we would still be really surprised to see 60 straight wins even once in the life of the universe. But not all probability distributions rise so sharply from the plains. Some have shallow foot-hills: the so-called long-tail distributions. Extreme events are much more likely with these distributions. One of the reasons for recent financial disasters is that the financial hackers using computers to place bets on the stock market thought they had normal distributions when in fact they had long-tail distributions. Oops.
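To put a rough number on how surprising 60 straight wins would be, here is a back-of-envelope sketch. It assumes a fair game and an age for the universe of about 13.8 billion years. The names are my own.

```python
# Chance of winning 60 fair games in a row on any given day.
P_SIXTY_STRAIGHT = 0.5 ** 60

# Assumption: the universe is roughly 13.8 billion years old.
AGE_OF_UNIVERSE_YEARS = 13.8e9
DAYS_SO_FAR = AGE_OF_UNIVERSE_YEARS * 365.25


def expected_perfect_days(p_per_day=P_SIXTY_STRAIGHT, days=DAYS_SO_FAR):
    """Expected number of days with 60 straight wins over the whole period."""
    return p_per_day * days


if __name__ == "__main__":
    print(f"chance per day: {P_SIXTY_STRAIGHT:.2e}")
    print(f"expected perfect days so far: {expected_perfect_days():.2e}")
```

The expected count comes out at a few millionths of a day, which is why a normal-looking distribution lets us ignore such events. A long-tail distribution would not.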

So, after that long digression through bookmaking, can we see the answer to our first problem? If you were in a London coffee-house in 1660, how would you work out what to charge for marine insurance? The problem is quite similar. Obviously, you would need to collect and count data on merchant ships. You would need to keep count of ships coming and going, and to find out what happened to the ones you expected that didn’t turn up. You would subscribe to a shipping newsletter, or perhaps even pay someone to hang around the Pool of London and keep notes. You could then start to analyse the data more closely. Maybe there’s a difference in losses between different kinds of ships, different destinations, or different cargoes? Can you come up with some probability distributions? Based on your data you can make a plausible estimate of the odds that a particular customer’s ship won’t return.
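The book-keeping might be sketched like this. The ledger below is invented purely for illustration. Real figures would come from your shipping newsletter or your watcher at the Pool of London. The function names are hypothetical too.

```python
from collections import defaultdict

# INVENTED records, for illustration only: (destination, returned_safely).
HYPOTHETICAL_LEDGER = (
    [("Baltic", True)] * 38 + [("Baltic", False)] * 2
    + [("Levant", True)] * 17 + [("Levant", False)] * 3
)


def loss_counts(ledger):
    """Return {category: (lost, voyages)} from (category, returned) records."""
    counts = defaultdict(lambda: [0, 0])
    for category, returned in ledger:
        counts[category][1] += 1
        if not returned:
            counts[category][0] += 1
    return {category: tuple(pair) for category, pair in counts.items()}


def break_even_premium(lost, voyages, insured_value):
    """Premium at which payouts and premiums balance on average.

    Assumption: future voyages are lost at the same rate as past ones.
    """
    return insured_value * lost / voyages


if __name__ == "__main__":
    for destination, (lost, voyages) in loss_counts(HYPOTHETICAL_LEDGER).items():
        premium = break_even_premium(lost, voyages, insured_value=1000)
        print(f"{destination}: {lost} lost in {voyages}, break-even premium {premium}")
```

To turn a profit you would charge somewhat more than the break-even figure, just as the bookmaker shades the odds in his own favour.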

This is just straightforward book-keeping, but when we calculate odds in more interesting cases, we need to use the rule invented by Thomas Bayes in the eighteenth century. This is usually explained using a complicated-looking mathematical formula, but you can understand and apply Bayes’ rule perfectly well without knowing that formula. You only need the formula when you are dealing with very complex situations. In that case you will also need a computer to help you do the calculations. Until then, the method I explain here is entirely adequate and gives accurate results.

To demonstrate the method, let’s look at a question posed by psychologist Gerd Gigerenzer. He was trying to see how well medical professionals calculate odds when interpreting the results of medical tests. This is everyday work for these professionals, so when Gigerenzer posed the following question to 160 gynaecologists it was a surprise that they mostly chose the wrong answer. Here’s Gigerenzer’s question. See how you do. note 57

Assume you conduct breast cancer screening using mammography in a certain region. You know the following information about the women in this region:

The probability that a woman has breast cancer is 1% (prevalence).
If a woman has breast cancer, the probability that she tests positive is 90% (sensitivity).
If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9% (false-positive rate).

A woman tests positive. She wants to know from you whether this means she has breast cancer for sure, or what the chances are. What is the best answer?

A. The probability that she has breast cancer is about 81%
B. Out of 10 women with a positive mammogram, about 9 have breast cancer
C. Out of 10 women with a positive mammogram, about 1 has breast cancer
D. The probability that she has breast cancer is about 1%

Which answer did you pick? (The right answer is ‘C.’) Before Gigerenzer taught his technique to the gynaecologists, they were disturbingly bad at choosing the right answer: only 1 out of 5 of them chose correctly. This is worse than if they had picked their answer at random!

To get the right answer, we first need to understand what the question is telling us. The question gives us the results from some previous surveys. There must have been a survey which counted women in the region, and also counted how many of them had breast cancer. That survey tells us the prevalence of breast cancer in this region: the figure is 1% — in other words 1 woman in 100 has breast cancer. We might equivalently say that the prior probability of having cancer is 1 in 100, or that the base-rate for breast cancer is 1 in 100. All these terms mean the same thing: if you pick one woman out of this region at random, the chance that she currently has breast cancer, not knowing anything more about her, is 1 in 100.

There must have also been another survey looking at breast cancer screening using mammography, to find out how good that test is at finding whether a particular woman actually has cancer. It’s not completely reliable. The results of that survey tell us the sensitivity of the test: when women who definitely have cancer are tested, 90% of them test positive (90 women out of 100). We are not told explicitly, but we are expected to notice for ourselves that this means the test can give the wrong result: that 10 women out of 100 with cancer will nevertheless test negative. (The test will give them the all-clear when in fact they have cancer.) That figure, 10 in 100 or 10%, is also called the false-negative rate.

Finally, and presumably from the same survey, we have our last piece of data, the false-positive rate. This is the chance that the test makes the other kind of mistake, telling healthy women that they have cancer. When women who definitely do not have cancer are tested, nevertheless 9% of them test positive — 9 women out of 100. For these women, the test will disturbingly say that they have breast cancer when actually they don’t. Of course the test usually gives the all-clear for these women: 91 in 100 women without cancer will correctly test negative, just not all 100 of them.

So, if you are a gynaecologist presented with this data about your patient, what do you tell her? Does she have breast cancer for sure? Or what are her chances? Here is the easy way to work out the correct odds. First, don’t think directly about odds. Instead, think about a large enough number of people. This might be 1000, it might be 10,000 or more. If your calculations give you nice round numbers of people, you picked a big enough number. If you get annoying fractions of people, choose a bigger group to start with. In this case, it turns out that 1000 women is a large enough group. So, start with this invented group of 1000 typical women, and rephrase the data you were given as smaller sub-groups:

Out of 1000 women, 10 have cancer. (The prevalence is 1% and 1% of 1000 women is 10 women.)
Out of those 10 women with cancer, 9 test positive. (The sensitivity is 90% and 90% of 10 women is 9 women.)
Out of the 990 women without cancer, about 89 nevertheless test positive. (The false-positive rate is 9% and 9% of 990 women is about 89 women.)

It might help you to draw a diagram, showing how the original 1000 women get divided into sub-groups.

1000 women
    10 with breast cancer
        9 test positive
        1 tests negative
    990 without breast cancer
        89 test positive
        901 test negative

Now, you are not interested in the women who tested negative — your patient tested positive. Count the women who tested positive: we can see that there are 9 + 89 = 98 in total who test positive. Now count the women who tested positive and actually have cancer: of the 98 women who tested positive, only 9 actually have cancer. The rest are false-positives. So as a gynaecologist advising your patient, you should tell her that the chance she has cancer is now 9 in 98 or about 1 in 10.
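The counting method can be sketched in a few lines. This is only one way to write it. The function name and the rounding to whole people are my own choices.

```python
def natural_frequencies(population, prevalence, sensitivity, false_positive_rate):
    """Split an imagined population into the four groups of the diagram.

    Counts are rounded to whole people, as in the text.
    """
    with_cancer = round(population * prevalence)
    without_cancer = population - with_cancer
    true_positives = round(with_cancer * sensitivity)
    false_positives = round(without_cancer * false_positive_rate)
    return {
        "true_positives": true_positives,
        "false_negatives": with_cancer - true_positives,
        "false_positives": false_positives,
        "true_negatives": without_cancer - false_positives,
    }


if __name__ == "__main__":
    groups = natural_frequencies(1000, 0.01, 0.90, 0.09)
    positives = groups["true_positives"] + groups["false_positives"]
    print(f"{groups['true_positives']} in {positives} positives have cancer")
```

Run with the figures from the question, it prints "9 in 98 positives have cancer".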

That’s answer ‘C’ in Gigerenzer’s question. Before Gigerenzer taught the gynaecologists how to do this calculation, nearly half of them thought that the best answer was ‘B’ (9 out of 10). Most of the rest wrongly chose answer ‘A’ (81%).

A woman who tests positive in this example is still quite unlikely to actually have cancer, but a positive result means that it’s a good idea to investigate further, since this risk is likely to be the biggest one facing her right now. Notice that for women who test negative, their odds change in the opposite direction. They go from 1 in 100 before the test to 1 in 902 after the test. They still have a very slim chance of having cancer, but we would be happy to tell them that the test has “proved” that they don’t have cancer. Not proved in the old-fashioned sense of course, not in a way that would really satisfy Hobbes or Popper, but proved enough that we can make a decision.
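For readers who would like to cross-check the counting against the usual formula, here is a minimal sketch of Bayes’ rule in probability form. The function and argument names are mine. Because it works with exact fractions rather than whole women, its answers differ very slightly from the rounded counts.

```python
def posterior(prior, p_result_if_true, p_result_if_false):
    """Bayes' rule: chance the condition holds, given the observed result."""
    numerator = prior * p_result_if_true
    return numerator / (numerator + (1 - prior) * p_result_if_false)


# Positive test: sensitivity 90% against a false-positive rate of 9%.
after_positive = posterior(0.01, 0.90, 0.09)

# Negative test: false-negative rate 10% against a true-negative rate of 91%.
after_negative = posterior(0.01, 0.10, 0.91)

if __name__ == "__main__":
    print(f"after a positive test: {after_positive:.3f}")
    print(f"after a negative test: about 1 in {round(1 / after_negative)}")
```

The positive case comes out at a little over 9%, and the negative case at about 1 in 902, in line with the figures above.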

We can use exactly the same techniques to update the odds on the truth of general rules using the evidence from experiments. When we design our experiments, we try to make them as clear as possible, to reduce the chance of both kinds of error. We try to make the false-positive rate as small as possible, close to 0%. We also try to make the false-negative rate as small as possible, close to 0%. (Or equivalently, try to make the sensitivity as big as possible, close to 100%.) We can never do quite as well as this, but provided the sensitivity is bigger than the false-positive rate, any confirmatory experiment must make the general rule more likely.

One really clear experiment can move the odds of a particular general rule from very unlikely to very likely. But the same change in odds can also happen if we apply the results of a long sequence of less clear experiments one after the other. Either way, this is how in Bayesian reasoning a theory becomes more likely and more true through a series of confirmatory experiments.
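Here is a small sketch of that comparison, written in terms of odds. It assumes the experiments are independent of one another, and the particular sensitivities and false-positive rates are invented for illustration.

```python
def update_odds(prior_odds, sensitivity, false_positive_rate):
    """Odds in favour of the rule after one confirming experiment.

    Odds are a single 'for to against' ratio, so a 1 in 100 chance is 1/99.
    """
    return prior_odds * sensitivity / false_positive_rate


def probability(odds):
    """Convert 'for to against' odds back into a probability."""
    return odds / (1 + odds)


def after_confirmations(prior_probability, sensitivity, false_positive_rate, n):
    """Probability of the rule after n confirming experiments in a row.

    Assumption: each experiment is independent of the others.
    """
    odds = prior_probability / (1 - prior_probability)
    for _ in range(n):
        odds = update_odds(odds, sensitivity, false_positive_rate)
    return probability(odds)


if __name__ == "__main__":
    # One very clear experiment (illustrative figures).
    print(f"{after_confirmations(0.01, 0.99, 0.0001, 1):.3f}")
    # Four less clear experiments, each like the mammogram.
    print(f"{after_confirmations(0.01, 0.90, 0.09, 4):.3f}")
```

Both routes take a rule that started at 1 in 100 to roughly 99 in 100. An experiment whose sensitivity equals its false-positive rate would leave the odds exactly where they were.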

Bayesian reasoning has been demonstrated to give commonsense answers and has been proved consistent with classical logical arguments when we are dealing with completely true or false statements. So why was there hostility from mainstream science? The biggest problem was probably that in the twentieth century, scientists had adopted a different way of deciding whether experimental results should be accepted. Unfortunately the way they chose was wrong.

Early in the twentieth century, scientists decided that they would say that an experimental result vindicated a theory provided the false-positive rate, which they called the p-value, was 5% or less. They called this a significant result. They called a p-value of 1% highly significant. This special meaning of the word “significant” persists throughout the whole of science. You will find it almost impossible to get the results of an experiment published unless you demonstrate a p-value of less than 5%, regardless of the sensitivity of your experiment. On the other hand, you will find it relatively straightforward to get your results published if your p-value is less than 5%, again regardless of the sensitivity. Worse than this, most scientists and most of the educated public think that a p-value of 5% means that there is a 5% chance that the theory tested by the experiment is wrong. Or to put it another way, they think that after the experiment the probability of the theory being true is now 95%!
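To see why the 95% reading is wrong, we can run the same calculation on a “significant” result. This is a sketch under stated assumptions: the prior chances and sensitivities below are invented, and the 5% threshold is treated as the false-positive rate, as in the text.

```python
def chance_theory_true(prior, sensitivity, false_positive_rate=0.05):
    """Chance a theory is true after one 'significant' experiment."""
    true_and_significant = prior * sensitivity
    false_but_significant = (1 - prior) * false_positive_rate
    return true_and_significant / (true_and_significant + false_but_significant)


if __name__ == "__main__":
    # Illustrative priors and sensitivities, not taken from any real study.
    for prior, sensitivity in [(0.10, 0.80), (0.10, 0.20), (0.50, 0.80)]:
        result = chance_theory_true(prior, sensitivity)
        print(f"prior {prior:.2f}, sensitivity {sensitivity:.2f}: {result:.2f}")
```

With a 1 in 10 prior and 80% sensitivity the theory ends up at about 64%, and with 20% sensitivity at about 31%. The answer depends on the prior and the sensitivity, which the 5% rule ignores.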

I expect that you find this slightly horrifying. Times are changing, and Bayesian reasoning has moved from a fringe belief to mainstream respectability. But journals still use the 5% rule to decide whether to publish new results. Medical researcher John Ioannidis published a scathing article in 2005 with the provocative title Why Most Published Research Findings Are False. He pointed out the flaws I just mentioned, but also explained another problem with the 5% rule: when a lot of research groups attempt the same experiment, some of them are bound to get results that appear to confirm their theory, just by accident, when the theory is false. note 58

Negative results are unpopular and hard to publish. So, the many research groups with negative results just file them away and forget them. (This form of survivor bias is called the file-drawer effect.) The one research group who quite by chance gets a positive result finds it easy to publish this. For a little while after that publication, there is a window of opportunity for another group to publish an oh-no-it-isn’t paper — this is the only time a negative result is interesting. So much for Popper’s idea of falsifiability. note 59
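The arithmetic behind this effect is short. The sketch below assumes the groups work independently and that each uses the 5% threshold. The function name is my own.

```python
def chance_of_fluke(groups, false_positive_rate=0.05):
    """Chance that at least one group 'confirms' a theory that is false.

    Assumption: the groups' experiments are independent of one another.
    """
    return 1 - (1 - false_positive_rate) ** groups


if __name__ == "__main__":
    for groups in (1, 5, 20):
        print(f"{groups} groups: {chance_of_fluke(groups):.2f}")
```

With 20 groups trying the same experiment on a false theory, the chance that at least one of them gets a publishable positive result is about 64%.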

Clearly, to really judge how much a theory is true we need to use Bayesian reasoning properly, and to use all the experimental results, not just the results the journals like to publish. Drug companies are particularly criticised for funding many studies, but only publishing the ones that support sales of their products. Bayesian reasoning can’t reach the truth if we don’t give it all the data. All it reveals then is our own biased preconceptions. (Ioannidis suggests that published papers in medicine currently tell us much more about the popularity of theories than about their truth.)

I expect that the situation will improve due to the operation of the social mechanisms of science — it has already improved a lot. The network of trust will gradually ensure that scientists who make reliable judgements will be believed more than flaky scientists, and the more reliable methods will become standard. Bayesian reasoning will eventually become the orthodoxy. But until then we have to take care when interpreting scientific results, because they might not mean what people say they mean.

The fundamental strength of science comes from the network of trust. The important thing when inventing a new hypothesis is not the imagination you have yourself, but the imagination you can make in a group of people. The important thing when judging truth is not the truth you reach on your own, but the truth you establish as a group. When we see this, we can see that science is actually the distilled essence of Western civilisation. Historian Carroll Quigley, writing in his book The Evolution of Civilizations, summed up this essence with the somewhat cryptic maxim that “Truth unfolds in time through a communal process.” Quigley explains this in the following way: note 60

This outlook assumes, first, that there is a truth or goal for man’s activity. Thus it rejects despair, solipsism, scepticism, pessimism and chaos. It implies hope, order, and the existence of a meaningful objective external reality. And it provides the basis for science, religion, and social action as the West has known these.

Second, this attitude assumes that no one, now, has the truth in any complete or even adequate way; it must be sought or struggled for. Thus this outlook rejects smugness, complacency, pride, and personal authority in favour of the Christian values and a kind of basic agnosticism (with the implication “We don’t know everything”), as well as the idea of achievement of good through struggle to reach the good. The earliest great work of German literature, Parzival, has as its subtitle “The Brave Man Slowly Wise.” This is typical of the Western ideology’s belief that wisdom (or any real achievement) comes as a consequence of personal effort in time. The same idea is to be found in Dante’s Divine Comedy, in Shakespeare’s tragedies (taken as a whole), and in Beethoven’s symphonies.

There are two important ideas here: one is that no one has the whole truth now but that it can be approached closer and closer in the future, by vigorous effort, and the other is that no single individual does this or achieves this, but that it must be achieved by a communal effort, by a kind of cooperation in competition in which each individual’s efforts help to correct the errors of others and thus help the development of a consensus that is closer to the truth than the actions of any single individual ever could be. We might call these two aspects the temporal and the social. They are covered in our maxim by the words “unfolds” and “communal.”

There is also a third idea here; namely, that the resulting consensus is still not final, although far superior to any earlier or more individual version.

Quigley wrote these words, which seem to perfectly describe the nature of science, in 1961. It makes you wonder why the “science wars” were ever necessary.
