
CHAPTER 3

POPPER AND THE OBJECTIVITY OF SCIENTIFIC PROBLEMS





Our overview of Popper's theory of science as problem-solving (and a comparison of it with Dewey's account) immediately raises a series of questions:
a) Is Popper correct in stressing the overriding importance of the objective features of problem-situations? What does he mean by saying problems are in World-3? Is the analogy he draws between the problem-solving of Einstein and that of an amoeba (or even a species!) a useful one?
b) What sense can we make out of Popper's claim that we can't understand a theory until we know what problems it was designed to solve? What does he mean by saying scientific progress can be measured by the depth of the problems under study?
c) When Popper provided his demarcation between science and non-science it was in terms of characteristics of theories. But if we are going to center our account of science around problems, shouldn't we also look for distinguishing characteristics of scientific problems? What exactly is it that makes a problem cogent, deep, interesting, ripe for solution? To reiterate the central problem of this book, how can problems (as opposed to theories) be evaluated?
In this chapter we will look more critically at Popper's claims about the objectivity of problems. We will then look in more detail at his theory of science as problem-solving. Our aim is to see how much of Popper's account we want to preserve and to discover what needs to be drastically revised or supplemented.
As an introduction to the issue of the objectivity of problems, let us begin with what Popper understands by objective knowledge.
3.1 Popper's Conception of Scientific Knowledge
According to the Encyclopedia of Philosophy (Vol. IV, p. 345), the most widely accepted definition of knowledge is "justified true belief". However, on Popper's view none of these terms describe important characteristics of knowledge. First of all, knowledge propositions need not be believed by anyone. Suppose the world expert on widgets writes a definitive summary for an encyclopedia but then dies and no one ever reads her article; nevertheless, the information therein remains knowledge although no one actively believes it. Or, to cite one of Popper's examples, consider a table of logarithms generated by a computer program. It may turn out that certain items of the table are never read by any human. Yet they are part of mathematical knowledge.
So for Popper the genus of knowledge is objective propositional content, not subjective attitudes towards propositions. Knowledge may be encoded in human brains, but it may also be contained in books, diagrams, or software. Whether anyone actively believes the proposition is irrelevant to its status as knowledge. (As my old chemistry teacher used to tell us, "If you want to be a good chemist, don't make a handbook out of your head!")
Knowledge items for Popper need not be believed, but mustn't they at least be worthy of rational belief? Shouldn't they be true and justifiable? Here again Popper departs from the traditional conception. Knowledge claims need not be true, Popper argues, and he cites two sorts of examples. First of all, many of the most important propositions in science are literally false - e.g., all of the laws of classical physics and chemistry. Matter is not conserved, planets do not travel in perfect ellipses, atoms are not indivisible, and not all molecules of water are alike. Neither are the claims of the theories which replaced classical science, such as Relativity Theory and Quantum Mechanics, without flaw. If we were to follow the traditional epistemological delimitation of knowledge to true propositions, it might well turn out that there is no scientific knowledge. (Cf. Cartwright, How the Laws of Physics Lie.) Popper goes on to argue that many of the most basic bits of common-sense knowledge are also false. The sun does not rise every twenty-four hours, at least not in the Land of the Midnight Sun, and bread can be poisonous if it is made from ergotic wheat.
But if we admit the existence of false knowledge claims shouldn't we at least require that in order to count as knowledge in a particular historical context a statement must have been well-confirmed by the evidence available at that time? Shouldn't we insist that one should be justified in believing knowledge propositions, even if later evidence may cause us to reverse our appraisal of their truth value?
Some of the details of Popper's criticisms of various justificationist epistemologies will emerge below. Suffice it to say here that with respect to the logical positivists Popper argues that there is no infallible empirical base - the most trivial sounding observation report, such as "Here is a glass of water", contains so many untested implications, such as, "If you were to drop it, the water would spill while the glass would break" or "If you were to cool it, the water would turn into ice," that it is impossible to verify them all. Furthermore, even if we were to take some set of observation reports as indubitable evidence, Popper follows Hume in arguing that they can never provide the kind of logical justificatory support assumed by inductivist philosophers.
But if we give up these traditional epistemological requirements, how are we to distinguish those propositions which are part of knowledge from random sentences generated by the apocryphal monkey at the typewriter, or, more realistically, from the outputs of programs such as Racter? Is Stove (Popper and After) right in claiming that Popper has so changed the meaning of knowledge, by completely denying its sense of cognitive success and achievement, that it is misleading for him to continue to use the term? What, according to Popper, are the delimiters of knowledge?
I know of no place where Popper gives a short, italicized definition of knowledge, but we can construct one from his writings. First of all, if we are prepared to identify knowledge with our best science, he does give an explicit minimal characterization of a scientific claim - it is one which can be subjected to empirical test - and our best scientific claims are falsifiable propositions which are not false as far as we know but which have successfully passed severe empirical testing. Such a tripartite definition would replace each component of the old justified-true-belief account, yielding corroborated-falsifiable-claim.
However, Popper often uses knowledge in a looser sense when he speaks of 'background knowledge' and its role in setting problems or appraising the severity of a test. These propositions are assumed to be unproblematic in a particular problem-situation (C. & R, p. 238) but there is no suggestion that they each have been carefully tested. In fact to demand systematic testing would lead to a falsificationist regress almost as vicious as the inductivist one. Neither must critical scrutiny be limited to empirical testing. Unlike the positivists, Popper would never consider excluding mathematics or philosophy from knowledge.
The proper conclusion to draw, I think, is that Popper does not try to delimit knowledge propositions because on his account it makes no sense to do so. No propositions receive permanent gold stars on Popper's account. All claims are conjectural although some have been more carefully scrutinized than others. In some contexts a proposition will be temporarily accepted; in others it may be challenged. What we can do at any point is to open the case on any claim and review its test record, its logical compatibility with other statements, its explanatory power, what problems it helps solve, what problems it generates, etc. But it would be fruitless to demand that we always start by reviewing the credentials of every proposition in sight.

3.2 The Objectivity of Scientific Problems
If problems arise within a knowledge context and if knowledge is the content of some set of propositions (not human attitudes towards them), then it would seem fairly straight-forward to define problems as some sort of logical inadequacy within the propositional set, such as inconsistency or incompleteness.
Thus although Popper himself often uses vague, quasi-psychologistic talk of "violated expectations" (reminiscent of Dewey), one could replace it by analyses in terms of logical inconsistencies between theoretical systems and singular observation statements. Of course, logic alone doesn't tell us which statements count as observations nor which theories are "accepted" (no problem arises if a theory already considered to be false leads to a prediction failure), but presumably Popper hopes to give non-psychologistic accounts of these additional factors.
By giving an objective account of problems, one can explain why there is often considerable inter-subjective agreement among scientists on what is problematic about a particular theory. (There is something really "out there" which makes them feel puzzled.) And one can clarify Dewey's point about neurotic feelings of doubt not being adequate for inquiry - the doubt must arise from a situation which is objectively open or indeterminate.
For Popper, although theories are generated by humans, theories can in turn autonomously generate problems of which no human being is aware. For example, when people first invented the integers (perhaps using a Brouwer-like construction), the problem of whether there is a largest prime also came into existence (although people only worried about it much later). Whether a problem is solved or not is also determined by looking at the objective state of the arguments which could be constructed for and against a proposed solution. Whether people actually accept those arguments or cease feeling puzzled is not relevant.
To dramatize the difference between the objective status of a problem and our feelings about it, Popper places them in different worlds. Roughly speaking, World-1 contains the objects traditionally studied by natural science (e.g., electrons and chairs). World-2 contains psychological states (e.g., feelings of doubt, beliefs). And World-3 contains "the objective contents of thought" (e.g., theories, arguments, problems). Although minds in W-2 (contained in W-1 bodies) produce everything in W-3, no mind working either individually or collectively can be subjectively aware of everything in W-3. (We can't think about each integer nor actually derive every Euclidean Theorem.) Popper claims the growth of knowledge is best understood by concentrating on the logical relations between W-3 objects, not on the psychological states of the people who produce and manipulate them. But although Popper's notion of objective problem is very useful, I think we cannot ignore the psychological and sociological aspects of scientific problems. I will develop this point by listing various points at which we must refer to these other dimensions of the problem-situation in order to understand the growth of knowledge.
Let us suppose that humans have generated propositions T and O - both now reside in W-3. Let us further suppose that T entails ~O, but the deduction is abstruse and so no one has noticed. Objectively, there is now a problem in W-3, but no inquiry will result until someone comes to believe that T entails ~O.
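The situation just described, in which an inconsistency exists objectively whether or not anyone has noticed it, can be illustrated with a small sketch. The propositions below are hypothetical toy stand-ins for T and O (they are not drawn from Popper), and the brute-force truth-table check is only one simple way of testing entailment in propositional logic; real theories are of course far richer.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Valuation = Dict[str, bool]
Formula = Callable[[Valuation], bool]


def all_valuations(atoms: Sequence[str]):
    """Yield every assignment of truth values to the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def entails(premise: Formula, conclusion: Formula, atoms: Sequence[str]) -> bool:
    """True if every valuation that satisfies premise also satisfies conclusion."""
    return all(conclusion(v) for v in all_valuations(atoms) if premise(v))


def jointly_satisfiable(formulas: Sequence[Formula], atoms: Sequence[str]) -> bool:
    """True if at least one valuation satisfies all formulas at once."""
    return any(all(f(v) for f in formulas) for v in all_valuations(atoms))


# Hypothetical toy content (an assumption for illustration only):
# T asserts p, "p implies q", and "q implies not r"; O asserts r.
atoms = ["p", "q", "r"]
T: Formula = lambda v: v["p"] and (not v["p"] or v["q"]) and (not v["q"] or not v["r"])
O: Formula = lambda v: v["r"]
not_O: Formula = lambda v: not O(v)

print(entails(T, not_O, atoms))             # True: T entails ~O
print(jointly_satisfiable([T, O], atoms))   # False: T and O cannot both hold
```

The check returns the same answer whether or not anyone runs it, which is the sense in which the problem is objective; but nothing happens in the growth of knowledge until someone actually carries out the derivation.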
At this point, Popper might wish to add the explicit proposition, T → ~O, to W-3 and draw a distinction between potential logical consequences of our theories and those which have actually been drawn. This seems fair enough. The derivation of ~O from T constitutes a powerful argument against T and creating new arguments surely is a contribution to knowledge. (Recall Galileo's wonderful criticism of the Aristotelian law of falling bodies - if two cannonballs, while falling, should somehow be tied together to form a mass twice as heavy, according to Aristotle's theory, they should speed up, but this is absurd.) However, this does mean that only those features of W-3 which people notice can contribute to the growth of knowledge.
Furthermore, if people believe T and O are inconsistent (even though they aren't), they will try to solve a problem although it would seem that objectively no problem exists! For example, an early criticism of Darwinian theory (D) was that it could not explain the evolution of altruistic behavior (A). We now know that the argument was incorrect - sociobiologists describe a number of mechanisms, such as kin selection, which resolve the puzzle.
Here is a case where Darwinian theory and Altruism were in W-3 and people also believed D → ~A, so that was also in W-3. However, ~(D → ~A) follows from D, so presumably that was also at least potentially in W-3. Yet people's inquiry was influenced by what was in some sense a "pseudo-problem" - or at least a problem based on a mistake. Nevertheless, the inquiry resulted in new knowledge, the discovery of kin selection!
I conclude that logical inconsistencies in W-3 are of no importance in the growth of knowledge until people notice them. And even "neurotic" problems (puzzlement based on inconsistencies which aren't really there!) can be important for the growth of knowledge. So it seems that the subjective and social awareness of problems is important. However, at any given time, we are probably aware of an enormous number of problems. Not only are there the problems arising from known inconsistencies (Lakatos claimed that every theory lives in a "sea" of anomalies), there are also the explanatory problems which arise from gaps in our knowledge. Which problem will actually provoke inquiry? When it comes to the issue of problem selection, Popper admits that subjective experience plays a part in which problem we emphasize, or select as important (OK, pp. 166-67). Later in this essay we will explore the possibility of finding objective ways of evaluating problems.




3.3 Popper's Strong Analogy Between Evolutionary Biology and Epistemology

The account of Popper's theory of the objectivity of knowledge and problems given above is extracted from the essays collected together in Objective Knowledge, but it would be misleading as it stands because Popper develops this view in a context which stresses the similarities between the growth of knowledge in the scientific community, animal learning, and the evolution of biological species. There is at least a metaphorical sense in which each of these three processes involves problem-solving, trial solutions, and error elimination, but Popper proposes that we take the analogy very seriously indeed. Thus his examples of the problem of violated expectations include the case of a newborn foal who sucks on the hair under the mare's front legs and is disappointed until it finds its way to the back, as well as the case of Newtonian astronomers who did not expect the results of the Eddington eclipse expedition which detected light bending in a gravitational field. Popper is not suggesting that foals have propositional attitudes. Rather, the foal has an inborn "theory" which has "run into difficulties". He also draws parallels between the problem-solving activities of Einstein and an amoeba -- he cites the example of a hungry amoeba who learns to swim towards a light in order to get food. And he includes birds' nests among the problem solutions which reside in World-3!
Of course, there are other places where Popper stresses the unique features of science (more of this later) but by separating knowledge from human consciousness it is very easy for Popper to posit that knowledge is encoded in genotypes and guppies as well as in geniuses!
Let us now look in more detail at the parallels Popper draws between biological evolution, animal learning, and scientific inquiry, three processes which instantiate his general schema:
P --> TS --> EE --> P'
P stands for problem, which is to be understood in an objective sense and does not imply that the entity which "has" the problem is conscious of it. In the biological domain, species face problems connected with survival and reproductive success, such as the problems of escaping predators, raising young, finding food, mates, etc.
TS stands for tentative solution, such as the information encoded within a new genotype within the species' gene pool. EE, or error elimination, occurs if the phenotype bearing the new genotype dies without reproducing (or reproduces at a lower rate than its conspecifics).
The outcome of reiterations of the TS and EE steps is a species better adapted to its environment, but new problems (P') will typically lead to a repetition of the whole selection process.
When the schema is applied to animal learning it looks fairly similar to Skinnerian operant conditioning. [See Skinner's "Selection by Consequences"] In response to the "problem" posed by hunger pangs, or whatever, the animal engages in exploratory behavior (thus proposing a tentative "solution"). Unsuccessful solutions lead to no food, or even pain, and are extinguished. When a new behavior is successful, however, it becomes part of the individual animal's patterned response to that type of problem-situation. New problems then lead to more learning.
The application of the schema to scientific inquiry is quite straight-forward. Scientists propose falsifiable conjectures in response to problems arising within their knowledge situation, which are then subjected to empirical test. False hypotheses are thereby eliminated and the scientist is then free to confront new cognitive problems.
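The common structure of the three instantiations just described can be sketched as a generate-and-test loop. This is only a minimal illustration under stated assumptions: the function name, the toy problem (finding an integer whose square is 49), and the blind random proposer are all hypothetical choices of mine, not anything Popper specifies, and the step from P to a new problem P' is not modeled.

```python
import random
from typing import Callable, List, Optional, Tuple


def trial_and_error(
    passes_test: Callable[[int], bool],        # P: what a solution must achieve
    propose: Callable[[random.Random], int],   # TS: blind conjecture generator
    max_trials: int,
    rng: random.Random,
) -> Tuple[Optional[int], List[int]]:
    """Propose candidates until one survives testing or the trials run out.

    Returns the surviving candidate (or None) and the list of eliminated ones.
    """
    eliminated: List[int] = []
    for _ in range(max_trials):
        candidate = propose(rng)       # TS: a tentative solution
        if passes_test(candidate):     # the candidate survives the test
            return candidate, eliminated
        eliminated.append(candidate)   # EE: error elimination
    return None, eliminated


# Hypothetical toy problem: which integer in 0..99 has square 49?
rng = random.Random(0)
survivor, eliminated = trial_and_error(
    passes_test=lambda x: x * x == 49,
    propose=lambda r: r.randrange(100),
    max_trials=10_000,
    rng=rng,
)
print(survivor)           # 7, the only integer in 0..99 whose square is 49
print(len(eliminated))    # number of refuted conjectures along the way
```

Note that the proposer never consults the list of eliminated candidates, so the same refuted guess may be tried again; the variation is blind in the sense the biological analogy suggests, and all quality control lies in the test.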
------------------------
Insert Fig. 3.1 about here
------------------------

These three instantiations of the schema are summarized in Figure 3.1. Let us now comment on some of the important dissimilarities between scientific inquiry and these other selection processes. One crucial difference, as Popper notes, is that in science "our mistaken theories die in our stead". If a species fails to solve a survival problem, it goes extinct. If a rat fails to find a path through a maze, it goes hungry. In both cases, the consequences of error (or success for that matter) have a direct physical effect on organisms. The scientist, on the other hand, may experience elation or disappointment as a result of empirical testing but these psychological reactions are not simply coupled within the selection process. For example, I may derive satisfaction from designing a clever test which refutes a hypothesis even when the hypothesis was of my own creation. And to the extent to which science is a "friendly hostile" competition between ideas (to use Popper's phrasing), there could even be a division of labor between creation and criticism such that every prediction failure is a personal triumph!
Because the success or failure of a scientific hypothesis can be decoupled from pleasure and pain, the scientist is free both to propose bold conjectural solutions and to test them severely. And the fact that as scientists we are free to choose problems - they are not forced on us by the environment - also allows us to operate less cautiously. Although the evaluation of scientific theories depends crucially on feedback from the environment, human scientists experience relatively little feedback from prediction successes or failures. Contrast the situation of the technologist, e.g., a potter who is trying to solve the problem of how to prevent pots from exploding in the kiln. Here the problem is set by practical considerations. Tentative solutions should be economically viable and may be of such limited scope as to apply only to the local clay and kilns. And it would be absurd to push any solution which appears to work to extremes. Prediction failures cost time and money and so the potter will theorize conservatively. Thus, although the potter, unlike the animal, can articulate the hypotheses under test and use information in books to criticize them, the potter's situation is more like the animal's than the scientist's because she or he is directly rewarded or punished according to the success of the tentative solution.
The scientist's relative freedom from personal repercussions sounds wonderful and liberating, but it can also pose the following problem. Note that on the biological or animal level it is not possible for the organism to "ignore" refutations because it is causally connected to the environment. A dogmatic potter may engage in a process of psychological denial of the pot shards from exploding pots but will soon go out of business. But an individual scientist may evade the elimination of erroneous theories by using ad hoc modifications or conventionalist stratagems with impunity. To do so is like cheating at Solitaire - it may not be as much fun, but nothing keeps you from doing it - except your internalized standards of fair play. An interesting question, then, is how scientific institutions and traditions can best reward (or punish!) scientists' activities as they engage in scientific inquiry. (For example, how do we discourage people from publishing non-reproducible experimental results while encouraging them to produce interesting detailed conjectures which may well be falsified?)
Many evolutionary epistemologists have been captivated by the formal resemblances between the modification of species by natural selection, the modification of behavior through differential reinforcement, and the modification of scientific systems through hypothesis testing. For Popper the parallels are especially easy to draw because he down-plays the importance of conscious beliefs in science. However, analogies can lead us astray as well as illuminate and although there may be a definite sense in which genotypes have propositional content, I think it can hardly be helpful to say birds' nests do -- unless we are also to say that sand dunes are the winds' solution to the problem of where to deposit suspended particles of soil and diamonds are the solution of carbon's problem of how best to solidify under extreme pressure!
As we now look more closely at Popper's account of scientific problem solving we will note other mischievous features of his taking the analogy too seriously.
3.4 Popper's Theory of Science as Problem-Solving
Philosophers of science today might admit that to be complete any account of scientific inquiry should say something about scientific problems but nevertheless resist the idea of putting problems at the very center of the enterprise. Let us now look in detail at Popper's methodology and see what he says about problems at each juncture. We will use as an outline the flowchart in Figure 3.2.
----------------------------
Insert Figure 3.2 about here
----------------------------

Throughout the discussion I will sometimes supplement Popper's examples with my own, but they are intended to be ones consonant with his scheme.
a. Typical Scientific Problems
As we have seen, according to Popper, no inquiry begins in a vacuum. Regardless of what the topic may be, the scientist, like all of us, begins with a motley collection of ideas, some clear, some confused, some true, some false. Puzzlement arises when there are inconsistencies or gaps within existing bodies of knowledge. But how are scientific problems different from those of ordinary life? Or are they different? Let us begin by surveying the typical kinds of scientific problems which Popper discusses and then we will comment on their special characteristics.

(i) Problems arising from violated expectations. A common sort of scientific problem arises when something surprising or unexpected occurs and we wonder how or why it happened. An important problem for early astronomers was the following: In general, celestial bodies, such as the sun, moon and stars, move across the sky in smooth arcs. However, it was discovered that the planets wander around the sky irregularly. Can one describe precisely how the planets move and explain why they move differently from the other heavenly bodies? Plato called this the problem of the planets. Ptolemy, Copernicus, and Kepler each offered a different solution to it.
Here is another example of a scientific problem caused by violated expectations: In 1896 Becquerel found that a batch of photographic plates which had been carefully stored in black paper were fogged. According to the best scientific knowledge available at the time, only visible light or x-rays could expose photographic plates. What could have happened? Becquerel finally began to suspect that the fogging was caused by an unusual rock he had used as a paper weight. And it was thus that he discovered radioactivity. Later Madame Curie showed that the rock contained radium.

(ii) Problems arising from a quest for deep explanations. Even if the scientist is lucky enough to discover a generalization which seems to have no exceptions, he or she is still faced with a problem: What causes the regularity? Why do things happen just that way? For example, early astronomers asked why the sun rose every day in the east. Some said it was because the sun moved in a circle around the earth. Later this geocentric theory was replaced with a heliocentric theory. In either case, a further question arose: What caused the sun (or earth) to move? According to Aristotle, there was a Prime Mover. Later people suggested a law of circular inertia, saying a wheel would move forever if there were no friction. Newton explained the regular motion in terms of linear inertia and the force of gravity.
There are many other cases in which the problem is to explain a regularity. Bohr wondered why the wavelengths of the spectral lines of hydrogen should fit the simple mathematical formula discovered by Balmer. Mendeleev and other chemists of the late 19th century wondered why the elements should arrange themselves so nicely into a Periodic Table. By the end of the 18th century, after the work of Boyle and Charles, everyone knew that gases expanded on heating. But why? Caloric theorists said that heat was a fluid which flowed into gases and as a result they took up more room. Kinetic theorists said heat was kinetic energy and hot gases expanded because their molecules moved faster. Both sides agreed on the regularity to be explained, but they offered competing explanations of it.
(iii) Problems arising from a quest for unity. As a science develops, a new sort of problem often arises: Can one find a unified theory which covers two or more domains which have previously been treated separately? For example, for a long time organic chemistry (which deals primarily with covalent compounds) and inorganic chemistry (which is mainly concerned with ionic compounds) were considered to be quite distinct fields. At this time people believed that naturally occurring organic compounds, such as urea, could not be synthesized in the laboratory because they contained a vital life force. However, today's theories of chemical bonding apply equally well to inorganic and organic materials.
Before Galileo, it was held that terrestrial bodies and celestial bodies obeyed different laws. Galileo (and later Newton) gave a unified account of the motions of all bodies. A pressing problem in physics today is the search for a unified field theory--a theory which would successfully combine relativity theory and quantum mechanics. Psychologists are looking for a unified theory of learning. Behaviorists can account for some kinds of learning; cognitive psychology provides explanations for other types of learning. But one would like to find a single theory which covers all instances of learning.

(iv) Problems of conflict between theories. Often, problems of finding a unifying explanation are exacerbated because of inconsistencies between the component theories. And contradictions can also arise between theories which appear to cover quite different domains. For example, the biggest objection to Copernicus' astronomical theory was its conflict with Aristotelian physics, according to which nothing could continue to move without a mover. And a strong contemporary objection to Darwin's theory of biological evolution was Kelvin's geophysical calculation of the age of the earth. (It turned out later that Kelvin's thermal estimates were wrong because they did not include the heat generated by radioactive decay.)
Each of the four types of scientific problems discussed above arises out of a rich background of information and expectations. New scientific theories are invented when scientists are faced with a problem: Why did my old theory or set of unconscious expectations fail? What causes this regularity which I have observed? Can I unify these two branches of science? Or resolve the inconsistencies between them?
None of these problem types are unique to science. Myth-makers are also looking for deep explanations and try to give unified pictures of the world we live in. Everyday life produces many calls for explanations, often of singular events. And many of our practical problems of existence arise because the common-sense generalizations we make about the world, including other people, are violated.
But although there is no sharp demarcation of scientific problems, there are some obvious differences in degree. In a well developed scientific field, problems arise within a body of knowledge which is generally more extensive, more detailed, and better systematized than that of other domains. (This is not always the case - both folk mythologies and craft technical lore may be of comparable sophistication.) Furthermore, the scientific tradition for the most part actively rewards people who expose contradictions or gaps within the body of science. Folklore and religious systems, by contrast, are often embedded within conservative institutions which discourage criticism or revision of the traditional beliefs. To summarize, to the extent to which scientific knowledge is well-articulated it is relatively easy to discover flaws in it, and scientific traditions encourage us to take these problems seriously.

b. Scientific Problem Solutions
We have described various sorts of problems which trigger scientific inquiry. Our next task is to characterize the sorts of problem solutions which count as scientific. This is the core of the demarcation problem with which Popper began.
However, let me digress a moment to point out that we have skipped over the process by which these tentative solutions are dreamt up in the first place and the problem of whether there is a logic of discovery. Early philosophers were optimistic about the prospects of describing a method for discovering true theories. Bacon and other inductivists thought that through careful observation and systematic use of his tables one could easily arrive at the solution to scientific problems. Descartes and other rationalists thought that a systematic analysis of our clear and distinct ideas would provide the answers.
Popper argues that there is no recipe for discovery, but from this he concludes that all the scientist can do is guess at the answer. Some conjectures will be "happy guesses" as Whewell described them; others will turn out to be dead wrong. It's all a matter of trial and error. In biological evolution, mutations occur by chance--we can't predict what new variations will occur. But natural selection will filter out those which are not adapted to the environment. Likewise for science. People make up all sorts of crazy hypotheses. But tests will weed out those which do not match reality. Quality control is ensured by careful testing procedures, not by censorship of new ideas. The pattern of reasoning which leads to a new hypothesis is not important--it may be based on dreams, mystical experiences, weak analogies or what have you. According to Popper, the origins of the idea are irrelevant; what is crucial is how well the scientist's hunch stands up to testing.
Today both cognitive scientists and philosophers of science are optimistic about being able to describe the structure of the process Popper calls "trial and error". Here is a place where he is ill-served by the analogy to biology although ironically biologists have now given a reduced role to blind mutations.
Sociologists of course would argue that the origins of ideas are relevant -- a hypothesis which originates in Utah will have less initial plausibility than one which comes from MIT. And cognitive scientists, as well as philosophers such as Campbell and Hesse, dispute the claim that analogies only play a role in discovery and are then discarded.
And it is interesting to recall that Popper himself claims that one can't understand theories without knowing about the problems which they solve. Might this be construed as meaning that the problem-situation out of which the theory arose is relevant to its evaluation? But let us return to Popper's order of exposition.
As our account so far makes clear, the solutions to problems which scientists propose start out being mere hypotheses or conjectures. When they are first proposed, we have no particular reason to believe them true. Furthermore, these hypotheses tend to be rather bold and far-reaching. This is because the typical scientific problems we listed above all require as solutions theories of high content. Consider Problem Type 1: To explain why our expectations are violated, we need a theory which accounts both for the exceptions and the normal states of affairs we had expected. For example, a good answer to the problem of the planets' irregular motions would also explain the sun's regular motion.
To turn to Problem Type 2: Trying to give a deep explanation of a regularity (such as the Balmer formula for hydrogen spectral lines) generally results in a conjecture which has many other consequences as well (such as a formula for the spectral lines of sodium). As for Problem Type 3, it is clear that a unified theory will have more content than either of the separate fields. And generally such a theory will have lots of new consequences as well. (For example, the unified theory of chemical bonding covered not only traditional organic and inorganic compounds, but a whole new domain of organic-metallic compounds, such as hemoglobin.)
Although they are bold conjectures, Popper argues that conjectures do have one very important property in their favor: they can be tested by means of experiments. If one of our conjectures is false, it is realistic to hope that we will eventually discover its erroneous nature.
Let us now discuss the precise requirements that a theory must satisfy in order to be falsifiable.

(i) The Logical Requirement. Statements of the form "Some A's are B's" cannot be refuted by any report involving a finite number of instances, but universal generalizations, be they affirmative or negative, can be.
A necessary condition for a theory to be falsifiable is that it be logically possible to contradict it by a finite conjunction of sentences which describe particular instances.
Popper used the logical requirement to argue for the unfalsifiable status of many Marxist doctrines. Statements about the "inevitability" of the downfall of capitalism fail the logical requirement if no time limit is given. "Light has a maximum velocity" also fails unless a value is specified.
Many claims which at first appear to be universal generalizations also fail. For example, "Every metal has a melting point" or "every action is rational" may be better analyzed as what Watkins called "all-some" statements, i.e. as saying that for every metal there is some temperature above which it will melt, and for every action, there is some description of the agent's problem situation such that the action was appropriate to it.
On the other hand, the claim "some copper is brittle" looks like it is not open to refutation by a finite observation report; however, if it is accompanied by a recipe, "To make copper brittle, place a thin sheet of it for three days in a nuclear reactor where the neutron flux is..." it becomes testable.
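The asymmetry behind the logical requirement can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: each observation report is reduced to a pair of truth values (is the instance an A? is it a B?), and the function names `appraise_universal` and `appraise_existential` are hypothetical labels of my own, not Popper's terminology.

```python
from typing import Iterable, Tuple

# One report describes one particular instance: (it is an A, it is a B).
Report = Tuple[bool, bool]

def appraise_universal(reports: Iterable[Report]) -> str:
    """Appraise 'All A's are B's' against a finite list of reports.

    A single A that is not a B refutes the claim; no finite list verifies it.
    """
    if any(is_a and not is_b for is_a, is_b in reports):
        return "refuted"
    return "undecided"

def appraise_existential(reports: Iterable[Report]) -> str:
    """Appraise 'Some A's are B's' against a finite list of reports.

    A single A that is a B verifies the claim; no finite list refutes it,
    because the unexamined instances might still contain such a case.
    """
    if any(is_a and is_b for is_a, is_b in reports):
        return "verified"
    return "undecided"

reports = [(True, False), (False, True), (True, False)]
print(appraise_universal(reports))    # a counter-instance is present
print(appraise_existential(reports))  # no witness yet, but no refutation either
```

Neither function can ever return "refuted" for the existential claim or "verified" for the universal one, which is the sense in which only the universal form satisfies the logical requirement.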

(ii) The Empirical Requirement. Having the proper logical form is not sufficient to ensure that a hypothesis is scientifically testable. "All repressions are seated in the libido" satisfies the logical requirement but, as it stands, it is not subject to experimental test. How exactly are we to recognize a repression? And even if we could, how could we tell whether or not it is seated in the libido?
Contrast the following sentence which has the same logical form: "All samples of iron have a melting point less than 2000° C." This universal generalization is subject to test. We can easily determine whether a sample is iron or not through chemical analysis. (We might use the potassium thiocyanate test, for example.) And there are also a variety of reliable procedures for measuring melting points.
The contrast in the above two cases suggests the following requirement: A falsifiable theory is one which is inconsistent with at least one finite conjunction of observation test reports. Popper's discussion of test reports, or 'basic' statements, as he called them in the Logic of Scientific Discovery, is traditional in many respects: they describe observable events occurring in an individual region of space and time (p. 103); they are inter-subjectively testable, i.e. they describe experimental arrangements in such a way that anyone who has learned the relevant technique can check on their validity (p. 99).
But Popper departs from the logical positivist or other standard empiricist accounts by not claiming that the 'basic' statements are infallible, nor are they picked out by any psychological criteria. The store of 'basic' statements and hence whether or not a theory is testable depends on the technology and state of scientific development available at the time. Before the invention of the mass spectrograph, "All atoms of an element have the same weight" would not have been considered testable because as yet there was no way to determine the weights of individual atoms. What counts as an observation sentence also changes with the development of instrumentation and with new theoretical developments. For modern scientists, "This sample is oxygen" and "This is an electron track" are considered to be observation statements. In an earlier era they would not have been. "This sample is a gas which supports combustion" and "This track in a cloud chamber curves towards the positive plate" might have been used instead, if the identity of the gas or of the particle was still in question. The truth of observation statements cannot be decided with certainty; even so, members of the scientific community can tentatively agree in their judgments about the truth of observation statements.
Although Popper originally proposed his falsifiability doctrine as a demarcation between science and pseudo-science, one could also view it as a regulative principle to guide the development of good scientific theories, not as a sharp criterion. We can increase the degree of falsifiability of a conjecture by increasing the domain of phenomena to which it applies, by making more precise the descriptive claims about the domain, and by inventing less and less controversial observational procedures for evaluating those claims. More important than the question of whether Freud's theory has any potential falsifiers whatsoever is the question of how we might increase its degree of falsifiability, either by making its claims more precise or by using detection methods such as plethysmography for detecting patterns of sexual arousal instead of relying solely on dreams or other traditional psychoanalytic techniques.
I have just recited the standard Popperian answer to the demarcation problem which is described in his intellectual autobiography (Unended Quest) and in Chapter 1 of Conjectures and Refutations as the problem which his falsificationist theory of science was intended to solve.
But let us now ask how this account might differ if we take seriously Popper's own claim that theories should be solutions to problems. On this perspective some of the criteria for appraisal would be different. For example, before checking on the falsifiability of a theory, shouldn't we first see if it is even a solution of the problem? Popper discusses the Maori conjecture that the earth is held up by a turtle and criticizes it, not because it is false or unfalsifiable, but because it immediately raises the same problem which it was supposed to solve, namely what holds up the earth (or turtle)?
This example strongly suggests that before (or in addition to) appraising a conjecture in terms of its falsifiability we should check on whether it solves "the" problem. This brings the historical context of the conjecture and perhaps even the intentions of its inventor into the evaluation of a hypothesis. It also suggests that a theorist such as Freud might not be castigated so severely for proposing unfalsifiable conjectures if they were at least solutions to his problem, particularly if no other more falsifiable solution was available. Perhaps we should instead fault his choice of problem, not his theory. We will need to return to this case when we present our account of problem evaluation.

c. The Choice of Scientific Tests
In his account of the empirical appraisal of scientific theories, Popper once again inverts the positivists' rhetoric. Rather than trying to collect data which will confirm our conjectures, we should instead conduct those tests which seem most likely to refute them.

Popper's central point is nicely illustrated by an anecdote recounted by Francis Bacon:
...it was a good answer that was made by one who, when they showed him hanging in a temple a picture of those who had paid their vows as having escaped shipwreck, and would have him say whether he did not now acknowledge the power of the gods--"Aye," asked he again, "but where are they painted that were drowned after their vows?" And such is the way of all superstition...(The New Organon, Bk. I, Aphorism XLVI.)

It is obvious that Bacon is criticizing the way data is being used to argue for the "power of the gods." But we need to spell out the objection in detail.
First of all, what exactly is the claim about the power of the gods which is under discussion? It would appear that the basic thesis which can be directly tested is the following: "If one makes a vow during a storm at sea, then one will survive." We can abbreviate the conjecture as: "If V, then S."* The proposed method for collecting data which will either support or refute the conjecture is as follows: Go to churches and record instances of people who paid their vows as thanks for having escaped drowning. Using our abbreviations, we can describe the instances so collected as cases of V and S.
At first glance, it may appear that these data do indeed tend to confirm the conjecture because they are positive instances of the generalization. But let us look more carefully. What kind of evidence would refute the conjecture? The answer is a case of someone who made a solemn vow, but drowned at sea nevertheless, i.e., a case of V and not-S. But given our method of collecting data, it is logically impossible that we would ever find such a refuting instance. By looking only at pictures of survivors (i.e., cases of S) we will never come across an instance of V and not-S, even if there be millions of such cases. One of the basic principles of scientific testing can be stated roughly as follows: The outcome of a certain test procedure cannot confirm a theory unless it is logically possible that there could have been another outcome which would have disconfirmed the theory.
In order to test "If V, then S", we should sample the domain of V and find out whether any of them drowned. As Bacon says, "Where are they painted that were drowned after their vows?" In addition, we should also look at examples of people who in fact drowned and find out if any of them had made vows. (This might be difficult to do in practice, but we could check their diaries, ask their mates, etc.) It is useless to look at cases already known to be S or not-V. Such "tests" are irrelevant to the conjecture under consideration because it is logically impossible that they could ever yield a refuting case.

*This is probably somewhat oversimplified. The proponents of the power-of-the-gods theory may have only wished to defend a weaker claim: "If one prays, one is less likely to be drowned." We will postpone the discussion of the testing of probabilistic generalizations until later.
We might label the procedure described by Bacon as "no-risk data collecting" because the way in which the data is collected makes it logically impossible for a refutation to appear. Once pointed out, the methodological error is blatant; nevertheless it can be seductive. For example, after teaching scientific method for a number of years, I once caught myself reasoning as follows: I observed that all of my close friends who blinked a lot and tipped their heads back when looking at me wore contact lenses. I then started investigating other people who behaved similarly and sure enough I nearly always found independent evidence that they were wearing contacts. Sometimes I asked them. Other times I would see a lens holder in their purse or bathroom, etc. I soon jumped to the following conclusion: "All people who wear contact lenses blink a lot and peer down their noses when they look at you."
This conclusion was obviously too strong, given that I had done only an informal study on a very small sample. But I did think that my experience justified a more modest statement: "All contact lens wearers whom I have met blink a lot, etc." What was not clear to me for quite some time is that none of my observations had served as a test for either conjecture. For I had always begun my observations with people who blinked! Given this choice of sample domain, I could have investigated all the blinkers and peerers in the world and never found a counter-example to my conjecture--not because there weren't any, but simply because it was logically impossible for my method of data collection to uncover them.
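The point about "no-risk data collecting" can be illustrated with a small simulation of Bacon's temple. This is only a sketch under invented assumptions: the population size, the probabilities, and the function names (`simulate_voyagers`, `refuting_cases`) are hypothetical, and survival is deliberately made independent of vows, so that the conjecture "If V, then S" is false in the simulated world.

```python
import random

def simulate_voyagers(n, p_vow=0.5, p_survive=0.7, seed=0):
    """Return n (vowed, survived) pairs; survival does not depend on vows."""
    rng = random.Random(seed)
    return [(rng.random() < p_vow, rng.random() < p_survive) for _ in range(n)]

def refuting_cases(sample):
    """Instances of V and not-S, the only kind that can refute 'If V, then S'."""
    return [(v, s) for v, s in sample if v and not s]

people = simulate_voyagers(10_000)

# The temple pictures: only people who vowed AND survived are ever recorded.
temple_sample = [(v, s) for v, s in people if v and s]

# Bacon's recommended samples: everyone who vowed, and everyone who drowned.
vow_sample = [(v, s) for v, s in people if v]
drowned_sample = [(v, s) for v, s in people if not s]

print(len(refuting_cases(temple_sample)))   # 0 by construction, whatever the facts
print(len(refuting_cases(vow_sample)))      # counter-examples can now appear
print(len(refuting_cases(drowned_sample)))  # the same counter-examples, found another way
```

The temple sample yields no refuting case no matter how many vow-makers drowned, whereas sampling the domain of V (or the domain of not-S) exposes them. The same structure applies to the contact lens example: a sample restricted to blinkers cannot contain a non-blinking lens wearer.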
Popper adds to Bacon's point by stressing that good scientific tests should be severe ones, that is they should be deliberately designed, using our general background knowledge to probe the conjecture at its weakest point, i.e., to find a refutation if one does in fact exist. For example, when Kohlberg put forward a theory about the development of moral reasoning in children, he was well advised to test it on children from Turkey and Taiwan. We might expect a theory developed on the basis of experience with kids in Boston to fail when applied to children from quite different cultures and religions. (As it turned out, the Kohlberg theory passed this severe test.) Similarly, theories about the universality of the Oedipal complex should be tested on aborigines, and theories about language learning on deaf and blind children. Theories about geological change and biological evolution should be tested, where possible, by data from other planets. Physicists know that theories often fail under conditions of high energy or high velocity; and often processes at the micro level violate generalizations which work well with medium-sized objects. For this reason physicists want to build ever bigger accelerators for smaller and smaller particles.
The general procedure for designing a severe test is as follows: The hypothesis under test always makes a series of claims. For example, the claim "All arsenic compounds are poisonous" says that both soluble and insoluble arsenic compounds are poisonous. It also says that both yellow and green non-poisonous substances are free of arsenic. (Don't forget the contrapositive!) According to our background information, some of these claims sound less plausible than others. For example, since we know that many poisons have to be digested in order to act, we may decide that insoluble arsenic compounds are less likely to be poisonous than soluble ones. A severe test is one which tests the least plausible claims of a theory. In our example, given our background theories about the relationship between solubility and poisonous character, we should start testing by looking at insoluble arsenic compounds. If the conjecture passes this severe test, we will then look at the class of soluble arsenic compounds. Other things being equal, severe tests, i.e., tests of the least plausible claims of a conjecture, are more stringent than less severe ones.
Note that our appraisal of the severity of tests depends on the background information available at the time. Consider the two claims: (a) "All yellow non-poisonous substances are free of arsenic" and (b) "All green non-poisonous substances are free of arsenic." Which domain should be investigated first if one wishes to perform a severe test of the original conjecture? Recall that a counter-example to the original conjecture would be a non-poisonous arsenic compound. So if we think green substances are more likely to contain arsenic than yellow ones, we should sample the domain of non-poisonous green substances. If we know nothing about the typical color of arsenic compounds, however, or if we have reason to believe that color is not correlated to chemical composition, we would judge the tests to be equally severe. (As a matter of fact, many arsenic materials are yellow or black, so there may be a slight preference for a test of yellow non-poisonous substances.)
Because they depend on vague and incomplete background knowledge, judgments about which tests are most likely to refute the conjecture are unusually fallible. For example, the Kohlberg theory of the development of moral reasoning worked surprisingly well when tested on boys raised in Muslim and Confucian cultures, but failed when tested on young American girls. (See Gilligan.) Kohlberg had thought his universal theory might well be sensitive to differences in the religious ethos, but that factor turned out to be much less important than gender differences.
A special case of severe testing is what Bacon called a "crucial experiment." Here one probes the vulnerability of a hypothesis by comparing its predictions with those of a plausible rival conjecture. If hypothesis A predicts P and rival hypothesis B predicts not-P, checking on whether P or not-P is the case will allow us immediately to eliminate one alternative. Contrary to what its name may imply, a crucial experiment does not prove the truth of the undefeated hypothesis because there may exist more alternatives which we have not yet thought of.
For example, according to the Copernican theory, Venus should wax and wane like the moon. The Ptolemaic system, on the other hand, predicted that Venus should not exhibit extremely different phases at different times. This conflict between the predictions of the rival cosmological systems was noted by Copernicus in 1543. However, it was not possible to conduct a crucial experiment without a telescope. In 1610, Galileo observed that Venus did have phases and so the Ptolemaic system was refuted. This crucial experiment in no way established the truth of the Copernican heliocentric theory for in 1588 Tycho Brahe had proposed a geocentric system which also gave the correct predictions concerning Venus. The next order of business was to design a crucial experiment between the Tychonic and Copernican system.
Crucial tests are only stringent when the rival hypothesis is a fairly plausible one (as judged against background knowledge). The more plausible the rival conjecture to the hypothesis in question, the more stringent is a crucial test between them. For example, no one would have thought it necessary to design a crucial test if the only rival were an ad hoc hypothesis to the effect that Venus shone by its own light but periodically varied its luminous area from crescent shaped to circular!
Checking on the truth of the least plausible consequences of a conjecture is the most efficient way of trying to falsify it, and hence Popper recommends tests with samples which are in a sense biased against the conjecture! How can this be reconciled with the standard statistical practices of using random samples or stratified samples? Or can it be? To develop a full-fledged critique of the Popperian approach to statistics is beyond the scope of this book, but I will make a few preliminary remarks. First of all, many statistical studies are not really tests at all, but simply demographic measurements. If Kinsey wishes to make descriptive claims about overall American sexual practices, clearly a non-biased sample is desirable. However, if one is testing the claim that the half-life of radium is always 1600 years or that the M/F ratio of neonates is always 0.51 (regardless of conditions), then it makes sense to focus our inquiry on samples of radium or births in extraordinary circumstances, namely those which on our background knowledge are most likely to violate the general claim.
In the case of evaluating causal claims by means of controlled tests, the Popperian approach once more exhorts us to put most effort into controlling for those factors which are most likely to be alternatives to the causes described by our hypothesis. Of course, since our background hunches about the weaknesses of our conjectures are always fallible, our assessments of the severity of a test are also fallible and this is a good reason for eventually performing a wide variety of tests whether they appear to be severe or not.
There have been a variety of reactions to Popper's account of severe testing. Bayesians have analyzed parallels between Popper's account and their own. Proponents of the semantic view of theories, on the other hand, sometimes imply we should invert Popper's methodology and gradually increase the domain of a theoretical model by first trying to apply it to the instances most similar to the paradigm cases around which the model was originally constructed.
What new perspectives on scientific testing are provided if we view theories as solutions to problems? Let's begin with a non-scientific example adapted from van Fraassen (whose views we will discuss later). Suppose we wish to test the claim C: Eve ate the apple from the tree of knowledge.
Now imagine two problem situations. In the first case, theologians are puzzling over the exact symbolism of the apple tree. Did it stand for eternal life or did it have something to do with the knowledge of good and evil? C proposes an answer.
In the second case, let us suppose that the controversy is over whether Eve also ate the apple or whether she merely tempted Adam to eat while remaining pure herself.
Now we can well imagine that the sorts of historical and textual testing of C which would be appropriate in the two problem situations would be quite different. The theologians would look primarily at evidence relating to the tree issue and might not even care whether it was Adam or Eve or both who ate the apple. In the second problem situation the relevance of the tests would be reversed.
I conclude that at least in some cases, knowing which problem the theory was supposed to solve would influence our choice of tests. Since scientific theories have lots of content (and hence lots of places to go wrong) and since most of our theories are probably literally false, it makes sense to focus our testing on the aspects which are most relevant to the problem we are trying to solve. Criticism of the non-relevant parts (such as "Eve didn't actually eat the apple -- she just bit into and chewed it up but didn't swallow it because just then God came and chased them out") may strike us as pedantic.
Knowing the problem-situation seems to help us choose relevant tests in the case of idiographic inquiry, where the conjectures are singular statements. But what about the case of law-like hypotheses? Do we really need to know what the question is in order to test the truth of the answer?
I grant that in the case of fundamental scientific theories the influence of problem on testing may be less, but I still think it may be as important as Popperian severity which is based on improbability. Here is an illustrative example -- consider the following conjecture:
C: The atomic weight of oxygen is sixteen.
Now the most severe test we can think of is to make measurements accurate to six figures. (It is highly improbable that this value is exactly right.) And if the issue is the existence of isotopes that would be quite appropriate. But what if the problem-situation is an earlier one in which the main dispute is whether oxygen gas is diatomic? Then accuracy to six significant figures is not relevant at all.
Perhaps this point is better expressed by saying that before testing one should clarify or amplify the conjecture. But then this process will also require us to go back to the problem for which it is intended to be a solution.

d. The Ambiguity of Falsification
We have raised questions about the choice of tests to be performed, but as described so far, the logic of testing is simple and clear-cut: (1) We derive a prediction from our conjecture which can be subjected to experimental check. (2) We do the experiment. (3) If the prediction is wrong, the theory is refuted. Period. Or so it would seem. In the typical scientific case, however, the situation is more complicated and the decision as to exactly which premise is to be given up is less straightforward.
Let us illustrate the dilemma with a famous scientific example, the case of stellar parallax. After Copernicus put forward his theory that the earth revolved around the sun, astronomers noted that if his theory were true, one should be able to detect stellar parallax. If one is moving with respect to an object, then the direction in which the object appears changes. This phenomenon is known as parallax. As a race driver moves past the pit stop, at first it is ahead of him/her. Later it is behind. The angle a in the diagram below is called the angle of parallax. A similar diagram could be used to illustrate Copernicus' theory of the earth's annual movement with respect to a particular star.
----------------------------
Insert Figure 3.3 about here
----------------------------

But when 17th-century observers looked for stellar parallax, they couldn't detect any. Didn't this mean the theory was false? The supporters of Copernicus' theory decided to blame an auxiliary assumption instead. Their argument can be illustrated with the race-car analogy. Suppose the driver sights on a distant radio tower instead of on the pit stop. Now the angle of parallax may become too small to be easily noticeable. As the ratio of D to R increases, a gets smaller. At very large values of D it will become too small to detect. According to estimates of the distance between the earth and the stars available at the time, stellar parallax should have been observable. But the Copernicans argued that these estimates were wrong and claimed that the universe was about 1,000 times bigger than had previously been imagined. This bold move turned out to be correct, but 200 years passed before stellar parallax was detected experimentally.
----------------------------
Insert Figure 3.4 about here
----------------------------
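The dependence of the parallax angle on distance can be sketched numerically. Since the figures are not reproduced here, the geometry is an assumption on my part: R is taken to be the sideways displacement of the observer, D the distance to the sighted object measured at right angles to that displacement, and a the angle whose tangent is R/D. The function name `parallax_angle_deg` is hypothetical.

```python
import math

def parallax_angle_deg(baseline_r, distance_d):
    """Angle a (in degrees) with tan(a) = R / D, under the assumed geometry."""
    return math.degrees(math.atan2(baseline_r, distance_d))

# Hold the observer's displacement R fixed and move the object farther away.
for ratio in (1, 10, 1_000, 1_000_000):
    angle = parallax_angle_deg(1.0, float(ratio))
    print(f"D/R = {ratio:>9}: a = {angle:.7f} degrees")
```

For large D/R the angle shrinks roughly in proportion to R/D, so making the universe about 1,000 times bigger reduces the predicted parallax by about the same factor, which is how the Copernicans could reconcile their theory with the null observations.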

The logic of the testing situation was as follows:
Copernican theory: The earth revolves around the sun, which is stationary relative to the stars.
Auxiliary hypothesis: The distance between the earth and the stars is about 20,000 earth radii.
Experimental Prediction: (Therefore) Stellar parallax should be easily observable with the apparatus available.
Experimental Finding: No stellar parallax is observable with the available apparatus.

Since the prediction failed, one of the premises had to be wrong. Copernicus blamed the auxiliary hypothesis; anti-Copernicans defended it and blamed the theory instead. With no good way at the time to test the auxiliary hypothesis, the status of the Copernican theory was left open.
The philosopher who first stressed that almost all tests involve a lot of auxiliary assumptions was Pierre Duhem, an early 20th-century philosopher, physicist, and historian of science. Hence, we will call the following the Duhemian problem:
When an experimental prediction turns out to be false, should the scientist blame the theory under test or the auxiliary assumptions (or both)?
Popper emphasizes that there is no methodological recipe for dealing with the Duhemian problem, but a few guidelines can be laid down. First of all, one should not use the Duhemian problem as a general excuse for one's pet theory. It is not good methodology to say, "My theory's prediction failed? Well, not to worry. I probably made a false auxiliary assumption somewhere along the line." If one wants to keep the theory despite the prediction failure, one must point to a specific auxiliary assumption and then design tests of that auxiliary assumption. If the auxiliary assumption passes the tests, then we should conclude that our theory and not the auxiliary was false. Sometimes, however, it is neither possible nor practical to test auxiliary hypotheses. (We saw an example of this in the Copernican case.) In such instances, we can draw no firm conclusions about the original test situation. If a theory in conjunction with a variety of auxiliary assumptions makes a lot of false experimental predictions, though, we tend to decide that the theory is false, even though we can't conclusively test each auxiliary.
The Duhemian dilemma can be analyzed as follows:
The theory under test (T) when conjoined with one or more auxiliary hypotheses (A) makes a prediction (p). Experiments show that p is not the case. By modus tollens we know that either T or A (or both) must be false, but logic doesn't tell us which.
(T & A) → p
~p
(Therefore) ~T, or ~A, or ~T & ~A
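The claim that logic alone does not tell us which premise to give up can be checked mechanically by enumerating truth values. The following is a minimal sketch; the helper `implies` and the variable names are my own, and T, A and p are treated as simple propositions, which is of course an idealization of real theories.

```python
from itertools import product

def implies(x, y):
    """Material conditional: 'x implies y' is false only when x is true and y false."""
    return (not x) or y

# Keep every assignment of truth values consistent with both premises:
# (T & A) implies p, and not-p.
models = [
    (T, A, p)
    for T, A, p in product([True, False], repeat=3)
    if implies(T and A, p) and not p
]

# A conclusion follows logically only if it holds in every surviving model.
not_both = all(not (T and A) for T, A, p in models)
theory_false = all(not T for T, A, p in models)
auxiliary_false = all(not A for T, A, p in models)

print(models)           # [(True, False, False), (False, True, False), (False, False, False)]
print(not_both)         # True:  T and A cannot both be retained
print(theory_false)     # False: the premises do not force us to reject T
print(auxiliary_false)  # False: nor do they force us to reject A
```

Three possibilities survive the refutation, one for each disjunct of the conclusion, and nothing in the premises favors one over the others.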

Note that in the pure Duhemian problem situation there is no controversy about the experimental result, ~p. Furthermore, all parties agree that T & A imply p. The disagreement arises about whether to revise A or to revise T. Of course, there are also cases in which people cannot agree on experimental results or on what exactly the implications of the theory are. These latter disagreements can usually be settled either through further experimentation or by means of logical analysis. The Duhemian problem is often more recalcitrant. Popper does give one firm piece of methodological advice. No matter which premise we decide to replace, the substitute should never be lower in empirical content.
The main responses to Popper's remarks on the Duhemian dilemma, such as those of Kuhn and Lakatos, point out that in the history of science, it is fairly rare to find a case where a theory is refuted by a single, decisive experiment. More often theories come to be rejected through a variety of prediction failures. Theories are rarely struck down by a blow from one type of crucial experiment, no matter how many times that experiment is repeated. Rather they are eroded away by an accumulation of anomalous results. We will develop this important critique in the next chapter. Here I will only remark that if we view theories as problem solutions, then as we modify our system in response to the Duhemian dilemma we should either ensure that the new system also answers the original problem(s) or else explicitly acknowledge that we are abandoning them.
e. The Status of Corroborated Theories
We have discussed what happens when our theory's prediction is refuted--either we revise it or adjust an auxiliary hypothesis. What happens if our theory passes the most severe experimental tests we can devise with flying colors? Can we then declare it proven true, or at least highly probable? It is perhaps on this issue that Popper's disagreement with the positivists is deepest.
First of all, the history of science strongly suggests that we should never feel completely certain about any scientific generalization, no matter how frequently or stringently it has been tested. Newton's theory of classical mechanics had perhaps the best track record ever; yet it was superseded by Einstein's relativistic mechanics. Here are a few other examples of well-established claims which eventually had to be corrected or rejected:
(i) Matter cannot be created or destroyed. (Not true in nuclear fission or fusion processes.)
(ii) The sun rises once every twenty-four hours. (Not true at the North Pole.)
(iii) All molecules of water are made of the same stuff. (Not true for heavy water, deuterium oxide.)
(iv) The major difference between homo sapiens and the lower animals is that man can use language. (Not true for chimpanzees which can use sign language.)
(v) Living matter can only come from living matter; it cannot be formed from inanimate substances. (Not true--amino acids can be synthesized from ammonia, methane, hydrogen, etc.)
So the history of science warns us that any scientific claim is fallible. Logic and philosophy of science can help us understand why this is so. Here are some of the reasons:
(i) Generalizations cover a potential infinity of cases. But we can only check on a finite number of predictions. We can never be sure that the next case won't violate the rule (e.g., a black swan may turn up in Australia).
(ii) Scientific theories make infinitely precise claims. But we can only make measurements of finite accuracy. (For example, Newton's law of gravitation says the force of gravity varies inversely with the square of the distance, i.e., as 1/r^2 with an exponent of exactly 2.00000..., but our measurements cannot discriminate between r^2 and r^2.0000000001.)
(iii) Many of our scientific laws only hold under idealized conditions--to give two very simple examples, the law of the lever assumes no friction at the fulcrum, and the law of the pendulum assumes there is no air resistance. Of course, we can try to minimize such interferences when we conduct tests, e.g., by resting our lever on a point or setting up a pendulum in a vacuum, but our experiments never achieve the perfect conditions which are assumed in our ideal laws.
(iv) There may be alternative theories which we have not even dreamt of yet which account for all of the data we have in hand.
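Point (ii) can be made concrete with a little arithmetic. The following is a minimal sketch, assuming an arbitrary distance of 10 units and a hypothetical instrument with a relative precision of one part in a million; neither figure comes from the text, and the function name is invented for illustration.

```python
def inverse_power_force(r: float, exponent: float) -> float:
    """Force (in arbitrary units) falling off as 1 / r**exponent."""
    return r ** (-exponent)

r = 10.0  # assumed distance, arbitrary units
f_newton = inverse_power_force(r, 2.0)          # exponent exactly 2
f_rival = inverse_power_force(r, 2.0000000001)  # rival exponent from the text

# Relative difference between the two predicted forces.
relative_gap = abs(f_newton - f_rival) / f_newton

# Assumed (hypothetical) relative precision of our best measurement.
measurement_precision = 1e-6

# True when the two laws cannot be told apart by this instrument.
indistinguishable = relative_gap < measurement_precision
```

Under these assumptions the two predictions differ by a few parts in ten billion, far below what the assumed instrument could register, so no finite run of such measurements would favor the exact exponent over its rival.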
For all these reasons, theories are underdetermined by our observational results and can never be proved through any amount of observation and experiments. There are no rules for deciding when to accept a theory (for the time being) and move on to new problems, but what we can do is to answer each of the above sources of fallibility as best we can.
(i) By testing in widely scattered domains, we guard ourselves against parochialism, e.g., the black swans in Australia.
(ii) By making our tests as precise and ideal as possible, we can approach the infinite precision and perfection of our theories.
(iii) And the best way to rule out alternative explanations is to deliberately try to imagine radically different ways of explaining our results. If we can devise a new alternative, we can then set up a crucial experiment between the two competing accounts.
But what is the exact epistemological status of theories which have survived critical scrutiny? What positive claims can we make about them? Popper introduced the term corroboration to describe the severity of the tests passed by a hypothesis, but he emphatically denies that the degree of corroboration is to be interpreted as a degree of reasonable belief in the hypothesis or the probability that it is true. However, he does say that for purposes of practical action, it is rational to base our behavior on our most highly corroborated theories. And for purposes of scientific inquiry we should use the degree of corroboration of various claims as guides to criticism and revision of our scientific systems. The Duhemian problem would become completely intractable if we had no way of at least tentatively assigning the blame for prediction failures. And the whole mechanism of falsification rests on the existence of 'basic' statements, i.e., statements which all observers can test and presumably corroborate for themselves.
Popper's theory of corroboration and his views of induction are perhaps the most controversial aspects of his philosophy and I will not comment on that far-ranging debate. I will only remark that to the extent that tests are chosen because of their relevance to the problem-situation, our estimates of corroboration or Bayesian confirmation or what have you will also be dependent on problems.
3.? Final Comments
Popper's characterization of the objective aspects of problems is a good starting point, but it needs to be accompanied by a fuller account of the factors, be they objective or subjective, which influence problem choice. If scientists tried to work on all the problems which exist in a World-3 sense, or chose their problems randomly, science as we know it would not exist.
Popper's methodology stresses problems as the starting point of inquiry but makes problems less central in the later stages of theory evaluation. A more thorough-going problems approach would lead us to modify Popper's account of preliminary theory appraisal and the prioritizing of scientific tests. It is less obvious how, if at all, viewing theories as solutions to problems should affect our philosophical accounts of theory corroboration or confirmation.
no scientific knowledge. (Cf. Cartwright, How the Laws of Physics Lie) Popper goes on to argue that many of the most basic bits of common-sense knowledge are also false. The sun does not rise every twenty-four hours, at least not in the Land of the Midnight Sun, and bread can be poisonous if it is made from ergotic wheat.
But if we admit the existence of false knowledge claims shouldn't we at least require that in order to count as knowledge in a particular historical context a statement must have been well-confirmed by the evidence available at that time? Shouldn't we insist that one should be justified in believing knowledge propositions, even if later evidence may cause us to reverse our appraisal of their truth value?
Some of the details of Popper's criticisms of various justificationist epistemologies will emerge below. Suffice it to say here that with respect to the logical positivists Popper argues that there is no infallible empirical base - the most trivial sounding observation report, such as "Here is a glass of water", contains so many untested implications, such as, "If you were to drop it, the water would spill while the glass would break" or "If you were to cool it, the water would turn into ice," that it is impossible to verify them all. Furthermore, even if we were to take some set of observation reports as indubitable evidence, Popper follows Hume in arguing that they can never provide the kind of logical justificatory support assumed by inductivist philosophers.
But if we give up these traditional epistemological requirements, how are we to distinguish those propositions which are part of knowledge from random sentences generated by the apocryphal monkey at the typewriter, or, more realistically, from the outputs of programs such as Racter? Is Stove (Popper and After) right in claiming that Popper has so changed the meaning of knowledge, by completely denying its sense of cognitive success and achievement, that it is misleading for him to continue to use the term? What, according to Popper, are the delimiters of knowledge?
I know of no place where Popper gives a short, italicized definition of knowledge, but we can construct one from his writings. First of all, if we are prepared to identify knowledge with our best science, he does give an explicit minimal characterization of a scientific claim - it is one which can be subjected to empirical test - and our best scientific claims are falsifiable propositions which are not false as far as we know but which have successfully passed severe empirical testing. Such a tri-partite definition would replace each component of the old justified-true-belief account, yielding corroborated-falsifiable-claim.
However, Popper often uses knowledge in a looser sense when he speaks of 'background knowledge' and its role in setting problems or appraising the severity of a test. These propositions are assumed to be unproblematic in a particular problem-situation (C. & R, p. 238) but there is no suggestion that they each have been carefully tested. In fact to demand systematic testing would lead to a falsificationist regress almost as vicious as the inductivist one. Neither must critical scrutiny be limited to empirical testing. Unlike the positivists, Popper would never consider excluding mathematics or philosophy from knowledge.
The proper conclusion to draw, I think, is that Popper does not try to delimit knowledge propositions because on his account it makes no sense to do so. No propositions receive permanent gold stars on Popper's account. All claims are conjectural although some have been more carefully scrutinized than others. In some contexts a proposition will be temporarily accepted; in others it may be challenged. What we can do at any point is to open the case on any claim and review its test record, its logical compatibility with other statements, its explanatory power, what problems it helps solve, what problems it generates, etc. But it would be fruitless to demand that we always start by reviewing the credentials of every proposition in sight.

3.2 The Objectivity of Scientific Problems
If problems arise within a knowledge context and if knowledge is the content of some set of propositions (not human attitudes towards them), then it would seem fairly straight-forward to define problems as some sort of logical inadequacy within the propositional set, such as inconsistency or incompleteness.
Thus although Popper himself often uses vague, quasi-psychologistic talk of "violated expectations" (reminiscent of Dewey), one could replace it by analyses in terms of logical inconsistencies between theoretical systems and singular observation statements. Of course, logic alone doesn't tell us which statements count as observations nor which theories are "accepted" (no problem arises if a theory already considered to be false leads to a prediction failure), but presumably Popper hopes to give non-psychologistic accounts of these additional factors.
By giving an objective account of problems, one can explain why there is often considerable inter-subjective agreement among scientists on what is problematic about a particular theory. (There is something really "out there" which makes them feel puzzled.) And one can clarify Dewey's point about neurotic feelings of doubt not being adequate for inquiry - the doubt must arise from a situation which is objectively open or indeterminate.
For Popper, although theories are generated by humans, theories can in turn autonomously generate problems of which no human being is aware. For example, when people first invented the integers (perhaps using a Brouwer-like construction), the problem of whether there is a largest prime also came into existence (although people only worried about it much later). Whether a problem is solved or not is also determined by looking at the objective state of the arguments which could be constructed for and against a proposed solution. Whether people actually accept those arguments or cease feeling puzzled is not relevant.
To dramatize the difference between the objective status of a problem and our feelings about it, Popper places them in different worlds. Roughly speaking, World-1 contains the objects traditionally studied by natural science (e.g., electrons and chairs). World-2 contains psychological states (e.g., feelings of doubt, beliefs). And World-3 contains "the objective contents of thought" (e.g., theories, arguments, problems). Although minds in W-2 (contained in W-1 bodies) produce everything in W-3, no mind working either individually or collectively can be subjectively aware of everything in W-3. (We can't think about each integer nor actually derive every Euclidean Theorem.) Popper claims the growth of knowledge is best understood by concentrating on the logical relations between W-3 objects, not on the psychological states of the people who produce and manipulate them. But although Popper's notion of objective problem is very useful, I think we cannot ignore the psychological and sociological aspects of scientific problems. I will develop this point by listing various points at which we must refer to these other dimensions of the problem-situation in order to understand the growth of knowledge.
Let us suppose that humans have generated propositions T and O - both now reside in W-3. Let us further suppose that T entails ~O, but the deduction is abstruse and so no one has noticed. Objectively, there is now a problem in W-3, but no inquiry will result until someone comes to believe that T entails ~O.
At this point, Popper might wish to add the explicit proposition, T -> ~O, to W-3 and draw a distinction between potential logical consequences of our theories and those which have actually been drawn. This seems fair enough. The derivation of ~O from T constitutes a powerful argument against T and creating new arguments surely is a contribution to knowledge. (Recall Galileo's wonderful criticism of the Aristotelian law of falling bodies - if two cannonballs, while falling, should somehow be tied together to form a mass twice as heavy, according to Aristotle's theory, they should speed up, but this is absurd.) However, this does mean that only those features of W-3 which people notice can contribute to the growth of knowledge.
Furthermore, if people believe T and O are inconsistent (even though they aren't), they will try to solve a problem although it would seem that objectively no problem exists! For example, an early criticism of Darwinian theory (D) was that it could not explain the evolution of altruistic behavior (A). We now know that the argument was incorrect - sociobiologists describe a number of mechanisms, such as kin selection, which resolve the puzzle.
Here is a case where Darwinian theory and Altruism were in W-3 and people also believed D -> ~A so that was also in W-3. However, ~(D -> ~A) follows from D so presumably that was also at least potentially in W-3. Yet people's inquiry was influenced by what was in some sense a "pseudo-problem" - or at least a problem based on a mistake. Nevertheless, the inquiry resulted in new knowledge, the discovery of kin selection!
I conclude that logical inconsistencies in W-3 are of no importance in the growth of knowledge until people notice them. And even "neurotic" problems (puzzlement based on inconsistencies which aren't really there!) can be important for the growth of knowledge. So it seems that the subjective and social awareness of problems is important. However, at any given time, we are probably aware of an enormous number of problems. Not only are there the problems arising from known inconsistencies (Lakatos claimed that every theory lives in a "sea" of anomalies), there are also the explanatory problems which arise from gaps in our knowledge. Which problem will actually provoke inquiry? When it comes to the issue of problem selection, Popper admits that subjective experience plays a part in which problem we emphasize, or select as important (OK, pp. 166-67). Later in this essay we will explore the possibility of finding objective ways of evaluating problems.




3.3 Popper's Strong Analogy Between Evolutionary Biology and Epistemology


The account of Popper's theory of the objectivity of knowledge and problems given above is extracted from the essays collected together in Objective Knowledge but it would be misleading as it stands because Popper develops this view in a context which stresses the similarities between the growth of knowledge in the scientific community, animal learning, and the evolution of biological species. There is at least a metaphorical sense in which each of these three processes involves problem-solving, trial solutions, and error elimination, but Popper proposes that we take the analogy very seriously indeed. Thus his examples of the problem of violated expectations include the case of a newborn foal who sucks on the hair under the mare's front legs and is disappointed until it finds its way to the back, as well as the case of Newtonian astronomers who did not expect the results of the Eddington eclipse expedition which detected light bending in a gravitational field. Popper is not suggesting that foals have propositional attitudes. Rather, the foal has an inborn "theory" which has "run into difficulties". He also draws parallels between the problem-solving activities of Einstein and an amoeba -- he cites the example of a hungry amoeba who learns to swim towards a light in order to get food. And he includes birds' nests among the problem solutions which reside in World-3!
Of course, there are other places where Popper stresses the unique features of science (more of this later) but by separating knowledge from human consciousness it is very easy for Popper to posit that knowledge is encoded in genotypes and guppies as well as in geniuses!
Let us now look in more detail at the parallels Popper draws between biological evolution, animal learning, and scientific inquiry, three processes which instantiate his general schema:
P --> TS --> EE --> P'
P stands for problem, which is to be understood in an objective sense and does not imply that the entity which "has" the problem is conscious of it. In the biological domain, species face problems connected with survival and reproductive success, such as the problems of escaping predators, raising young, finding food, mates, etc.
TS stands for tentative solution, such as the information encoded within a new genotype within the species' pool. EE, or error elimination, occurs if the phenotype bearing the new genotype dies without reproducing (or reproduces at a lower rate than its conspecifics).
The outcome of reiterations of the TS and EE steps is a species better adapted to its environment, but new problems (P') will typically lead to a repetition of the whole selection process.
When the schema is applied to animal learning it looks fairly similar to Skinnerian operant conditioning. [See Skinner's "Selection by Consequences"] In response to the "problem" posed by hunger pangs, or whatever, the animal engages in exploratory behavior (thus proposing a tentative "solution"). Unsuccessful solutions lead to no food, or even pain, and are extinguished. When a new behavior is successful, however, it becomes part of the individual animal's patterned response to that type of problem-situation. New problems then lead to more learning.
The application of the schema to scientific inquiry is quite straight-forward. Scientists propose falsifiable conjectures in response to problems arising within their knowledge situation, which are then subjected to empirical test. False hypotheses are thereby eliminated and the scientist is then free to confront new cognitive problems.
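The scientific instantiation of the schema can be mimicked in a few lines of code. What follows is only a toy sketch under stated assumptions: the "problem" (which integers have some observed property), the observations, the three conjectures, and the helper name eliminate_errors are all invented for illustration and are not drawn from Popper.

```python
from typing import Callable, List, Tuple

Observation = Tuple[int, bool]               # (case tested, observed outcome)
Conjecture = Tuple[str, Callable[[int], bool]]

def eliminate_errors(conjectures: List[Conjecture],
                     observations: List[Observation]) -> List[Conjecture]:
    """EE step: keep only the conjectures that no observation refutes."""
    return [(name, rule) for name, rule in conjectures
            if all(rule(case) == outcome for case, outcome in observations)]

# P: which integers have the (hypothetical) property recorded here?
observations = [(2, True), (3, True), (4, False), (9, False)]

# TS: bold tentative solutions, proposed without justification.
conjectures = [
    ("all numbers", lambda n: True),
    ("odd numbers", lambda n: n % 2 == 1),
    ("primes", lambda n: n > 1 and all(n % d for d in range(2, n))),
]

# EE: test each conjecture against the available observations.
survivors = eliminate_errors(conjectures, observations)
surviving_names = [name for name, _ in survivors]

# P': a further observation refutes the survivor and poses a new problem.
later_survivors = eliminate_errors(survivors, observations + [(1, True)])
```

In this toy run only the "primes" conjecture survives the first round of tests, and the later observation eliminates it as well, leaving an empty list of survivors and hence a new problem P'. Note that the surviving conjecture was never shown to be true; it was merely not yet refuted.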
------------------------
Insert Fig. 3.1 about here
------------------------

These three instantiations of the schema are summarized in Figure 3.1. Let us now comment on some of the important dissimilarities between scientific inquiry and these other selection processes. One crucial difference, as Popper notes, is that in science "our mistaken theories die in our stead". If a species fails to solve a survival problem, it goes extinct. If a rat fails to find a path through a maze, it goes hungry. In both cases, the consequences of error (or success for that matter) have a direct physical effect on organisms. The scientist, on the other hand, may experience elation or disappointment as a result of empirical testing but these psychological reactions are not simply coupled within the selection process. For example, I may derive satisfaction from designing a clever test which refutes a hypothesis even when the hypothesis was of my own creation. And to the extent to which science is a "friendly hostile" competition between ideas (to use Popper's phrasing), there could even be a division of labor between creation and criticism such that every prediction failure is a personal triumph!
Because the success or failure of a scientific hypothesis can be decoupled from pleasure and pain, the scientist is free both to propose bold conjectural solutions and to test them severely. And the fact that as scientists we are free to choose problems - they are not forced on us by the environment - also allows us to operate less cautiously. Although the evaluation of scientific theories depends crucially on feedback from the environment, human scientists experience relatively little feedback from prediction successes or failures. Contrast the situation of the technologist, e.g., a potter who is trying to solve the problem of how to prevent pots from exploding in the kiln. Here the problem is set by practical considerations. Tentative solutions should be economically viable and may be of such limited scope as to apply only to the local clay and kilns. And it would be absurd to push any solution which appears to work to extremes. Prediction failures cost time and money and so the potter will theorize conservatively. Thus, although the potter, unlike the animal, can articulate the hypotheses under test and use information in books to criticize them, the potter's situation is more like the animal's than the scientist's because she or he is directly rewarded or punished according to the success of the tentative solution.
The scientist's relative freedom from personal repercussions sounds wonderful and liberating, but it can also pose the following problem. Note that on the biological or animal level it is not possible for the organism to "ignore" refutations because it is causally connected to the environment. A dogmatic potter may engage in a process of psychological denial of the pot shards from exploding pots but will soon go out of business. But an individual scientist may evade the elimination of erroneous theories by using ad hoc modifications or conventionalist stratagems with impunity. To do so is like cheating at Solitaire - it may not be as much fun, but nothing keeps you from doing it - except your internalized standards of fair play. An interesting question, then, is how scientific institutions and traditions can best reward (or punish!) scientists' activities as they engage in scientific inquiry. (For example, how do we discourage people from publishing non-reproducible experimental results while encouraging them to produce interesting detailed conjectures which may well be falsified?)
Many evolutionary epistemologists have been captivated by the formal resemblances between the modification of species by natural selection, the modification of behavior through differential reinforcement, and the modification of scientific systems through hypothesis testing. For Popper the parallels are especially easy to draw because he down-plays the importance of conscious beliefs in science. However, analogies can lead us astray as well as illuminate and although there may be a definite sense in which genotypes have propositional content, I think it can hardly be helpful to say birds' nests do -- unless we are also to say that sand dunes are the winds' solution to the problem of where to deposit suspended particles of soil and diamonds are the solution of carbon's problem of how best to solidify under extreme pressure!
As we now look more closely at Popper's account of scientific problem solving we will note other mischievous features of his taking the analogy too seriously.
3.4 Popper's Theory of Science as Problem-Solving
Philosophers of science today might admit that to be complete any account of scientific inquiry should say something about scientific problems but nevertheless resist the idea of putting problems at the very center of the enterprise. Let us now look in detail at Popper's methodology and see what he says about problems at each juncture. We will use as an outline the flowchart in Figure 3.2.
----------------------------
Insert Figure 3.2 about here
----------------------------

Throughout the discussion I will sometimes supplement Popper's examples with my own, but they are intended to be ones consonant with his scheme.
a. Typical Scientific Problems
As we have seen, according to Popper, no inquiry begins in a vacuum. Regardless of what the topic may be, the scientist, like all of us, begins with a motley collection of ideas, some clear, some confused, some true, some false. Puzzlement arises when there are inconsistencies or gaps within existing bodies of knowledge. But how are scientific problems different from those of ordinary life? Or are they different? Let us begin by surveying the typical kinds of scientific problems which Popper discusses and then we will comment on their special characteristics.

(i) Problems arising from violated expectations. A common sort of scientific problem arises when something surprising or unexpected occurs and we wonder how or why it happened. An important problem for early astronomers was the following: In general, celestial bodies, such as the sun, moon and stars, move across the sky in smooth arcs. However, it was discovered that the planets wander around the sky irregularly. Can one describe precisely how the planets move and explain why they move differently from the other heavenly bodies? Plato called this the problem of the planets. Ptolemy, Copernicus, and Kepler each offered a different solution to it.
Here is another example of a scientific problem caused by violated expectations: In 1896 Becquerel found that a batch of photographic plates which had been carefully stored in black paper were fogged. According to the best scientific knowledge available at the time, only visible light or x-rays could expose photographic plates. What could have happened? Becquerel finally began to suspect that the fogging was caused by an unusual rock he had used as a paper weight. And it was thus that he discovered radioactivity. Later Madame Curie showed that the rock contained radium.

(ii) Problems arising from a quest for deep explanations. Even if the scientist is lucky enough to discover a generalization which seems to have no exceptions, he or she is still faced with a problem: What causes the regularity? Why do things happen just that way? For example, early astronomers asked why the sun rose every day in the east. Some said it was because the sun moved in a circle around the earth. Later this geocentric theory was replaced with a heliocentric theory. In either case, a further question arose: What caused the sun (or earth) to move? According to Aristotle, there was a Prime Mover. Later people suggested a law of circular inertia, saying a wheel would move forever if there were no friction. Newton explained the regular motion in terms of linear inertia and the force of gravity.
There are many other cases in which the problem is to explain a regularity. Bohr wondered why the wavelengths of the spectral lines of hydrogen should fit the simple mathematical formula discovered by Balmer. Mendeleev and other chemists of the late 19th century wondered why the elements should arrange themselves so nicely into a Periodic Table. By the end of the 18th century, after the work of Boyle and Charles, everyone knew that gases expanded on heating. But why? Caloric theorists said that heat was a fluid which flowed into gases and as a result they took up more room. Kinetic theorists said heat was kinetic energy and hot gases expanded because their molecules moved faster. Both sides agreed on the regularity to be explained, but they offered competing explanations of it.
(iii) Problems arising from a quest for unity. As a science develops, a new sort of problem often arises: Can one find a unified theory which covers two or more domains which have previously been treated separately? For example, for a long time organic chemistry (which deals primarily with covalent compounds) and inorganic chemistry (which is mainly concerned with ionic compounds) were considered to be quite distinct fields. At this time people believed that naturally occurring organic compounds, such as urea, could not be synthesized in the laboratory because they contained a vital life force. However, today's theories of chemical bonding apply equally well to inorganic and organic materials.
Before Galileo, it was held that terrestrial bodies and celestial bodies obeyed different laws. Galileo (and later Newton) gave a unified account of the motions of all bodies. A pressing problem in physics today is the search for a unified field theory--a theory which would successfully combine relativity theory and quantum mechanics. Psychologists are looking for a unified theory of learning. Behaviorists can account for some kinds of learning; cognitive psychology provides explanations for other types of learning. But one would like to find a single theory which covers all instances of learning.

(iv) Problems of conflict between theories. Often, problems of finding a unifying explanation are exacerbated because of inconsistencies between the component theories. And contradictions can also arise between theories which appear to cover quite different domains. For example, the biggest objection to Copernicus' astronomical theory was its conflict with Aristotelian physics, according to which nothing could continue to move without a mover. And a strong contemporary objection to Darwin's theory of biological evolution was Kelvin's geophysical calculation of the age of the earth. (It turned out later that Kelvin's thermal estimates were wrong because they did not include the heat generated by radioactive decay.)
Each of the four types of scientific problems discussed above arises out of a rich background of information and expectations. New scientific theories are invented when scientists are faced with a problem: Why did my old theory or set of unconscious expectations fail? What causes this regularity which I have observed? Can I unify these two branches of science? Or resolve the inconsistencies between them?
None of these problem types are unique to science. Myth-makers are also looking for deep explanations and try to give unified pictures of the world we live in. Everyday life produces many calls for explanations, often of singular events. And many of our practical problems of existence arise because the common-sense generalizations we make about the world, including other people, are violated.
But although there is no sharp demarcation of scientific problems, there are some obvious differences in degree. In a well developed scientific field, problems arise within a body of knowledge which is generally more extensive, more detailed, and better systematized than that of other domains. (This is not always the case - both folk mythologies and craft technical lore may be of comparable sophistication.) Furthermore, the scientific tradition for the most part actively rewards people who expose contradictions or gaps within the body of science. Folklore and religious systems, by contrast, are often embedded within conservative institutions which discourage criticism or revision of the traditional beliefs. To summarize, to the extent to which scientific knowledge is well-articulated it is relatively easy to discover flaws in it, and scientific traditions encourage us to take these problems seriously.

b. Scientific Problem Solutions
We have described various sorts of problems which trigger scientific inquiry. Our next task is to characterize the sorts of problem solutions which count as scientific. This is the core of the demarcation problem with which Popper began.
However, let me digress a moment to point out that we have skipped over the process by which these tentative solutions are dreamt up in the first place and the problem of whether there is a logic of discovery. Early philosophers were optimistic about the prospects of describing a method for discovering true theories. Bacon and other inductivists thought that through careful observation and systematic use of his tables one could easily arrive at the solution to scientific problems. Descartes and other rationalists thought that a systematic analysis of our clear and distinct ideas would provide the answers.
Popper argues that there is no recipe for discovery, but from this he concludes that all the scientist can do is guess at the answer. Some conjectures will be "happy guesses" as Whewell described them; others will turn out to be dead wrong. It's all a matter of trial and error. In biological evolution, mutations occur by chance--we can't predict what new variations will occur. But natural selection will filter out those which are not adapted to the environment. Likewise for science. People make up all sorts of crazy hypotheses. But tests will weed out those which do not match reality. Quality control is ensured by careful testing procedures, not by censorship of new ideas. The pattern of reasoning which leads to a new hypothesis is not important--it may be based on dreams, mystical experiences, weak analogies or what have you. According to Popper, the origins of the idea are irrelevant; what is crucial is how well the scientist's hunch stands up to testing.
Today both cognitive scientists and philosophers of science are optimistic about being able to describe the structure of the process Popper calls "trial and error". Here is a place where he is ill-served by the analogy to biology although ironically biologists have now given a reduced role to blind mutations.
Sociologists of course would argue that the origins of ideas are relevant -- a hypothesis which originates in Utah will have less initial plausibility than one which comes from MIT. And cognitive scientists, as well as philosophers such as Campbell and Hesse, dispute the claim that analogies only play a role in discovery and are then discarded.
And it is interesting to recall that Popper himself claims that one can't understand theories without knowing about the problems which they solve. Might this be construed as meaning that the problem-situation out of which the theory arose is relevant to its evaluation? But let us return to Popper's order of exposition.
As our account so far makes clear, the solutions to problems which scientists propose start out being mere hypotheses or conjectures. When they are first proposed, we have no particular reason to believe them true. Furthermore, these hypotheses tend to be rather bold and far-reaching. This is because the typical scientific problems we listed above all require as solutions theories of high content. Consider Problem Type 1: To explain why our expectations are violated, we need a theory which accounts both for the exceptions and the normal states of affairs we had expected. For example, a good answer to the problem of the planets' irregular motions would also explain the sun's regular motion.
To turn to Problem Type 2: Trying to give a deep explanation of a regularity (such as the Balmer formula for hydrogen spectral lines) generally results in a conjecture which has many other consequences as well (such as a formula for the spectral lines of sodium). As for Problem Type 3, it is clear that a unified theory will have more content than either of the separate fields. And generally such a theory will have lots of new consequences as well. (For example, the unified theory of chemical bonding covered not only traditional organic and inorganic compounds, but a whole new domain of organic-metallic compounds, such as hemoglobin.)
Although they are bold conjectures, Popper argues that conjectures do have one very important property in their favor: they can be tested by means of experiments. If one of our conjectures is false, it is realistic to hope that we will eventually discover its erroneous nature.
Let us now discuss the precise requirements that a theory must satisfy in order to be falsifiable.

(i) The Logical Requirement. Statements of the form "Some A's are B's" cannot be refuted by any report involving a finite number of instances, but universal generalizations, be they affirmative or negative, can be.
A necessary condition for a theory to be falsifiable is that it be logically possible to contradict it by a finite conjunction of sentences which describe particular instances.
Popper used the logical requirement to argue for the unfalsifiable status of many Marxist doctrines. Statements about the "inevitability" of the downfall of capitalism fail the logical requirement if no time limit is given. "Light has a maximum velocity" also fails unless a value is specified.
Many claims which at first appear to be universal generalizations also fail. For example, "Every metal has a melting point" or "every action is rational" may be better analyzed as what Watkins called "all-some" statements, i.e. as saying that for every metal there is some temperature above which it will melt, and for every action, there is some description of the agent's problem situation such that the action was appropriate to it.
On the other hand, the claim "some copper is brittle" looks like it is not open to refutation by a finite observation report; however, if it is accompanied by a recipe, "To make copper brittle, place a thin sheet of it for three days in a nuclear reactor where the neutron flux is..." it becomes testable.
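The asymmetry behind the logical requirement can be put in a small sketch. The record format (a dictionary with keys "A" and "B") and the function names are our own illustrative assumptions, not anything found in Popper; the point is only that a single observed instance can contradict "All A's are B's", while no finite report can contradict "Some A's are B's".

```python
from typing import Iterable

# Hypothetical record of one observed case, e.g. {"A": True, "B": False}.
Instance = dict


def refutes_universal(report: Iterable[Instance]) -> bool:
    """'All A's are B's' is contradicted by any observed A that is not a B."""
    return any(case["A"] and not case["B"] for case in report)


def refutes_existential(report: Iterable[Instance]) -> bool:
    """'Some A's are B's' is never contradicted by a finite report:
    an A that is a B may always exist among the unobserved cases."""
    return False


report = [
    {"A": True, "B": True},
    {"A": True, "B": False},   # an A that is not a B
    {"A": False, "B": False},
]

print(refutes_universal(report))    # True
print(refutes_existential(report))  # False
```

The second function is deliberately trivial: its constant answer is the whole content of the logical requirement for unrestricted "some" statements.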

(ii) The Empirical Requirement. Having the proper logical form is not sufficient to ensure that a hypothesis is scientifically testable. "All repressions are seated in the libido" satisfies the logical requirement but, as it stands, it is not subject to experimental test. How exactly are we to recognize a repression? And even if we could, how could we tell whether or not it is seated in the libido?
Contrast the following sentence which has the same logical form: "All samples of iron have a melting point less than 2000° C." This universal generalization is subject to test. We can easily determine whether a sample is iron or not through chemical analysis. (We might use the potassium thiocyanate test, for example.) And there are also a variety of reliable procedures for measuring melting points.
The contrast in the above two cases suggests the following requirement: A falsifiable theory is one which is inconsistent with at least one finite conjunction of observation test reports. Popper's discussion of test reports, or 'basic' statements, as he called them in the Logic of Scientific Discovery, is traditional in many respects: they describe observable events occurring in an individual region of space and time (p. 103); they are inter-subjectively testable, i.e. they describe experimental arrangements in such a way that anyone who has learned the relevant technique can check on their validity (p. 99).
But Popper departs from the logical positivist or other standard empiricist accounts by not claiming that the 'basic' statements are infallible, nor are they picked out by any psychological criteria. The store of 'basic' statements and hence whether or not a theory is testable depends on the technology and state of scientific development available at the time. Before the invention of the mass spectrograph, "All atoms of an element have the same weight" would not have been considered testable because as yet there was no way to determine the weights of individual atoms. What counts as an observation sentence also changes with the development of instrumentation and with new theoretical developments. For modern scientists, "This sample is oxygen" and "This is an electron track" are considered to be observation statements. In an earlier era they would not have been. "This sample is a gas which supports combustion" and "This track in a cloud chamber curves towards the positive plate" might have been used instead, if the identity of the gas or of the particle was still in question. The truth of observation statements cannot be decided with certainty; even so, members of the scientific community can tentatively agree in their judgments about the truth of observation statements.
Although Popper originally proposed his falsifiability doctrine as a demarcation between science and pseudo-science, one could also view it as a regulative principle to guide the development of good scientific theories, not as a sharp criterion. We can increase the degree of falsifiability of a conjecture by increasing the domain of phenomena to which it applies, by making more precise the descriptive claims about the domain, and by inventing less and less controversial observational procedures for evaluating those claims. More important than the question of whether Freud's theory has any potential falsifiers whatsoever is the question of how we might increase its degree of falsifiability, either by making its claims more precise or by using detection methods such as plethysmography for detecting patterns of sexual arousal instead of relying solely on dreams or other traditional psychoanalytic techniques.
I have just recited the standard Popperian answer to the demarcation problem which is described in his intellectual autobiography (Unended Quest) and in Chapter 1 of Conjectures and Refutations as the problem which his falsificationist theory of science was intended to solve.
But let us now ask how this account might differ if we take seriously Popper's own claim that theories should be solutions to problems. On this perspective some of the criteria for appraisal would be different. For example, before checking on the falsifiability of a theory, shouldn't we first see if it is even a solution of the problem? Popper discusses the Maori conjecture that the earth is held up by a turtle and criticizes it, not because it is false or unfalsifiable, but because it immediately raises the same problem which it was supposed to solve, namely what holds up the earth (or turtle)?
This example strongly suggests that before (or in addition to) appraising a conjecture in terms of its falsifiability we should check on whether it solves "the" problem. This brings the historical context of the conjecture and perhaps even the intentions of its inventor into the evaluation of a hypothesis. It also suggests that a Freud, or anyone else, might not be castigated so severely for proposing unfalsifiable conjectures if they were at least solutions to his problem, particularly if no other more falsifiable solution was available. Perhaps we should instead fault his choice of problem, not his theory. We will need to return to this case when we present our account of problem evaluation.

c. The Choice of Scientific Tests
In his account of the empirical appraisal of scientific theories, Popper once again inverts the positivists' rhetoric. Rather than trying to collect data which will confirm our conjectures, we should instead conduct those tests which seem most likely to refute them.

Popper's central point is nicely illustrated by an anecdote recounted by Francis Bacon:
...it was a good answer that was made by one who, when they showed him hanging in a temple a picture of those who had paid their vows as having escaped shipwreck, and would have him say whether he did not now acknowledge the power of the gods--"Aye," asked he again, "but where are they painted that were drowned after their vows?" And such is the way of all superstition...(The New Organon, BK I, Aphorism XLVI.)

It is obvious that Bacon is criticizing the way data is being used to argue for the "power of the gods." But we need to spell out the objection in detail.
First of all, what exactly is the claim about the power of the gods which is under discussion? It would appear that the basic thesis which can be directly tested is the following: "If one makes a vow during a storm at sea, then one will survive." We can abbreviate the conjecture as: "If V, then S."* The proposed method for collecting data which will either support or refute the conjecture is as follows: Go to churches and record instances of people who paid their vows as thanks for having escaped drowning. Using our abbreviations, we can describe the instances so collected as cases of V and S.
At first glance, it may appear that these data do indeed tend to confirm the conjecture because they are positive instances of the generalization. But let us look more carefully. What kind of evidence would refute the conjecture? The answer is a case of someone who made a solemn vow, but drowned at sea nevertheless, i.e., a case of V and not-S. But given our method of collecting data, it is logically impossible that we would ever find such a refuting instance. By looking only at pictures of survivors (i.e., cases of S) we will never come across an instance of V and not-S, even if there be millions of such cases. One of the basic principles of scientific testing can be stated roughly as follows: The outcome of a certain test procedure cannot confirm a theory unless it is logically possible that there could have been another outcome which would have disconfirmed the theory.
In order to test "If V, then S", we should sample the domain of V and find out whether any of them drowned. As Bacon says, "Where are they painted that were drowned after their vows?" In addition, we should also look at examples of people who in fact drowned and find out if any of them had made vows. (This might be difficult to do in practice, but we could check their diaries, ask their mates, etc.) It is useless to look at cases already known to be S or not-V. Such "tests" are irrelevant to the conjecture under consideration because it is logically impossible that they could ever yield a refuting case.

*This is probably somewhat oversimplified. The proponents of the power-of-the-gods theory may have only wished to defend a weaker claim: "If one prays, one is less likely to be drowned." We will postpone the discussion of the testing of probabilistic generalizations until later.
We might label the procedure described by Bacon as "no-risk data collecting" because the way in which the data is collected makes it logically impossible for a refutation to appear. Once pointed out, the methodological error is blatant; nevertheless it can be seductive. For example, after teaching scientific method for a number of years, I once caught myself reasoning as follows: I observed that all of my close friends who blinked a lot and tipped their heads back when looking at me wore contact lenses. I then started investigating other people who behaved similarly and sure enough I nearly always found independent evidence that they were wearing contacts. Sometimes I asked them. Other times I would see a lens holder in their purse or bathroom, etc. I soon jumped to the following conclusion: "All people who wear contact lenses blink a lot and peer down their noses when they look at you."
This conclusion was obviously too strong, given that I had done only an informal study on a very small sample. But I did think that my experience justified a more modest statement: "All contact lens wearers whom I have met blink a lot, etc." What was not clear to me for quite some time is that none of my observations had served as a test for either conjecture. For I had always begun my observations with people who blinked! Given this choice of sample domain, I could have investigated all the blinkers and peerers in the world and never found a counter-example to my conjecture--not because there weren't any, but simply because it was logically impossible for my method of data collection to uncover them.
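The logic of "no-risk data collecting" can be made vivid with a small simulation of the shipwreck case. Everything numerical here is invented for illustration: the voyage records, the vow rate, and the survival rate are hypothetical, and we deliberately construct a world in which vows make no difference to survival, so that "If V, then S" is false. The sketch compares three sampling domains.

```python
import random


def simulate_voyages(n, vow_rate=0.5, survival_rate=0.7, seed=0):
    """Made-up records in which survival is independent of vowing."""
    rng = random.Random(seed)
    return [
        {"vowed": rng.random() < vow_rate,
         "survived": rng.random() < survival_rate}
        for _ in range(n)
    ]


def refuting_cases(sample):
    """Cases of V and not-S, the only cases that refute 'If V, then S'."""
    return [c for c in sample if c["vowed"] and not c["survived"]]


voyages = simulate_voyages(10_000)

# Bacon's temple: only those who vowed AND survived get painted.
temple_pictures = [c for c in voyages if c["vowed"] and c["survived"]]
# The two domains recommended in the text.
all_vowers = [c for c in voyages if c["vowed"]]
all_drowned = [c for c in voyages if not c["survived"]]

print(len(refuting_cases(temple_pictures)))  # 0, whatever the world is like
print(len(refuting_cases(all_vowers)))
print(len(refuting_cases(all_drowned)))
```

The first count is zero by construction of the sample, not because of anything about the gods. The other two domains both expose the same refuting cases, which is why sampling V or sampling not-S counts as a genuine test.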
Popper adds to Bacon's point by stressing that good scientific tests should be severe ones, that is they should be deliberately designed, using our general background knowledge to probe the conjecture at its weakest point, i.e., to find a refutation if one does in fact exist. For example, when Kohlberg put forward a theory about the development of moral reasoning in children, he was well advised to test it on children from Turkey and Taiwan. We might expect a theory developed on the basis of experience with kids in Boston to fail when applied to children from quite different cultures and religions. (As it turned out, the Kohlberg theory passed this severe test.) Similarly, theories about the universality of the Oedipal complex should be tested on aborigines, and theories about language learning on deaf and blind children. Theories about geological change and biological evolution should be tested, where possible, by data from other planets. Physicists know that theories often fail under conditions of high energy or high velocity; and often processes at the micro level violate generalizations which work well with medium-sized objects. For this reason physicists want to build ever bigger accelerators for smaller and smaller particles.
The general procedure for designing a severe test is as follows: The hypothesis under test always makes a series of claims. For example, the claim "All arsenic compounds are poisonous" says that both soluble and insoluble arsenic compounds are poisonous. It also says that both yellow and green non-poisonous substances are free of arsenic. (Don't forget the contrapositive!) According to our background information, some of these claims sound less plausible than others. For example, since we know that many poisons have to be digested in order to act, we may decide that insoluble arsenic compounds are less likely to be poisonous than soluble ones. A severe test is one which tests the least plausible claims of a theory. In our example, given our background theories about the relationship between solubility and poisonous character, we should start testing by looking at insoluble arsenic compounds. If the conjecture passes this severe test, we will then look at the class of soluble arsenic compounds. Other things being equal, severe tests, i.e., tests of the least plausible claims of a conjecture, are more stringent than less severe ones.
Note that our appraisal of the severity of tests depends on the background information available at the time. Consider the two claims: (a) "All yellow non-poisonous substances are free of arsenic" and (b) "All green non-poisonous substances are free of arsenic." Which domain should be investigated first if one wishes to perform a severe test of the original conjecture? Recall that a counter-example to the original conjecture would be a non-poisonous arsenic compound. So if we think green substances are more likely to contain arsenic than yellow ones, we should sample the domain of non-poisonous green substances. If we know nothing about the typical color of arsenic compounds, however, or if we have reason to believe that color is not correlated to chemical composition, we would judge the tests to be equally severe. (As a matter of fact, many arsenic materials are yellow or black, so there may be a slight preference for a test of yellow non-poisonous substances.)
Because they depend on vague and incomplete background knowledge, judgments about which tests are most likely to refute the conjecture are unusually fallible. For example, the Kohlberg theory of the development of moral reasoning worked surprisingly well when tested on boys raised in Muslim and Confucian cultures, but failed when tested on young American girls. (See Gilligan.) Kohlberg had thought his universal theory might well be sensitive to differences in the religious ethos, but that factor turned out to be much less important than gender differences.
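The procedure for ordering tests by severity can be sketched as a simple ranking. The plausibility figures below are placeholders we have made up to stand in for background knowledge; Popper gives no such numbers, and the class name and function name are likewise our own. The only thing the sketch is meant to show is that the least plausible sub-claim gets tested first, and that changing the background estimates changes the order.

```python
from dataclasses import dataclass


@dataclass
class SubClaim:
    description: str
    plausibility: float  # assumed background estimate, between 0 and 1


def order_by_severity(sub_claims):
    """Return sub-claims with the least plausible (most severe test) first."""
    return sorted(sub_claims, key=lambda claim: claim.plausibility)


# Sub-claims of "All arsenic compounds are poisonous"; numbers are invented.
claims = [
    SubClaim("soluble arsenic compounds are poisonous", 0.9),
    SubClaim("insoluble arsenic compounds are poisonous", 0.6),
]

for claim in order_by_severity(claims):
    print(claim.description)
```

If two sub-claims receive the same estimate, as with the yellow and green substances when nothing is known about color, the ranking gives no reason to prefer either test.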
A special case of severe testing is what Bacon called a "crucial experiment." Here one probes the vulnerability of a hypothesis by comparing its predictions with those of a plausible rival conjecture. If hypothesis A predicts P and rival hypothesis B predicts not-P, checking on whether P or not-P is the case will allow us immediately to eliminate one alternative. Contrary to what its name may imply, a crucial experiment does not prove the truth of the undefeated hypothesis because there may exist more alternatives which we have not yet thought of.
For example, according to the Copernican theory, Venus should wax and wane like the moon. The Ptolemaic system, on the other hand, predicted that Venus should not exhibit extremely different phases at different times. This conflict between the predictions of the rival cosmological systems was noted by Copernicus in 1543. However, it was not possible to conduct a crucial experiment without a telescope. In 1610, Galileo observed that Venus did have phases and so the Ptolemaic system was refuted. This crucial experiment in no way established the truth of the Copernican heliocentric theory for in 1588 Tycho Brahe had proposed a geocentric system which also gave the correct predictions concerning Venus. The next order of business was to design a crucial experiment between the Tychonic and Copernican system.
Crucial tests are only stringent when the rival hypothesis is a fairly plausible one (as judged against background knowledge). The more plausible the rival conjecture to the hypothesis in question, the more stringent is a crucial test between them. For example, no one would have thought it necessary to design a crucial test if the only rival were an ad hoc hypothesis to the effect that Venus shone by its own light but periodically varied its luminous area from crescent shaped to circular!
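The eliminative logic of a crucial experiment can be sketched in a few lines, using the Venus case. The dictionary encoding and the function name are our own assumptions; the predictions are those reported above.

```python
# Does each system predict that Venus shows a full cycle of phases?
predictions = {
    "Ptolemaic": False,
    "Copernican": True,
    "Tychonic": True,
}


def eliminate(predictions, observed):
    """Keep only the hypotheses whose prediction matches the observation."""
    return {name for name, predicted in predictions.items()
            if predicted == observed}


# Galileo's 1610 observation: Venus does have phases.
survivors = eliminate(predictions, observed=True)
print(sorted(survivors))  # ['Copernican', 'Tychonic']
```

The experiment removes one rival but leaves more than one survivor, which is why it cannot establish the truth of any of them.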
Checking on the truth of the least plausible consequences of a conjecture is the most efficient way of trying to falsify it, and hence Popper recommends tests with samples which are in a sense biased against the conjecture! How can this be reconciled with the standard statistical practices of using random samples or stratified samples? Or can it be? To develop a full-fledged critique of the Popperian approach to statistics is beyond the scope of this book, but I will make a few preliminary remarks. First of all, many statistical studies are not really tests at all, but simply demographic measurements. If Kinsey wishes to make descriptive claims about overall American sexual practices, clearly a non-biased sample is desirable. However, if one is testing the claim that the half-life of radium is always 1600 years or that the M/F ratio of neonates is always 0.51 (regardless of conditions), then it makes sense to focus our inquiry on samples of radium or births in extraordinary circumstances, namely those which on our background knowledge are most likely to violate the general claim.
In the case of evaluating causal claims by means of controlled tests, the Popperian approach once more exhorts us to put most effort into controlling for those factors which are most likely to be alternatives to the causes described by our hypothesis. Of course, since our background hunches about the weaknesses of our conjectures are always fallible, our assessments of the severity of a test are also fallible and this is a good reason for eventually performing a wide variety of tests whether they appear to be severe or not.
There have been a variety of reactions to Popper's account of severe testing. Bayesians have analyzed parallels between Popper's account and their own. Proponents of the semantic view of theories, on the other hand, sometimes imply we should invert Popper's methodology and gradually increase the domain of a theoretical model by first trying to apply it to the instances most similar to the paradigm cases around which the model was originally constructed.
What new perspectives on scientific testing are provided if we view theories as solutions to problems? Let's begin with a non-scientific example adapted from van Fraassen (whose views we will discuss later). Suppose we wish to test the claim C: Eve ate the apple from the tree of knowledge.
Now imagine two problem situations. In the first case, theologians are puzzling over the exact symbolism of the apple tree. Did it stand for eternal life or did it have something to do with the knowledge of good and evil? C proposes an answer.
In the second case, let us suppose that the controversy is over whether Eve also ate the apple or whether she merely tempted Adam to eat while remaining pure herself.
Now we can well imagine that the sorts of historical and textual testing of C which would be appropriate in the two problem situations would be quite different. The theologians would look primarily at evidence relating to the tree issue and might not even care whether it was Adam or Eve or both who ate the apple. In the second problem situation the relevance of the tests would be reversed.
I conclude that at least in some cases, knowing which problem the theory was supposed to solve would influence our choice of tests. Since scientific theories have lots of content (and hence lots of places to go wrong) and since most of our theories are probably literally false, it makes sense to focus our testing on the aspects which are most relevant to the problem we are trying to solve. Criticism of the non-relevant parts (such as "Eve didn't actually eat the apple -- she just bit into and chewed it up but didn't swallow it because just then God came and chased them out") may strike us as pedantic.
Knowing the problem-situation seems to help us choose relevant tests in the case of the idiographic inquiry where the conjectures are singular statements. But what about in the case of law-like hypotheses? Do we really need to know what the question is in order to test the truth of the answer?
I grant that in the case of fundamental scientific theories the influence of problem on testing may be less, but I still think it may be as important as Popperian severity which is based on improbability. Here is an illustrative example -- consider the following conjecture:
C: The atomic weight of oxygen is sixteen.
Now the most severe test we can think of is to make measurements accurate to six figures. (It is highly improbable that this value is exactly right.) And if the issue is the existence of isotopes that would be quite appropriate. But what if the problem-situation is an earlier one in which the main dispute is whether oxygen gas is diatomic? Then accuracy to six significant figures is not relevant at all.
Perhaps this point is better expressed by saying that before testing one should clarify or amplify the conjecture. But then this process will also require us to go back to the problem for which it is intended to be a solution.

d. The Ambiguity of Falsification
We have raised questions about the choice of tests to be performed, but as described so far, the logic of testing is simple and clear-cut: (1) We derive a prediction from our conjecture which can be subjected to experimental check. (2) We do the experiment. (3) If the prediction is wrong, the theory is refuted. Period. Or so it would seem. In the typical scientific case, however, the situation is more complicated and the decision as to exactly which premise is to be given up is less straightforward.
Let us illustrate the dilemma with a famous scientific example, the case of stellar parallax. After Copernicus put forward his theory that the earth revolved around the sun, astronomers noted that if his theory were true, one should be able to detect stellar parallax. If one is moving with respect to an object, then the direction in which the object appears changes. This phenomenon is known as parallax. As a race driver moves past the pit stop, at first it is ahead of him/her. Later it is behind. The angle a in the diagram below is called the angle of parallax. A similar diagram could be used to illustrate Copernicus' theory of the earth's annual movement with respect to a particular star.
----------------------------
Insert Figure 3.3 about here
----------------------------

But when 17th-century observers looked for stellar parallax, they couldn't detect any. Didn't this mean the theory was false? The supporters of Copernicus' theory decided to blame an auxiliary assumption instead. Their argument can be illustrated with the race-car analogy. Suppose the driver sights on a distant radio tower instead of on the pit stop. Now the angle of parallax may become too small to be easily noticeable. As the ratio of D to R increases, a gets smaller. At very large values of D it will become too small to detect. According to estimates of the distance between the earth and the stars available at the time, stellar parallax should have been observable. But the Copernicans argued that these estimates were wrong and claimed that the universe was about 1,000 times bigger than had previously been imagined. This bold move turned out to be correct, but 200 years passed before stellar parallax was detected experimentally.
----------------------------
Insert Figure 3.4 about here
----------------------------
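The dependence of the parallax angle on distance can be sketched numerically. Since the figures are not reproduced here, the geometry is our assumption: we take R to be the observer's sideways displacement, D the distance to the object measured at right angles to that displacement, and a the angle whose tangent is R/D. The units and sample values are arbitrary.

```python
import math


def parallax_angle_deg(r, d):
    """Angle a (in degrees) under the assumed right-triangle geometry."""
    return math.degrees(math.atan2(r, d))


# Fixed displacement R = 1, with the object moved farther and farther away.
for d in (10, 1_000, 1_000_000):
    print(d, parallax_angle_deg(1.0, d))

# Once D is much larger than R, multiplying D by 1,000 shrinks a by
# nearly the same factor.
ratio = parallax_angle_deg(1.0, 1_000) / parallax_angle_deg(1.0, 1_000_000)
print(ratio)
```

On this reading, the Copernican proposal to enlarge the universe about a thousandfold amounts to shrinking the predicted angle by about the same factor, below what the available apparatus could resolve.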

The logic of the testing situation was as follows:
Copernican theory: The earth revolves around the sun, which is stationary relative to the stars.
Auxiliary hypothesis: The distance between the earth and the stars is about 20,000 earth radii.
Experimental Prediction: (Therefore) Stellar parallax should be easily observable with the apparatus available.
Experimental Finding: No stellar parallax is observable with the available apparatus.

Since the prediction failed, one of the premises had to be wrong. Copernicus blamed the auxiliary hypothesis; anti-Copernicans defended it and blamed the theory instead. With no good way at the time to test the auxiliary hypothesis, the status of the Copernican theory was left open.
The philosopher who first stressed that almost all tests involve a lot of auxiliary assumptions was Pierre Duhem, an early 20th-century philosopher, physicist, and historian of science. Hence, we will call the following the Duhemian problem:
When an experimental prediction turns out to be false, should the scientist blame the theory under test or the auxiliary assumptions (or both)?
Popper emphasizes that there is no methodological recipe for dealing with the Duhemian problem, but a few guidelines can be laid down. First of all, one should not use the Duhemian problem as a general excuse for one's pet theory. It is not good methodology to say, "My theory's prediction failed? Well, not to worry. I probably made a false auxiliary assumption somewhere along the line." If one wants to keep the theory despite the prediction failure, one must point to a specific auxiliary assumption and then design tests of that auxiliary assumption. If the auxiliary assumption passes the tests, then we should conclude that our theory and not the auxiliary was false. Sometimes, however, it is neither possible nor practical to test auxiliary hypotheses. (We saw an example of this in the Copernican case.) In such instances, we can draw no firm conclusions about the original test situation. If a theory in conjunction with a variety of auxiliary assumptions makes a lot of false experimental predictions, though, we tend to decide that the theory is false, even though we can't conclusively test each auxiliary.
The Duhemian dilemma can be analyzed as follows:
The theory under test (T) when conjoined with one or more auxiliary hypotheses (A) makes a prediction (p). Experiments show that p is not the case. By modus tollens we know that either T or A (or both) must be false, but logic doesn't tell us which.
(T & A) → p
~p
(Therefore) ~T, or ~A, or ~T & ~A
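That logic alone leaves the choice open can be checked by brute force. The sketch below, which is ours and not Duhem's or Popper's, runs through every assignment of truth values to T, A and p, and keeps those in which the two premises hold: T and A together imply p, and p is false.

```python
from itertools import product


def implies(antecedent, consequent):
    """Material implication."""
    return (not antecedent) or consequent


# Truth values of (T, A) in every case where both premises are true.
consistent = [
    (t, a)
    for t, a, p in product([True, False], repeat=3)
    if implies(t and a, p) and not p
]

print(consistent)  # [(True, False), (False, True), (False, False)]
```

Three possibilities survive: the theory is true and the auxiliary false, the theory is false and the auxiliary true, or both are false. The one case ruled out is that both are true.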

Note that in the pure Duhemian problem situation there is no controversy about the experimental result, ~p. Furthermore, all parties agree that T & A imply p. The disagreement arises about whether to revise A or to revise T. Of course, there are also cases in which people cannot agree on experimental results or on what exactly the implications of the theory are. These latter disagreements can usually be settled either through further experimentation or by means of logical analysis. The Duhemian problem is often more recalcitrant. Popper does give one firm piece of methodological advice. No matter which premise we decide to replace, the substitute should never be lower in empirical content.
The main responses to Popper's remarks on the Duhemian dilemma, such as those of Kuhn and Lakatos, point out that in the history of science, it is fairly rare to find a case where a theory is refuted by a single, decisive experiment. More often theories come to be rejected through a variety of prediction failures. Theories are rarely struck down by a blow from one type of crucial experiment, no matter how many times that experiment is repeated. Rather they are eroded away by an accumulation of anomalous results. We will develop this important critique in the next chapter. Here I will only remark that if we view theories as problem solutions, then as we modify our system in response to the Duhemian dilemma we should either ensure that the new system also answers the original problem(s) or else explicitly acknowledge that we are abandoning them.

e. The Status of Corroborated Theories
We have discussed what happens when our theory's prediction is refuted--either we revise it or adjust an auxiliary hypothesis. What happens if our theory passes the most severe experimental tests we can devise with flying colors? Can we then declare it proven true, or at least highly probable? It is perhaps on this issue that Popper's disagreement with the positivists is deepest.
First of all, the history of science strongly suggests that we should never feel completely certain about any scientific generalization, no matter how frequently or stringently it has been tested. Newton's theory of classical mechanics had perhaps the best track record ever; yet it was superseded by Einstein's relativistic mechanics. Here are a few other examples of well-established claims which eventually had to be corrected or rejected:
(i) Matter cannot be created or destroyed. (Not true in nuclear fission or fusion processes.)
(ii) The sun rises once every twenty-four hours. (Not true at the North Pole.)
(iii) All molecules of water are made of the same stuff. (Not true for heavy water, deuterium oxide.)
(iv) The major difference between Homo sapiens and the lower animals is that man can use language. (Not true for chimpanzees, which can use sign language.)
(v) Living matter can only come from living matter; it cannot be formed from inanimate substances. (Not true--amino acids can be synthesized from ammonia, methane, hydrogen, etc.)
So the history of science warns us that any scientific claim is fallible. Logic and philosophy of science can help us understand why this is so. Here are some of the reasons:
(i) Generalizations cover a potential infinity of cases. But we can only check on a finite number of predictions. We can never be sure that the next case won't violate the rule (e.g., a black swan may turn up in Australia).
(ii) Scientific theories make infinitely precise claims. But we can only make measurements of finite accuracy. (For example, Newton's law of gravitation says the force of gravity varies inversely with the square of the distance, i.e., the exponent of r is exactly 2.00000..., but our measurements cannot discriminate between r^2 and r^2.0000000001.)
(iii) Many of our scientific laws only hold under idealized conditions--to give two very simple examples, the law of the lever assumes no friction at the fulcrum, and the law of the pendulum assumes there is no air resistance. Of course, we can try to minimize such interferences when we conduct tests, e.g., by resting our lever on a point or setting up a pendulum in a vacuum, but our experiments never achieve the perfect conditions which are assumed in our ideal laws.
(iv) There may be alternative theories which we have not even dreamt of yet which account for all of the data we have in hand.
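Point (ii) can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: it compares how much the predicted force changes when a distance is doubled, first under an exponent of exactly 2 and then under an exponent of 2.0000000001. The function name `relative_force_gap` and the choice of a doubled distance are hypothetical, not drawn from the text.

```python
import math


def relative_force_gap(distance_ratio, exponent_offset):
    """Fractional disagreement between a 1/r**2 law and a 1/r**(2 + offset) law
    about how the force changes when the distance is multiplied by distance_ratio.

    The two predicted force ratios differ by a factor of
    distance_ratio ** exponent_offset, so the gap is that factor minus one.
    expm1 is used because the result is far too small for naive subtraction.
    """
    return math.expm1(exponent_offset * math.log(distance_ratio))


if __name__ == "__main__":
    gap = relative_force_gap(distance_ratio=2.0, exponent_offset=1e-10)
    print("fractional gap when distance doubles:", gap)
```

Under these assumptions the two laws disagree by roughly seven parts in a hundred billion, so only a force measurement more accurate than that could tell them apart.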
For all these reasons, theories are underdetermined by our observational results and can never be proved through any amount of observation and experiment. There are no rules for deciding when to accept a theory (for the time being) and move on to new problems, but what we can do is to answer each of the above sources of fallibility as best we can.
(i) By testing in widely scattered domains, we guard ourselves against parochialism, e.g., the black swans in Australia.
(ii) By making our tests as precise and ideal as possible, we can approach the infinite precision and perfection of our theories.
(iii) And the best way to rule out alternative explanations is to deliberately try to imagine radically different ways of explaining our results. If we can devise a new alternative, we can then set up a crucial experiment between the two competing accounts.
But what is the exact epistemological status of theories which have survived critical scrutiny? What positive claims can we make about them? Popper introduced the term corroboration to describe the severity of the tests passed by a hypothesis, but he emphatically denies that the degree of corroboration is to be interpreted as a degree of reasonable belief in the hypothesis or the probability that it is true. However, he does say that for purposes of practical action, it is rational to base our behavior on our most highly corroborated theories. And for purposes of scientific inquiry we should use the degree of corroboration of various claims as guides to criticism and revision of our scientific systems. The Duhemian problem would become completely intractable if we had no way of at least tentatively assigning the blame for prediction failures. And the whole mechanism of falsification rests on the existence of 'basic' statements, i.e., statements which all observers can test and presumably corroborate for themselves.
Popper's theory of corroboration and his views on induction are perhaps the most controversial aspects of his philosophy, and I will not comment on that far-ranging debate. I will only remark that to the extent that tests are chosen because of their relevance to the problem-situation, our estimates of corroboration or Bayesian confirmation or what have you will also be dependent on problems.
3.? Final Comments
Popper's characterization of the objective aspects of problems is a good starting point, but it needs to be accompanied by a fuller account of the factors, be they objective or subjective, which influence problem choice. If scientists tried to work on all the problems which exist in a World-3 sense, or chose their problems randomly, science as we know it would not exist.
Popper's methodology stresses problems as the starting point of inquiry but makes problems less central in the later stages of theory evaluation. A more thoroughgoing problems approach would lead us to modify Popper's account of preliminary theory appraisal and the prioritizing of scientific tests. It is less obvious how, if at all, viewing theories as solutions to problems should affect our philosophical accounts of theory corroboration or confirmation.