Listen to the comments of Richard Feynman, Nobel Laureate in Physics, on social science, from 2':52" through 4':50".
Psychological Science: Mathematical Argument and the Quest for Scientific Respectability – Part 2 (Norman Costa)
By
Norman Costa Ph.D.
The Problem of Measurement
The recognition of psychology as a proper science was a major goal of the pioneers in experimental psychology, and those interested in the study of individual differences. They understood that the basic function of science was to describe the properties of things through observation and the recording of data. Key to the process of observation, and essential for mathematical argument, is the concept of measurement.
The early psychologists saw, clearly, the parallel between measurement tools in psychology (tests, scales, experimental apparatuses) and measurement instruments in the more successful sciences. In anecdotes, they told of conversations they had with other scientists on the reliability of test instruments. They were told, so they reported, that a measurement tool must yield exactly the same results each time it is used, assuming identical circumstances, if it is going to be considered reliable. They translated their understanding of test reliability into what is formulated today in the Standards for Educational and Psychological Testing: reliability is consistency of observed results. In this definition there is no reference to the purpose or function of the test instrument. To be reliable, it is only necessary for a test to be consistent in yielding observed scores, independent of the purpose of the test. In another article, I will show that the concept of reliability, without a statement of purpose, is an illusion. It was a fundamental mistake for psychological science, and was completely out of sync with the concept of test reliability in all other sciences.
They believed they addressed the issue of mathematical argument with the development of descriptive statistics (beginning with Galton and Pearson), the development of the correlation coefficient to express relationship (Galton, Pearson, Spearman and others), and the invention and development of factor analysis for research on the structure of human intelligence (Spearman, Thurstone, and others). The body of statistical tools, tests, scales, experimental apparatus, and textbooks (Thorndike, Spearman, Thurstone, Guilford, Guttman, and Gulliksen) were crowning achievements and sources of pride for scientific psychology. Of great importance and satisfaction to psychology was the development of the correlation coefficient, and its application to assessing the reliability of measurement instruments. Eventually, we will see that using the correlation coefficient to measure reliability was a fundamental mistake. The mistake was brought about by an overvaluing of the correlation coefficient, and an over-eagerness to develop mathematical argument for psychological science.
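To make the use of the correlation coefficient concrete, here is a minimal sketch of a test-retest reliability estimate computed as a Pearson correlation between two administrations of the same test. The scores and the function name `pearson_r` are made up for illustration; this is not a reconstruction of any particular author's procedure.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five test takers on two administrations of one test
first = [12, 15, 9, 20, 17]
second = [13, 14, 10, 19, 18]

# Hypothetical retest in which every score comes out 10 points higher
shifted = [score + 10 for score in first]

r_retest = pearson_r(first, second)
r_shifted = pearson_r(first, shifted)
```

With these made-up numbers the retest correlation is high, and the shifted scores correlate perfectly with the originals even though every observed score differs by 10 points, because the correlation coefficient is unaffected by adding a constant to one set of scores.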
Recognition from the more successful sciences was not forthcoming, however. The scientific community never looked closely at the psychologists' definition of reliability that was without a statement of purpose, nor at the use of the correlation coefficient to measure reliability. What they did focus on was psychology's understanding, or lack of understanding, of the concept of measurement in science.
The Ferguson Committee
To understand the failure of recognition from other sciences, it is necessary to go back to the work of the Ferguson Committee, established by the British Association for the Advancement of Science in 1932. The purpose of the committee was to determine whether or not real scientific measurement was a possibility for the social sciences. In other words: Could the field of psychology aspire to the level of a real science or not? The Ferguson Committee, dominated by N. R. Campbell, an important figure in the philosophy of science for the physical sciences, answered the question with a resounding, “No!” The official report put its response in a highly technical treatment, but the issues for psychological science come down to three elementary points:
- The definition of measurement;
- Establishing measurement standards (units of measurement) through research; and
- Physical and structural additivity.
During the time that the Ferguson Committee was seated, psychological science did not address these issues, and, for the most part, did not understand them. Today, psychological science has no definition of measurement, no research on establishing measurement units, and no demonstration of physical or structural additivity.
In all of science, measurement is one of the simplest ideas to understand. Measurement is a comparison to a standard. The standard for a second of time is 9,192,631,770 completed cycles of the radiation from a specific transition of the cesium-133 atom. The standard for length is the metre, and is the distance that light travels, in a vacuum, in 1 ⁄ 299,792,458 of a second. The standard for mass is the kilogram, originally conceived as the mass of one litre of liquid water and, at this writing, defined by a platinum-iridium prototype kept near Paris. Within each science or engineering discipline there are many measurement standards, all of them requiring research to establish their properties and uses. Electrical engineering uses measurement standards like ampere, volt, ohm, and so on. Astronomy uses a measurement standard known as a Standard Candle, a class of astronomical objects whose members have known luminosities.
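As a small illustration of "comparison to a standard," the sketch below expresses a quantity as a ratio to its standard, using the two defining constants quoted above. The function name `measure` and the example values are hypothetical; exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

# Defining constants quoted in the text above
SPEED_OF_LIGHT_M_PER_S = 299_792_458        # metres light travels in one second
CESIUM_PERIODS_PER_SECOND = 9_192_631_770   # cesium cycles in one second


def measure(quantity, standard):
    """Express a quantity as a multiple of a standard of the same kind."""
    return Fraction(quantity) / Fraction(standard)


# A hypothetical count of cesium cycles, measured against the standard second
half_second = measure(4_596_315_885, CESIUM_PERIODS_PER_SECOND)

# Time for light to cross one metre, in seconds, then in cesium cycles
seconds_per_metre = Fraction(1, SPEED_OF_LIGHT_M_PER_S)
periods_per_metre = seconds_per_metre * CESIUM_PERIODS_PER_SECOND
```

Here `half_second` comes out as exactly one half of the standard, and `periods_per_metre` shows that the metre can itself be stated as a count of cesium cycles (a little over 30), so one standard is anchored to another.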
N. R. Campbell's theory of scientific measurement was based on the concept of additivity, both physical and structural. Physical additivity was akin to taking many one-foot rulers and laying them end-to-end alongside a much longer object to be measured. Add up the number of rulers, and you have a measure, in feet, of the length of the object. Structural additivity was a set of mathematical axioms developed by Otto Hölder, and published in 1901. We can understand these axioms with no more than our first course in algebra. For example,
- a is equal to b ( a = b ) or not equal ( a < b; a > b ).
- For any lengths a and b, a + b > a.
- Order of operation doesn't matter, a + b = b + a.
- Additive relation is indifferent for compound operations, a + ( b + c ) = ( a + b ) + c.
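The four example axioms listed above can be checked mechanically for a set of magnitudes. The sketch below does this for a handful of hypothetical lengths represented as exact fractions. It only shows what each axiom asserts; ordinary numbers satisfy the axioms trivially, and the scientific question is whether an empirical attribute, combined by some real operation, behaves the same way.

```python
from fractions import Fraction
from itertools import product

# Hypothetical lengths (in feet), combined by laying them end to end
lengths = [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(7, 3)]


def trichotomy(a, b):
    """Exactly one of a = b, a < b, a > b holds."""
    return [a == b, a < b, a > b].count(True) == 1


pairs = list(product(lengths, repeat=2))
triples = list(product(lengths, repeat=3))

checks = {
    # a = b, or else a < b or a > b
    "equal_or_ordered": all(trichotomy(a, b) for a, b in pairs),
    # a + b > a for any lengths a and b
    "sum_exceeds_part": all(a + b > a for a, b in pairs),
    # a + b = b + a
    "order_irrelevant": all(a + b == b + a for a, b in pairs),
    # a + (b + c) = (a + b) + c
    "grouping_irrelevant": all(a + (b + c) == (a + b) + c for a, b, c in triples),
}
```

Every entry in `checks` is `True` for these lengths. For a psychological attribute, each line would have to be replaced by an experiment.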
This meant that psychologists had to conduct experiments to demonstrate the properties (or conceptual analogs) of physical and structural additivity in psycho-physical, psycho-social, educational, and psychological measurement.
The experimental psychologist, Stanley Smith Stevens of Harvard University, served with the committee. His response to the Ferguson Committee was to ignore their final report. In short, he dismissed the matter entirely and felt that they simply got it wrong. He did not bother to address the call for research that would address the issue of measurement standards and the properties of their units. Additivity was not a concern for him, because he did not accept the definition of measurement as a comparison to a standard. Scientific psychology did as Stevens did, and ignored the Ferguson Committee. In the main, it ignored the whole subject of whether measurement in psychology qualified the discipline as a science on a par with all other successful sciences. One notable exception, a voice crying in the wilderness, is the work of psychologist, Joel Michell of the University of Sydney, Australia. He captures the idea of comparison to a standard very nicely: Measurement is the numerical estimation and expression of the magnitude of one quantity relative to another.
The Greatest Scientific Side-Step in the History of Psychology
The consequences to psychology as a science were significant, if not profound, but of little concern to most psychologists. As of today, scientific psychology has NO coherent definition of measurement. What scientific psychology has is a ridiculous definition, frequently cited from a 1946 paper by Stevens. It reads: “...[M]easurement, in the broadest sense, is defined as the assignment of numerals to objects or events according to rules.” Stevens adapted his definition from N. R. Campbell. According to Campbell, measurement is the assignment of numerals to an attribute according to scientific laws. When one reads further into Campbell's definition, it prescribes standards for comparison, and research into units of measurement. That is what he meant by the phrase, "...according to scientific laws."
Stevens went on to develop his theory of measurement scales, and it is well known to all students of psychological research methods. He asserted, correctly, that different types of measurement scales (nominal, ordinal, interval, and ratio scales) are derived from different measurement operations that we use to produce them. This is of utmost importance to psychological science because, depending upon the type of measurement scale a researcher is using, different decisions must be made about how to analyze the data.
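A minimal sketch of how scale type drives the choice of analysis: the mapping below pairs each of the four scale types with the measure of central tendency conventionally considered appropriate for it (mode, median, mean). The names `CENTRAL_TENDENCY` and `summarize`, and the sample data, are hypothetical.

```python
from statistics import mean, median, mode

# Conventional choice of central-tendency statistic for each scale type
CENTRAL_TENDENCY = {
    "nominal": mode,     # categories only: most frequent category
    "ordinal": median,   # ranked values: middle rank
    "interval": mean,    # equal units, arbitrary zero: arithmetic mean
    "ratio": mean,       # equal units, true zero: arithmetic mean
}


def summarize(values, scale):
    """Summarize values with the statistic matched to their scale type."""
    try:
        statistic = CENTRAL_TENDENCY[scale]
    except KeyError:
        raise ValueError(f"unknown scale type: {scale!r}") from None
    return statistic(values)


favorite_color = summarize(["red", "blue", "red"], "nominal")
finishing_rank = summarize([1, 2, 2, 3, 5], "ordinal")
temperature_c = summarize([20.0, 22.0, 24.0], "interval")
```

The same list of numerals could be passed with a different scale label and receive a different summary, which is the practical sense in which the researcher's decision about scale type determines the analysis.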
Stevens thought he was developing a theory of measurement. He thought he could produce a definition of measurement that was based upon the operations required to produce the measurements. That is why he substituted the phrase, "...according to rules," in his own definition. He was greatly influenced by the concept of operationalism in the work of fellow Harvard faculty member, Percy Bridgman, a Nobel Laureate in Physics. A close examination of his paper, however, shows that his theory of measurement is really a self-contained, mathematical description of the properties of different numerical scales, neither a theory nor a definition of measurement. He constrained himself to the confines of the internal mathematics involved, and never ventured to examine the relationship of a fundamental or derived measure to a standard. He was stuck on the fact that the differing operations [different 'rules'] that were applied would impute different properties to the assigned numerals. He was correct, as far as it went, but scientific psychology still does not know what measurement is.
In the end, S. S. Stevens discarded the Ferguson Committee's final report, substituted a description of measurement scales for a theory of measurement, abandoned comparison to a standard in favor of an operationalism-only description of measurement, and dismissed the counsel of the best minds in the science, philosophy, and mathematics of measurement. All of this adds up to the greatest scientific side-step in the history of psychology.
It Doesn't Have To Be This Way
It should be noted that the understanding of measurement as a comparison to a standard was discussed by the renowned psychologist, L. L. Thurstone, in 1929. He was trying to conceptualize the subjective responses from a person on a perceptual scale as resulting from a process of comparison. The specific technique he was using is referred to as pairwise comparisons. This technique would imbue such data with properties that made them amenable to mathematical representation and argument.
“A very important consideration in rendering objective a value-increment that is essentially subjective is that as long as each of these psychological values is considered separately, it can never register objectively. Every scientific observation is, in fact, a comparative situation and neither of the terms of a percept can be objectively recorded in isolation. This is true of cognitive as well as of affective comparison.”
His views were encapsulated into his “Law of Comparative Judgment.” Unfortunately, the idea of measurement as comparison to a standard did not find its way into psychology or into the Standards for Educational and Psychological Testing (1999).
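For readers unfamiliar with pairwise comparisons, here is a minimal sketch of the simplest textbook form of Thurstone's scaling (usually called Case V, which assumes every stimulus is judged with the same variability). Each proportion of judges preferring one stimulus over another is converted to a standard normal deviate, and a stimulus's scale value is the average of its deviates. The stimuli, the proportions, and the function names are all made up for illustration.

```python
from statistics import NormalDist

stimuli = ["A", "B", "C"]

# Hypothetical data: proportion of judges who rated the first stimulus
# of each pair as greater than the second
prefers = {
    ("A", "B"): 0.30,
    ("A", "C"): 0.10,
    ("B", "C"): 0.25,
}


def proportion(i, j):
    """Proportion of judgments in which stimulus i was rated above stimulus j."""
    if i == j:
        return 0.5  # a stimulus compared with itself is treated as a tie
    if (i, j) in prefers:
        return prefers[(i, j)]
    return 1.0 - prefers[(j, i)]


def case_v_scale(items):
    """Average the normal deviates for each stimulus, then set the lowest to zero."""
    to_deviate = NormalDist().inv_cdf
    raw = {
        i: sum(to_deviate(proportion(i, j)) for j in items) / len(items)
        for i in items
    }
    lowest = min(raw.values())
    return {i: value - lowest for i, value in raw.items()}


scale = case_v_scale(stimuli)
```

With these made-up proportions the stimuli come out ordered A, B, C, with A placed at zero. Every scale value is built from comparisons between stimuli, never from a stimulus judged in isolation, which is the point of the passage quoted above.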
I know very little about psychological testing; what I do know is that the field is rife with controversy. I guess that's the point you are making. Thanks for sharing the history.
Posted by: Ruchira | January 09, 2012 at 11:46 PM
@ Ruchira:
And this is just the beginning.
Posted by: Norman Costa | January 10, 2012 at 12:24 AM
So the results of psychology 'experiments' are matters of faith rather than measurement? Can we anoint it as a new religion yet?
Posted by: Sujatha | January 10, 2012 at 03:18 PM
@ Sujatha:
I mentioned the name Joel Michell from the University of Sydney. He is now retiring after a long and distinguished career. I got in touch with him a couple of years ago when I first started getting into the problems of measurement for psychology. Joel had been at this for a much longer time than I, and with far better credentials and more experience. He noted that it was interesting that I had come to the same conclusions as he, though completely independently. I asked him about the reaction and reception to his ideas from other research psychologists. When he would give presentations and entertain discussions with his audience their responses were pretty much the same. Yes, they agreed with him that the problem of measurement is a big, glaring hole in the science of psychology. However, they had no motivation to change what they were doing, and would continue to lead their professional research lives as they had done before.
I taught psychological research methods for a couple of years. The text I used did not have a definition of measurement. Yet, the entire book is founded upon the necessity of observation through measurement.
The "Standards for Educational and Psychological Testing" (1999) is THE BIBLE for professional practice in testing. It is published by The American Educational Research Association (AERA), The American Psychological Association (APA), and the National Council on Measurement in Education (NCME). The chapter on test reliability has NO definition of measurement, even though the concept of test reliability has to do with measurement errors.
My next posts on this subject will deal with the theory of test reliability. I will show that it is completely out of sync with theories of test reliability in all other sciences. It is hard to argue with Feynman's observation that the social sciences have got the form right, but we are not producing any laws of nature. If it is possible to do so, we will find it very difficult if we don't get our scientific house in order.
Posted by: Norman Costa | January 10, 2012 at 04:52 PM
Looking forward to the next part of your series, now that I see where it is headed. Psychology would be a rejuvenated science indeed, if acceptable modalities for reliable measurement are in place.
Posted by: Sujatha | January 10, 2012 at 07:31 PM
This post got me curious about the background of words like "standard" and "measure." As usual, OED is informative. On the etymology of "standard" it remarks:
"Measure" is more straightforward, until it takes a Mobius turn:
Feynman on the social sciences is also fascinating, not least because he has nothing to offer from a scientific point of view. His remarks are purely anecdotal and sprung from an ipse dixit: "I know what it means to know something." Now, I know what he means when he says that, because his remarks are as mundane as he claims. But it's interesting that he begins by condemning the social sciences for responding to the successes of science, for wanting to emulate the sciences and to achieve their level of authority by doing so, and then he implicitly urges us to rely on his success as grounds sufficient to make his point.
Posted by: Dean C. Rowan | January 13, 2012 at 05:10 PM
@ Dean:
When I cover this topic with my psychological research students, the first thing I say is that measurement is a comparison to a standard.
Then I throw them a curve ball by asking them where standards come from. Eventually, I get around to telling them that we make them up. If you get one or two people to agree with you, then you have a standard. If you get lots of people to adopt your standard, then you have a unit of measure.
I have a different view on Feynman, and it's the same view I have of any senior/emeritus scientist in any field. I give them a great deal of latitude in expressing views on their areas of expertise or about science in general. They can drop the necessity of 'rigor' in formulating their utterances as if they were to be recited before an audience of graduate students. Hopefully, they will have something meaningful and more accessible to a wider audience. I used this video because I think he is more right than wrong about psychological science. I will be making more of that point in my coming articles - the next one is on my computer screen as I type.
I've listened to Steven Weinberg talk about 'truth' and never being able to find the ultimate answers to questions as a manifestation of the human condition. It actually depresses him to contemplate the matter. Leon Lederman, former head of Fermilab, talks about Heisenberg's uncertainty principle, then moves to a more philosophical view of uncertainty as a perspective we ought to adopt in everyday life. Of course, their pronouncements can be laughable when they leave the boundaries of their knowledge and experience. James Watson demonstrated this a couple of years ago.
Posted by: Norman Costa | January 13, 2012 at 08:28 PM
Right on, Norm. About scientists weighing in on social, philosophical or other non-science subjects, that is. I don't see why scientists cannot speak about such matters since they experience the vicissitudes of life like everyone. They may not make much sense always on abstract matters like meaning of life or uncertainties of the human condition, but it is not as if anyone else does either.
I have forgotten how my own children were introduced to physics in American schools. In schools in India, I remember (both as a student and a teacher) the first couple of lessons were about the nature and history of physics followed by several about measurement and standards.
Posted by: Ruchira | January 13, 2012 at 09:44 PM