Psychology's Quest for Scientific Respectability

By

Norman Costa Ph.D.

(Note: This article was originally published in two parts in January and February of 2012 under the titles "Psychological Science: Mathematical Argument and the Quest for Scientific Respectability - Part 1 and 2." The reason for combining the two was so that it could be submitted for the 3QuarksDaily prize in Science Writing.)

Part 1 - Mathematical Argument

We are reminded by Carl Sagan in his book, Cosmos, that the underpinning of modern

science with mathematics goes back to Pythagoras. In the search for truths in nature,

however, we no longer look for them in Pythagoras' mystical, even magical, power of

numbers. Today, mathematics is indispensable for science as method, and science as

content. We count, measure, perform basic operations (add, subtract, multiply, and

divide), compute values, solve equations, use visual display to communicate quantitative

information, conduct statistical tests, and represent things and ideas with symbols and

relationships.

The history of psychological science, even to the present day, has been a quest for scientific

respectability. Few things have been as important to this quest as the development of

mathematical argument for the science of psychology. Nothing has been more important,

or as far reaching, for mathematical argument in psychology, than the development of the

correlation coefficient. Because much of psychology (and the social sciences in general)

has been the examination of individual differences, it was inevitable that tools be developed

to express relationships and dependencies among different traits, capabilities, and just

about anything that could be measured and recorded about people.

The rapid-fire discoveries, in the 19th century, of fundamental laws of nature in physics,

chemistry, and life sciences created an air of expectation, pride, and optimism. Some

held the view that the final discovery of all laws of physical nature would be concluded

in the early part of the new century. Psychology envisioned its own role in this great leap

forward in knowledge and science. The development of mathematical argument was

about to elevate psychology to a level that was on par with the more successful physical

and life sciences – or so it was hoped.

It is difficult to appreciate, today, how exciting it was for scientific psychology in the late

19th and early 20th centuries. The development of the correlation coefficient became the

Royal Road to scientific respectability, at least in the minds of the pioneers of

psychological science. Statistical correlation formulas provided powerful tools that could

be applied to a myriad of problems in the budding social and economic sciences. The

correlation coefficient led to the development of other powerful tools like multiple

correlation, canonical correlation, regression, and factor analysis. It gave impetus and

support to the development of other tools for mathematical argument, particularly the

concept of true score, and statistical tests.

A. The Correlation Coefficient, r

The idea of using mathematics to demonstrate a dependency or association between

two factors began with the work of Auguste Bravais (23 August 1811, Annonay, Ardèche

– 30 March 1863, Le Chesnay, France). One is more likely to read that Sir Francis Galton

FRS (16 February 1822, Birmingham, England – 17 January 1911, Haslemere, Surrey,

England) was the originator of mathematical correlation. This is not true, though Galton,

considered the father of modern psychometrics, was a genius developer of the descriptive

statistics that we use to this day: standard deviation, regression analysis, and the properties

of the bivariate normal distribution. He saw a need to quantify the relationship between

different variables in biometric studies, census and population data, psycho-physical data,

and in hereditary and eugenic research. He wanted to express relation and degree of relation

in his research.

Galton turned to the work of Auguste Bravais. Building upon Bravais' work he developed some

approaches and early indexes of association. Galton did not invent the correlation coefficient,

but he was the first person to apply correlation to data that he collected in the field. Famously,

Galton was the first person to give the correlation coefficient a single symbol, r. Karl Pearson

FRS (27 March 1857, Islington, London, England – 27 April 1936, Coldharbour, Surrey,

England), a student of Galton, wrote of the contribution of his mentor, and the origin of

mathematical correlation in Philosophical Transactions of the Royal Society of London, 1897,

Vol. 187.

“The fundamental theorems of correlation were for the first time and almost exhaustively

discussed by [Auguste] Bravais ('Analyse mathématique sur les probabilités des erreurs

de situation d'un point.' Mémoires par divers Savans, T. IX., Paris, 1846, pp. 255-332)

nearly half a century ago. He deals completely with the correlation of two and three

variables. Forty years later Mr. J. D. Hamilton Dickson ('Proc. Roy. Soc.,' 1886, p. 63) dealt

with a special problem proposed to him by Mr. Galton, and reached on a somewhat narrow

basis* some of Bravais' results for correlation of two variables. Mr. Galton at the same

time introduced an improved notation which may be summed up in the 'Galton function' or

'coefficient of correlation.' This indeed appears in Bravais' work, but a single symbol is

not used for it.”

There was great enthusiasm for measuring association, but none of the early approaches and

indexes was wholly satisfactory. Not until Galton took Karl Pearson under his wing as a

protege did the mathematics and statistics of association become firmly developed and

grounded.

Karl Pearson was a brilliant mathematician and mathematical statistician who contributed to

the work of Galton and his students in developing statistical tools to measure association

(what we know as relatedness or correlation). The single most important statistic of Pearson,

especially regarding the development of modern theory of psychological and educational

testing, was Pearson's Product Moment Correlation Coefficient.

Karl Pearson's work overlapped with that of Galton's other students, notably Charles Spearman FRS

(10 September 1863, London, England – 17 September 1945, London, England) in the United

Kingdom. Spearman was to develop his own measure of relationship, which paralleled

Pearson in concept, but used a different computational approach. We know it as Spearman's

Rank Order Correlation Coefficient. In time, Karl Pearson and Charles Spearman would have

differences on various aspects and uses of correlation formulas. Spearman was the most

successful in finding applications for statistical computations of association. He was the most

articulate and insightful statistician and psychologist when it came to applying correlation

analysis to the premier research problem of the day – the study of human intelligence and its

composition. He developed the techniques of factor analysis – derived and extended from

correlation analysis – that would be directed toward answering questions about the elemental

nature of human intelligence.
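The difference between the two coefficients can be made concrete with a small sketch. It assumes the standard textbook formulations rather than the computational routines Pearson or Spearman actually used: Pearson's r is computed on the raw scores, and Spearman's coefficient is Pearson's r computed on the ranks of those scores. The data and the function names (`pearson_r`, `ranks`, `spearman_rho`) are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical scores: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(round(r, 3), round(rho, 3))
```

With these made-up numbers the rank-order coefficient reaches 1, because the ordering of the five cases is identical on both variables, while the product-moment coefficient falls a little short of 1, because the relationship is not linear.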

Edward L. Thorndike (August 31, 1874, Williamsburg, Massachusetts, U.S. – August 9, 1949,

Montrose, New York, U.S.) was pivotal in introducing Spearman's concepts to American

psychologists, because he, too, was looking for measures of association, and a way to

measure test reliability. Spearman's two important publications in 1904 had to be published

in America since the British Journal of Psychology had not yet been inaugurated. They were:

“‘General intelligence,’ objectively determined and measured.” American Journal of

Psychology. 15, 201-293; and “The proof and measurement of association between two

things.” American Journal of Psychology. 15, 72-101. This was fortuitously helpful to

Thorndike in the development of his own views on mental and social measurement. Working

from Columbia University in New York City, he put Spearman's work into his very influential

text, An Introduction to the Theory of Mental and Social Measurement (New York: Teachers

College, Columbia University; 1904, 1913, 1922).

The period for development of concepts, and statistical formulations, for measures of

association – what we call correlation – was an exciting time among early

psychologists and psychometricians. Psychology, virtually alone, was in the

forefront of applied statistical development when it came to measures of

association. Not only would it do wonders for applied problems in psychology

and education, they believed it would bolster the image and credentials for

psychology as a science, and lead to the recognition of psychological science

from their contemporaries in other successful sciences.

B. Psychological Test Theory - The True Score and Reliability

Spearman was the first person to articulate the concepts of true score, and error score

(observed score minus true score), and the idea of errors of measurement. The invention of

the true score, a mathematical construct, is one of the most important events in the

history and development of psychological and educational testing. It is literally true, that

from this single invention was spawned a world-wide, multi-billion dollar industry, as

ubiquitous and powerful, today, as at any time in its history.

When it came to test theory, Spearman was in the forefront, as well. After developing his

concept of true score, and applying it to the study of tests and testing, he came to propose

correlation as a measure of the reliability of a test. This was another extremely exciting

moment for psychology and education. It is impossible to overstate the professional pride

and collective sense of achievement among psychologists in America and Europe

(England in particular) in what they perceived to be the elevation of psychology to the

status of legitimate science.

The rationale for the use of the correlation coefficient to measure reliability proceeded in

the following way. First, the principal researchers in the field had been querying their

colleagues in the more successful sciences about the nature of measurement and

scientific instrumentation. They saw direct parallels to measurement and tests in

psychology. The scientific psychologists wanted to emulate the same processes, and

develop analogous concepts that would be recognized and understood by their

non-psychologist contemporaries.

In the manner of anecdote, in varied notes and writings, they talked about the counsel

received from their scientific colleagues in other disciplines. They learned, so they

thought, what was regarded as an essential element in scientific observation and

measurement. A measurement instrument was only as good as it was reliable. It had

to produce the same measurement score or results every time it was used, all things

being equal. If repeated measurements gave different scores, when the same results

were expected, then the measurement instrument, or test, was unreliable and of no

use to science. These informal comments from encounters with scientists in more

successful fields never mention any extended discussion on the definition of reliability,

or examples of measurement. The only thing they remember and report is that

reliability is consistency of observed scores. There is no indication that they

interrogated their colleagues in other fields with any desire to understand the concept

of reliability any further.

Spearman, and all others in the early years of scientific psychology, interpreted this

counsel from non-psychological scientists in a very literal manner. Reliability was

consistency of measurements. This literal translation of the concept of reliability would

be a huge mistake. I shall cover this in a later article.

Second, Spearman believed that correlation analysis would measure consistency of

scores. The technique of correlation was straightforward. Administer a test, and then

re-administer the same test to the same sample of people. If the test was reliable, and

would produce consistent scores upon re-administration, then he should observe the

same relative standing of people on both test administrations. The people who tended

to be higher than the others on the first administration, should tend to be higher than

the others on the second administration. The same would be true for those who tended

to score lower than others on the first administration. They would tend, also, to be

lower on the second administration. Correlation coefficients, then as now, if they do

anything, indicate whether relative standing on one test is related to relative standing

on another test.
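The test-retest procedure just described can be sketched in a few lines. This is a minimal illustration with invented scores for six people, not data from any study; the helper names `pearson_r` and `standing` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standing(scores):
    """Indices of the people, ordered from lowest score to highest."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Invented scores: the same six people take the same test twice.
first = [52, 61, 47, 70, 58, 65]
second = [54, 60, 49, 72, 57, 66]

r = pearson_r(first, second)
same_order = standing(first) == standing(second)
print(round(r, 3), same_order)
```

In this made-up sample nobody changes place relative to anybody else between the two administrations, and the coefficient is correspondingly close to 1. That high value is what the early psychologists read as evidence of a reliable test.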

The question never answered, because it was never asked, is why consistency of score

should be inferred from rank order of scores. Nothing in the definition of reliability says

anything about absolute values of individual scores, or mean performance of the

samples. This is the origin of the inability to distinguish, unambiguously, between

reliability and validity. I shall cover this in a later article.
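A small numerical sketch shows how far relative standing can part company with consistency of the scores themselves. The data are invented: every person is assumed to score exactly ten points higher on the retest, so no individual score is reproduced and the sample mean moves by ten points. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


first = [52, 61, 47, 70, 58, 65]
# Assumption for illustration: everyone gains exactly 10 points on the retest.
second = [score + 10 for score in first]

r = pearson_r(first, second)
# How many people obtained the same score on both occasions?
identical_scores = sum(a == b for a, b in zip(first, second))
mean_shift = sum(second) / len(second) - sum(first) / len(first)
print(round(r, 6), identical_scores, round(mean_shift, 6))
```

The coefficient comes out at 1, to within floating-point rounding, even though not one score was repeated. The correlation coefficient is unaffected by adding a constant to every score, so it cannot register this kind of inconsistency.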

Finally, from Spearman's conceptualization of true score, he reasoned that he could

parse the relative size of true variance from total observed variance, and use a

correlation coefficient as a means to estimate the ratio of one to the other. The ratio

would be a measure of consistency of scores, or reliability. The excitement over

progress in the field of mental and social measurement was validated for those

pioneering psychologists by Spearman's rationale for the use of the correlation

coefficient for measuring reliability. After all, the correlation coefficient was one of the

great inventions of the new sciences of statistics and psychology. I will discuss in a

later article that reliability as consistency of measurement scores was an incomplete

interpretation of what they heard from their non-psychologist colleagues. Also, the

concept of true score was a hypothetical abstraction and an assumption at that,

though it was treated as an axiom. We are going to find that this led to a

fundamental mistake. The error was in thinking as if the abstract concept of

true score was an actual reality.
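The variance-ratio reasoning can be illustrated with a simulation. It assumes the textbook model of classical test theory, which is not necessarily Spearman's own derivation: each observed score is a true score plus an independent random error, and two forms of the test share the same true scores. Every number here (the means, the spreads, the sample size, the seed) is an arbitrary choice made for the sketch.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(0)  # fixed seed so the sketch is repeatable
n = 20000

# Assumed model: observed score = true score + independent random error.
true_scores = [random.gauss(100, 10) for _ in range(n)]
form_a = [t + random.gauss(0, 5) for t in true_scores]
form_b = [t + random.gauss(0, 5) for t in true_scores]

# Ratio of true-score variance to observed-score variance.
variance_ratio = variance(true_scores) / variance(form_a)
# Correlation between the two forms, the quantity one can actually compute.
r_ab = pearson_r(form_a, form_b)
print(round(variance_ratio, 2), round(r_ab, 2))
```

Under these assumptions both quantities land near 100 / (100 + 25) = 0.8, which is the sense in which a correlation was taken to estimate the ratio of true variance to observed variance. The sketch also makes the limitation plain: the true scores are visible here only because the simulation invented them. With real test data only the observed scores exist, and the true score remains an assumption.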

Part 2 - The Problem of Measurement and the Greatest Scientific

Side-Step in the History of Psychology

The recognition of psychology as a proper science was a major goal of the pioneers

in experimental psychology, and those interested in the study of individual differences.

They understood that the basic function of science was to describe the properties of

things through observation and the recording of data. Key to the process of observation,

and essential for mathematical argument, is the concept of measurement.

The early psychologists saw, clearly, the parallel between measurement tools in

psychology (tests, scales, experimental apparatuses), and measurement instruments

in the more successful sciences. In anecdotes, they told of conversations they had

with other scientists on the reliability of test instruments. They were told, so they

reported, that a measurement tool must yield the exact same results each time it is

used, assuming identical circumstances, if it is going to be considered reliable.

They translated their understanding of test reliability into what is formulated today in

the Standards for Educational and Psychological Testing – reliability is consistency

of observed results. In this definition there is no reference to the purpose or function

of the test instrument. To be reliable, it is only necessary for a test to be consistent

in yielding observed scores, independent of the purpose of the test. In another article,

I will show that the concept of reliability, without a statement of purpose, is an

illusion. It was a fundamental mistake for psychological science, and was completely

out of sync with the concept of test reliability in all other sciences.

They believed they addressed the issue of mathematical argument with the development

of descriptive statistics (beginning with Galton and Pearson), the development of the

correlation coefficient to express relationship (Galton, Pearson, Spearman and others),

and in the invention and development of factor analysis for research on the structure of

human intelligence (Spearman, Thurstone, and others). The body of statistical tools,

tests, scales, experimental apparatus, and textbooks (Thorndike, Spearman,

Thurstone, Guilford, Guttman, and Gulliksen) were crowning achievements and

sources of pride for scientific psychology. Of great importance and satisfaction to

psychology was the development of the correlation coefficient, and its application to

assessing the reliability of measurement instruments. Eventually, we will see that

using the correlation coefficient to measure reliability was a fundamental mistake. The

mistake was brought about by an overvaluing of the correlation coefficient, and an

over-eagerness to develop mathematical argument for psychological science.

Recognition from the more successful sciences was not forthcoming, however. The

scientific community never looked closely at the psychologists' definition of reliability

that was without a statement of purpose, nor at the use of the correlation coefficient to

measure reliability. What they did focus on was psychology's understanding, or lack

of understanding, of the concept of measurement in science.

A. The Ferguson Committee

To understand the failure of recognition from other sciences, it is necessary to go back

to the work of the Ferguson Committee, established by the British Association for the

Advancement of Science in 1932. The purpose of the committee was to determine

whether or not real scientific measurement was a possibility for the social sciences.

In other words: Could the field of psychology aspire to the level of a real science or

not? The Ferguson Committee, dominated by N. R. Campbell, an important figure in

the philosophy of science for the physical sciences, answered the question with a

resounding, “No!” The official report put its response in a highly technical treatment,

but the issues for psychological science come down to three elementary points:

1. The definition of measurement;

2. Establishing measurement standards (units of measurement) through research;

3. And physical and structural additivity.

During the time that the Ferguson Committee was seated, psychological science did

not address these issues, and, for the most part, did not understand them. Today,

psychological science has no definition of measurement, no research on establishing

measurement units, and no demonstration of physical or structural additivity.

In all of science, measurement is one of the simplest ideas to understand.

Measurement is a comparison to a standard. The standard for a second of time is

9,192,631,770 completed cyclical vibrations of an energized cesium atom. The

standard for length is the metre, and is the distance that light travels, in a vacuum,

in 1 ⁄ 299,792,458 of a second. The standard for mass is the kilogram, originally conceived as

the mass of one litre of liquid water and now defined by fixing the value of the Planck constant.

Within each science or engineering discipline there are many measurement

standards, all of them requiring research to establish their properties and uses.

Electrical engineering uses measurement standards like ampere, volt, ohm, and so on.

Astronomy uses a measurement standard known as a Standard Candle, a class of

astronomical objects whose members have known luminosities.

N. R. Campbell's theory of scientific measurement was based on the concept of

additivity, both physical and structural. Physical additivity was akin to taking many

one-foot rulers and laying them end-to-end alongside a much longer object to be

measured. Add up the number of rulers, and you have a measure, in feet, of the

length of the object. Structural additivity was a set of mathematical axioms

developed by Otto Hölder, and published in 1901. We can understand these axioms

with no more than our first course in algebra. For example,

1. a is equal to b ( a = b ) or not equal ( a < b; a > b ).

2. For any lengths a and b, a + b > a.

3. Order of operation doesn't matter, a + b = b + a.

4. Additive relation is indifferent for compound operations, a + ( b + c ) = ( a + b ) + c.

This meant that psychologists had to conduct experiments to demonstrate the

properties (or conceptual analogs) of physical and structural additivity in

psycho-physical, psycho-social, educational, and psychological measurement.

The experimental psychologist, Stanley Smith Stevens of Harvard University, served

with the committee. His response to the Ferguson Committee was to ignore their final

report. In short, he dismissed the matter entirely and felt that they simply got it wrong.

He did not bother to address the call for research that would address the issue of

measurement standards and the properties of their units. Additivity was not a concern

for him, because he did not accept the definition of measurement as a comparison to

a standard. Scientific psychology did as Stevens did, and ignored the Ferguson

Committee. In the main, it ignored the whole subject of whether measurement in

psychology qualified the discipline as a science on a par with all other successful

sciences. One notable exception, a voice crying in the wilderness, is the work of

psychologist, Joel Michell of the University of Sydney, Australia. He captures the idea

of comparison to a standard very nicely: Measurement is the numerical estimation and

expression of the magnitude of one quantity relative to another.

B. The Greatest Scientific Side-Step in the History of Psychology

The consequences to psychology as a science were significant, if not profound, but of

little concern to most psychologists. As of today, scientific psychology has NO coherent

definition of measurement. What scientific psychology has is a ridiculous definition,

frequently cited from a 1946 paper by Stevens. It reads: “...[M]easurement, in the broadest

sense, is defined as the assignment of numerals to objects or events according to rules.”

Stevens adapted his definition from N. R. Campbell. According to Campbell,

measurement is the assignment of numerals to an attribute according to scientific laws.

When one reads further into Campbell's definition, it prescribes standards for comparison,

and research into units of measurement. That is what he meant by the phrase,

"...according to scientific laws."

Stevens went on to develop his theory of measurement scales, and it is well known

to all students of psychological research methods. He asserted, correctly, that

different types of measurement scales (nominal, ordinal, interval, and ratio scales)

are derived from different measurement operations that we use to produce them.

This is of utmost importance to psychological science because, depending upon

the type of measurement scale a researcher is using, different decisions must be made

about how to analyze the data.

Stevens thought he was developing a theory of measurement. He thought he could

produce a definition of measurement that was based upon the operations required to

produce the measurements. That is why he substituted the phrase, "...according to

rules," in his own definition. He was greatly influenced by the concept of

operationalism in the work of fellow Harvard faculty member, Percy Bridgman, a Nobel

Laureate in Physics. A close examination of his paper, however, shows that his theory

of measurement is really a self-contained, mathematical description of the properties of

different numerical scales, not a theory nor definition of measurement. He constrained

himself to the confines of the internal mathematics involved, and never ventured to

examine the relationship of a fundamental or derived measure to a standard. He was

stuck on the fact that the differing operations [different 'rules'] that were applied, would

impute different properties to the assigned numerals. He was correct, as far as it went,

but scientific psychology still does not know what measurement is.

In the end, S. S. Stevens' discarding of the Ferguson Committee's final report, his

substitution of a description of measurement scales for a theory of measurement,

abandoning comparison to a standard in favor of an operationalism-only description

of measurement, and the dismissal of the counsel of the best minds in the science,

philosophy and mathematics of measurement, all of this adds up to the greatest

scientific side-step in the history of psychology.

C. It Doesn't Have To Be This Way

It should be noted that the understanding of measurement as a comparison to a

standard was discussed by the renowned psychologist, L. L. Thurstone, in 1929.

He was trying to conceptualize the subjective responses from a person on a perceptual

scale as resulting from a process of comparison. The specific technique he was using

is referred to as pairwise comparisons. This technique would imbue such data with

properties that made them amenable to mathematical representation and argument.

“A very important consideration in rendering objective a value-increment that is

essentially subjective is that as long as each of these psychological values is

considered separately, it can never register objectively. Every scientific

observation is, in fact, a comparative situation and neither of the terms of a

percept can be objectively recorded in isolation. This is true of cognitive as well

as of affective comparison.”

His views were encapsulated into his “Law of Comparative Judgment.” Unfortunately,

the idea of measurement as comparison to a standard did not find its way into

psychology nor into the "Standards for Educational and Psychological Testing," 1999.
