Correlation and Causality

Most social research is interested, at some point, in cause-effect relationships; this is what the term causal refers to. Probably the vast majority of applied social research consists of descriptive and correlational studies. For most social sciences, however, it is important to go beyond simply describing the world or looking at relationships.

In actual practice, great numbers of individuals have been followed in this manner, but the random sampling and random group assignment that are fundamentals of good research design obviously had to be omitted. By working with large numbers of subjects and amassing a very large database, researchers think that possible bias in the subject selection process ceases to be a viable explanation of the results. They are probably correct. As much as researchers want the best data possible upon which to base their conclusions, a theory that a causal relationship is at the root of a demonstrated functional relationship is not always directly testable.

Correlation

The measurement of the degree of relationship between variables is called correlation, and that between attributes is called association. Correlation indicates only the degree and direction of the relationship between two variables.

It is generally accepted that an r of .10 is weak, while an r of .70 is strong. (However, with a large sample size, even a weak correlation can be statistically significant.) Most correlation coefficients in the social sciences fall between .10 and .70. As Simon and Burstein suggest, in a field like economics, where time-series analyses of aggregate measures (such as economic indicators) are examined, the correlation is high because the variables move together (Baker, 1994: 394).
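The parenthetical point about sample size can be illustrated with a minimal sketch. It assumes the standard t test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)), and uses 1.96 as an approximate two-sided 5% critical value (the large-sample value; small samples need a somewhat larger one). The function names and sample sizes are illustrative assumptions, not taken from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Assumption: approximate two-sided 5% critical value for large samples.
CRITICAL_VALUE = 1.96

# The same weak correlation (r = .10) at two hypothetical sample sizes.
for n in (30, 1000):
    t = t_statistic(0.10, n)
    print(n, round(t, 2), abs(t) > CRITICAL_VALUE)
```

With 30 cases the weak correlation falls well short of the threshold; with 1,000 cases it exceeds it, even though the strength of the relationship is unchanged.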

The first statistics that you should look at are the correlations between variables that you think may be causally related, either directly or indirectly. By convention, a correlation coefficient close to 1 or -1 means a strong relationship, in contrast to a correlation close to zero, which implies a weak relationship.

Theory tells you which correlations you should look at first. But after you have followed whatever guidance your theory offers, you should then look for unexpected information in the data. To do so, seek variables that are strongly correlated with the variable that you wish to explain; the former are candidates to be independent variables in your subsequent analysis. A strong correlation between a pair of independent variables is a sign that they may be proxies for each other, or that they are sequentially related in some part of the causal chain.
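The screening procedure just described might be sketched as follows. The data, the variable names (x1, x2, x3), and the 0.7 cutoff for flagging possible proxies are all hypothetical choices made for illustration; the text itself warns that no single cutoff suits every field.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def rank_candidates(outcome, candidates):
    """Sort candidate variables by |r| with the outcome, strongest first."""
    scored = [(name, pearson_r(values, outcome))
              for name, values in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

def possible_proxies(candidates, threshold=0.7):
    """Flag pairs of candidates whose mutual |r| reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(candidates.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical data: one variable to explain and three candidates.
outcome = [1, 2, 3, 4, 5]
candidates = {
    "x1": [2, 4, 6, 8, 10],
    "x2": [5, 3, 4, 1, 2],
    "x3": [1, -1, 0, -1, 1],
}

for name, r in rank_candidates(outcome, candidates):
    print(name, round(r, 2))
for name_a, name_b, r in possible_proxies(candidates):
    print("possible proxies:", name_a, name_b, round(r, 2))
```

Ranking by absolute value keeps strong negative relationships near the top of the list, since direction and strength are separate questions.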

How high a correlation is "strong"? This depends on the context. There is a joke that the tragedy in educational psychology is that no variables are related at all, whereas the tragedy in economics is that all variables are perfectly related. That is, correlations in educational psychology are often close to zero, because the data are often cross-sectional and include much uncontrolled variability among individuals, and because educational approaches differ little in their effects. It is thus difficult to show that anything has much impact on the dependent variable.

Correlations in economics, however, are often close to 1, because the data are often aggregate time series whose variables move together. Here, everything may appear to affect the dependent variable, and the coefficients often provide little help in understanding what is happening. A correlation of 0.6 might seem low to an economist and amazingly high to an educational psychologist. The point is that no general rule about meaningful sizes of correlation coefficients can be useful; for guidance, consult someone who is experienced with similar data.
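A small sketch can show why aggregate time series tend to produce correlations near 1. Two made-up series share nothing but an upward trend, and each carries its own unrelated wiggle. Comparing the correlation of the levels with the correlation of the period-to-period changes (first differences) is an illustrative device added here, not a method proposed in the text, and all numbers are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def first_differences(series):
    """Change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

periods = range(20)
# Hypothetical series: a common upward trend plus unrelated wiggles.
wiggle_x = [1 if t % 2 == 0 else -1 for t in periods]   # +1, -1, +1, -1, ...
wiggle_y = [1 if t % 4 < 2 else -1 for t in periods]    # +1, +1, -1, -1, ...
series_x = [t + wiggle_x[t] for t in periods]
series_y = [2 * t + wiggle_y[t] for t in periods]

r_levels = pearson_r(series_x, series_y)
r_changes = pearson_r(first_differences(series_x), first_differences(series_y))

print("levels: ", round(r_levels, 2))
print("changes:", round(r_changes, 2))
```

The levels are almost perfectly correlated because both series rise over time, while their changes are nearly unrelated. This is one reason a high coefficient between two trending series says little about what drives either of them.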

Causality

A definition of "causality" seems unnecessary in everyday life. No one is in much doubt about whether to say that a football causes a window to break or that yeast causes the cake to rise. And even when we say we do not know whether two events are causally related (for example, whether walking under a ladder causes bad luck or night air causes disease), our doubt seems to be about our knowledge of the world and not about the meaning of the word "cause."

It is useful to speculate on why the term "cause" and the causal concept are not puzzling or vague in everyday life. Much of the explanation is probably that we use causal terms all day long in our common speech: not only the term "cause" itself, but also synonyms like "influence," "produce," and "create" and related words like "smash," "build," and "fix." This common conversational practice teaches us quite accurately what these terms mean to other people.

Our sure-footed use of the term in everyday speech, as contrasted with our confusion in scientific speech, may also stem from the one-to-one quality of most everyday relationships called "causal." We are not so concerned with whether footballs cause windows to break but rather with whether that football caused that window to break. In our everyday speech, we are not so concerned as we are in social science with general statements about groups of events, for example, whether a rise in price causes fewer people to buy or heat causes riots.

For the most part, defining causality is not a troubling problem when the scientist can run an experiment. As we shall see, the actual experiment is itself close to a complete operational definition of causality - though it is not a complete definition. It is when the scientist cannot experiment but must deal instead with the data as the world presents them to him or her that an operational definition of "causality" is most acutely needed and most difficult to create.

To create a satisfactory operational definition of "causality" for non-experimental situations, we must, first, trace some philosophic history; second, go further into the nature of definitions to see why we seek an operational definition; and, third, and most importantly, explore the use of the causality concept in the social sciences, in order to find out what a valid operational definition of causality should be.

To explicate the term "causality" and the concept for which it stands in social-scientific usage, we must start with the most basic notion in science: the observed association, correlation, or relationship, terms that are used here as synonyms. To say that there is an association is simply to say that, when A has occurred in the past, B has also occurred more often than would have been expected by chance. Or, to put it another way, an association is shown if B has occurred more often when A was present than when A was not present.
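The second formulation of association lends itself to a direct check: compare how often B occurred when A was present with how often it occurred when A was absent. The sketch below assumes each observation is recorded as a pair of true/false values; the counts are invented for illustration, and the comparison makes no allowance for chance variation in small samples.

```python
def rate_of_b(observations, a_present):
    """Share of cases with B among those where A matches a_present."""
    relevant = [b for a, b in observations if a == a_present]
    if not relevant:
        raise ValueError("no observations with A == %r" % a_present)
    return sum(relevant) / len(relevant)

def shows_association(observations):
    """True if B occurred more often with A present than with A absent."""
    return rate_of_b(observations, True) > rate_of_b(observations, False)

# Hypothetical records of (A occurred, B occurred).
observations = (
    [(True, True)] * 6 + [(True, False)] * 2 +
    [(False, True)] * 3 + [(False, False)] * 9
)

print(rate_of_b(observations, True))    # share of B when A was present
print(rate_of_b(observations, False))   # share of B when A was absent
print(shows_association(observations))
```

In these invented data B accompanies A in 6 of 8 cases but appears in only 3 of the 12 cases without A, so an association is shown in the sense defined above. Nothing in the calculation says whether A causes B.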

This definition of an association does not necessarily exclude historical statements, even though they refer to single occurrences of A and B. There is nothing logically wrong with saying that war with Japan in 1941 occurred more often than it would have if Japan had not bombed Pearl Harbor.