Business, Education

How to find a correlation coefficient

How to find a correlation coefficient 1

Image via: Pexels

Outlines how to find a correlation coefficient, with methods by hand, in Excel and Open Office, and how to interpret the correlation coefficient. This is used widely in research, and is a strong factor in accurate business research.

Find more education guides, tips and advice

Correlation is a measure of the statistical relationship between two data sets. The strength of the relationship can be determined using the correlation coefficient.

Correlation analysis is about the relationship between the variables. The correlation coefficient is used to determine the strength of the relationship. This coefficient is usually referred to as r and can have values ​​between -1 and +1. 

  • up to 0.2 => very low correlation
  • up to 0.5 => low correlation
  • up to 0.7 => medium correlation
  • up to 0.9 => high correlation
  • over 0.9 => very high correlation

With a correlation coefficient of 1 there is a perfect positive linear relationship between the variables and with 0 there is no linear relationship at all (but possibly a nonlinear one). With a correlation coefficient of -1, there is a perfect negative linear relationship.

The product-moment correlation coefficient

There are different types of correlation coefficients. One of the most commonly used is the product-moment correlation coefficient (or Pearson correlation). It is used with metric scaling when a linear relationship between the variables is to be examined. A weakness of the product-moment correlation coefficient is that it is not robust against outliers. For ordinally scaled variables, one would use a different method, such as Spearman’s rank correlation coefficient.

Interpretation of the correlation coefficient

It is important to differentiate between correlation and causality: Just because two variables are correlated does not mean that one variable has a direct influence on the other. The statistical relationship could also be caused by a third variable that was not taken into account in the survey. Correlation is therefore an indication of causality — but not proof. 

Example 1

Water utility operators might associate water consumption with advertising major television events. You would find that water is used more during commercial breaks, as viewers frequently use the toilet at these times. In this example there is indeed a causal relationship. However, the correlation is not sufficient to actually prove this connection. Correlation does not imply a causal relationship.

Example 2

In the international PISA education study, the statistical examination of the connection between reading performance and cultural activities showed a significant correlation of r = 0.23 within the countries. 

From the results of the multi-level analyzes, however, it can be assumed that both variables, cultural activities and reading skills, are very much confused with socio-economic status (including the educational level of the parents). 

This example has a low correlation between reading performance and cultural activities that are significant, but of little practical significance. Studies with a large number of cases often lead to highly significant results due to their high statistical numbers.

Pearson Product-Moment Correlation

When we talk about correlation, we mostly talk about the Pearson product-moment correlation (also called Bravais-Pearson correlation, Pearson correlation, or simply correlation coefficient ). It is mostly abbreviated by the Greek letter ρ (rho), even if the letter r is mostly used, especially in scientific publications . It is also the basis of many other correlation coefficients.

Pearson’s correlation coefficient is closely related to covariance. The covariance corresponds to the correlation if the variables were z -standardized beforehand . Mathematically, this is also achieved by dividing the covariance by the product of the standard deviations of both random variables.

How to find a correlation coefficient 1
  • X and Y are two measurement series or random variables
  • cov ( X , Y ) is the covariance of X and Y
  • σ is the standard deviation (since and 1/n and 1/n-1 cancel out from the fraction, it makes no difference whether you use the sample or population standard deviation)
  • E ( x ) is the expected value of the variable x

The correlation coefficient can have values ​​between -1 and 1, whereby a correlation coefficient of 0 means that there is no connection between the two variables. A correlation coefficient of +1 describes a perfect positive relationship between the two variables, while a correlation of -1 describes a perfect negative (inverse) relationship ( anti- correlation ). 

The correlation coefficient always describes a linear relationship. If the relationship between the two variables is not linear, the relationship as described by ρ may not correspond to the actual relationship. If both variables are stochastically independent, then the correlation coefficient assumes a value of 0 – the opposite, however, is not true. Let us assume, for example, that the random variable X has a standard normal distribution  and that Y = X ², that is, depends entirely on Thus there would be a complete dependency between the two variables, their correlation would be zero according to Pearson. However, in the special case that X andY are bivariate normally distributed, that independence equals a zero correlation.

Correlation is also closely related to regression. If both variables are standardized, the correlation coefficient corresponds to the slope of the regression line with a y- axis intercept of 0.

How to find correlation coefficient methods

There are several ways to to find the correlation coefficient. The most common are by hand using a formula, within Excel or Open Office, and using other similar software packages.

How to find a correlation coefficient by hand

The formula for the correlation (r) is

How to find a correlation coefficient

where n is the number of pairs of data;

are the sample means of all the x-values and all the y-values, respectively; and sx and sy are the sample standard deviations of all the x- and y-values, respectively.

How to find the correlation coefficient in Excel

Correlations can be calculated with Excel or OpenOffice:

  • DETERMINATION (Y_values; X_values) Returns the square of the Pearsonian correlation coefficient.
  • CORREL (Y_values; X_values) Returns the correlation coefficient of two series with OpenOffice

Calculate regression line with Excel and Open Office

SLOPE (Y_values; X_values) and AXIS SECTION (Y_values; X_values) with Excel this is also more convenient by creating an xy diagram and displaying the regression line and requesting an equation.