This Pearson correlation calculator helps you determine Pearson's **r** for any two-variable dataset. Below, we explain what the Pearson correlation is, give you the mathematical formula, and show how to calculate the Pearson correlation by hand. You can also discover the link between Pearson's **r** and linear regression, and finally understand what the common saying "correlation does not equal causation" means.

Interested in other correlation coefficients? Visit Omni's Spearman's rank correlation calculator!

## What is the Pearson correlation coefficient?

The Pearson correlation measures **the strength and direction of the linear relation** between two random variables, or bivariate data. Linearity means that one variable changes by the same amount whenever the other variable changes by 1 unit, no matter whether the change is, e.g., from $1$ to $2$, or from $11$ to $12$.

A simple real-life example is the relationship between parents' heights and their offspring's heights - the taller people are, the taller their children tend to be.

The Pearson correlation coefficient is most often denoted by **r** (and so this coefficient is also referred to as **Pearson's r**).

## Interpretation of the Pearson correlation

The **sign** of the Pearson correlation gives the direction of the relationship:

- If **r is positive**, then as one variable increases, the other tends to increase as well; and
- If **r is negative**, then one variable tends to decrease as the other increases.

The **absolute value** gives the strength of the relationship:

- Pearson's **r** ranges from $-1$ to $+1$;
- **The closer it is to $\pm 1$, the stronger the relationship** between the variables;
- If **r** equals $-1$ or $+1$, then the linear fit is perfect: all data points lie on one line; and
- If **r** equals $0$, it means that no linear relationship is present in the data.

**Remember that Pearson correlation detects only a linear relationship!** For coefficients that can detect other types of relationship, see our correlation calculator.

This means that a low (or even zero) correlation doesn't mean that there is no relationship at all! For example, points lying symmetrically on a parabola or a circle have a Pearson correlation coefficient equal to zero, even though the two variables are clearly related.
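As a quick illustration (a minimal Python sketch, not the calculator's actual code; the helper name `pearson_r` is ours), points on a symmetric parabola are perfectly related yet have zero Pearson correlation:

```python
import statistics

def pearson_r(x, y):
    """Sample Pearson correlation coefficient from its definition."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) ** 0.5
           * sum((yi - my) ** 2 for yi in y) ** 0.5)
    return num / den

# A parabola centred on the x-mean: y is fully determined by x, yet r = 0,
# because the positive and negative deviations cancel out exactly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]           # y = x²
print(round(pearson_r(x, y), 10))   # → 0.0
```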

## How to use this Pearson correlation calculator

Just input your data into the rows. When at least three points (both an x and y coordinate) are in place, our Pearson correlation calculator will give you your result, along with an interpretation.

The verbal description of the strength of correlation returned by this calculator employs Evans' scale (1996) for the absolute value of **r**:

| Absolute value of $r$ | Strength of correlation |
| --- | --- |
| $0.8 \le \lvert r \rvert \le 1.0$ | *very strong* |
| $0.6 \le \lvert r \rvert < 0.8$ | *strong* |
| $0.4 \le \lvert r \rvert < 0.6$ | *moderate* |
| $0.2 \le \lvert r \rvert < 0.4$ | *weak* |
| $0.0 \le \lvert r \rvert < 0.2$ | *very weak* |
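As a sketch, these thresholds can be mapped to verbal labels with a short Python function (the function name `evans_strength` is our own, chosen for illustration):

```python
def evans_strength(r):
    """Verbal strength label (Evans, 1996) for a correlation coefficient."""
    a = abs(r)  # the scale uses the absolute value of r
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak"

print(evans_strength(0.95))   # → very strong
print(evans_strength(-0.35))  # → weak
```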

You may encounter many other guidelines for the interpretation of the Pearson correlation coefficient. Bear in mind that all such descriptions and interpretations are arbitrary and depend on context.

## Pearson correlation formula and properties

It is high time we gave the mathematical formula for the Pearson correlation. Formally, Pearson's r is defined as the **covariance of two variables divided by the product of their respective standard deviations**. This translates into the following formula:

$r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar x) (y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2} \sqrt{\sum_{i=1}^n (y_i - \bar y)^2}}$

which can be further rewritten as:

$r_{xy} = \frac{\sum x_i y_i - n \bar x \bar y}{\sqrt{\sum x_i ^2 - n \bar x^2} \sqrt{\sum y_i ^2 - n \bar y^2}}$

It can be proven (via the Cauchy–Schwarz inequality) that the absolute value of the correlation coefficient never exceeds $1$.
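The rewritten formula translates almost line-for-line into code. Here is a minimal Python sketch (the function name `pearson_r` is ours, chosen for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson's r via the rewritten (computational) formula."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # numerator: Σxᵢyᵢ − n·x̄·ȳ
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    # denominator: √(Σxᵢ² − n·x̄²) · √(Σyᵢ² − n·ȳ²)
    den = (math.sqrt(sum(xi ** 2 for xi in x) - n * mean_x ** 2)
           * math.sqrt(sum(yi ** 2 for yi in y) - n * mean_y ** 2))
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear → 1.0
```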

Note that the **correlation is symmetric**, i.e., the correlation between $X$ and $Y$ is the same as between $Y$ and $X$.

**Correlation vs. independence.** If the variables are independent, their correlation is $0$, but, in general, the converse is not true! There is, however, a special case: when $X$ and $Y$ are **jointly normal** (i.e., the random vector $(X, Y)$ follows a bivariate normal distribution) and uncorrelated, then independence follows.

Since we have mentioned covariance, you can visit the covariance calculator for more insights regarding this statistical quantity.

## How to calculate Pearson correlation by hand

To help you better understand how the Pearson correlation formula works, let's compute Pearson's **r** by hand. Suppose we have the data set:

$(1, 1), (3, 2), (3, 3), (5, 4)$,

so the x-values are $1, 3, 3, 5$, and the respective y-values are $1, 2, 3, 4$.

- Count how many points there are: $n = 4$.
- Calculate the mean (arithmetic average) of the $x$ and $y$ values with our average calculator or manually:

$\bar x = (1 + 3 + 3 + 5)/4 = 12 / 4 = 3$

$\bar y = (1 + 2 + 3 + 4)/4 = 10 / 4 = 2.5$

- Calculate the sums of the squares of $x$ and $y$, and the sum of their products:

$\sum x_i^2 = 1^2 + 3^2 + 3^2 + 5^2 = 44$

$\sum y_i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 30$

$\sum x_i y_i = 1 \times 1 + 3 \times 2 + 3 \times 3 + 5 \times 4 = 36$

- We have all the values needed to apply the formula:

$r_{xy} = \frac{\sum x_i y_i - n \bar x \bar y}{\sqrt{\sum x_i ^2 - n \bar x^2} \sqrt{\sum y_i ^2 - n \bar y^2}}$

$\mathrm{numerator} = \sum x_i y_i - n \bar x \bar y = 36 - 4 \times 3 \times 2.5 = 6$

$\mathrm{denominator} = \sqrt{8} \times \sqrt{5} = \sqrt{40} \approx 6.32$

because

$\sum x_i ^2 - n \bar x^2 = 44 - 4 \times 3^2 = 8$

and

$\sum y_i ^2 - n \bar y^2 = 30 - 4 \times 2.5^2 = 5$

- Finally, we can compute the value of the Pearson correlation coefficient:

$r = \frac{6}{6.32} \approx 0.95$
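The whole hand calculation can be double-checked with a few lines of Python (a sketch mirroring the steps above, not the calculator's actual code):

```python
import math

x = [1, 3, 3, 5]
y = [1, 2, 3, 4]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n            # 3 and 2.5

# numerator: Σxᵢyᵢ − n·x̄·ȳ = 36 − 30 = 6
numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y

# denominator: √(44 − 36) · √(30 − 25) = √8 · √5 = √40
denominator = (math.sqrt(sum(xi ** 2 for xi in x) - n * mean_x ** 2)
               * math.sqrt(sum(yi ** 2 for yi in y) - n * mean_y ** 2))

r = numerator / denominator
print(round(r, 2))  # → 0.95
```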

## Pearson's r and R-squared in simple linear regression

In simple linear regression ($Y \sim aX + b$), the Pearson correlation is directly linked to the **coefficient of determination** (R-squared), which expresses the fraction of the variance in $Y$ that is explained by $X$:

- The **R-squared** can be calculated by simply **squaring the Pearson correlation coefficient**.
- The **slope** $a$ **of the fitted regression line** equals the Pearson correlation between $Y$ and $X$ multiplied by the ratio of their respective standard deviations: $a = r (s_y / s_x)$.

If you want to perform linear regression on your data, check the least squares regression line calculator to find the best fit of the $a$ and $b$ parameters.

## "Correlation does not equal causation"

Always remember that even a very strong **correlation between two variables does not mean there's a causal link between the variables**. It could be random chance, or there may be some other intervening variable that affects both your variables.

For example, the demand for sunglasses is strongly positively correlated with the rate of people drowning. This does not mean that sunglasses force anybody underwater! Instead, we rather suspect that hot weather causes both of these variables to increase.

Click here to read about other mind-blowing examples of crazy correlations.