DEGREES OF FREEDOM IN STATISTICS

In statistics, degrees of freedom (DF) is a difficult concept to explain. However, it is an important idea that appears in many different contexts throughout statistics, including hypothesis tests, probability distributions, and regression analysis. Learn more about degrees of freedom in this blog post!

I’ll start by defining degrees of freedom. However, I’ll quickly move on to practical examples in a variety of contexts because they make this concept easier to understand.

Definition of Degrees of Freedom

Degrees of freedom are the number of independent values that a statistical analysis can estimate. You can also think of it as the number of values that are free to vary as you estimate parameters. I know, it’s starting to sound a bit murky!

Degrees of freedom encompass the notion that the amount of independent information you have limits the number of parameters that you can estimate. Typically, the degrees of freedom equal your sample size minus the number of parameters you need to estimate during an analysis. The result is usually a positive whole number.

Degrees of freedom are a combination of how much data you have and how many parameters you need to estimate. They indicate how much independent information goes into a parameter estimate. In this vein, it’s easy to see that you want a lot of information to go into parameter estimates to obtain more precise estimates and more powerful hypothesis tests. So, you want many degrees of freedom!
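
To make that rule of thumb concrete, here is a minimal Python sketch. The degrees_of_freedom helper is hypothetical (my name, not a standard function), but the arithmetic follows the rule above: estimating a mean uses one parameter, and a simple linear regression (one of the contexts mentioned earlier) estimates two.

```python
# Minimal sketch: DF as sample size minus the number of estimated
# parameters. The helper name "degrees_of_freedom" is hypothetical.

def degrees_of_freedom(sample_size, params_estimated):
    """Typical DF: independent observations minus estimated parameters."""
    return sample_size - params_estimated

# Estimating a single mean from 10 observations leaves 9 DF.
print(degrees_of_freedom(10, 1))  # 9

# Simple linear regression estimates two parameters (intercept and
# slope), so the same 10 observations would leave 8 DF for the errors.
print(degrees_of_freedom(10, 2))  # 8
```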

Independent Information and Restrictions on Values

The definitions talk about independent information. You might think this refers to the sample size, but it’s a little more complicated than that. To understand why, we need to talk about the freedom to vary. The best way to illustrate this concept is with an example.

Suppose we collect a random sample of observations, shown below. Now, imagine that we know the mean, but we don’t know the value of one observation: the X in the table below.

Value: 6, 8, 5, 9, 6, 8, 4, 11, 7, X

The mean is 6.9, and it is based on 10 values. So, we know from the equation for the mean that the values must sum to 69 (6.9 × 10 = 69).

The nine known values sum to 64, so simple algebra (64 + X = 69) tells us that X must equal 5.
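
If you want to check that algebra yourself, here is a minimal Python sketch of the same calculation; the variable names are mine, not part of the example.

```python
# Minimal sketch of the algebra above: with the mean fixed at 6.9,
# the nine known values leave the tenth with no freedom to vary.

known_values = [6, 8, 5, 9, 6, 8, 4, 11, 7]  # the nine known observations
n = 10                                       # total sample size
mean = 6.9                                   # the known sample mean

total = mean * n               # all ten values must sum to 69
x = total - sum(known_values)  # 69 - 64 = 5
print(x)                       # 5.0
```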

Estimating Parameters Imposes Constraints on the Data

As you can see, that last number has no freedom to vary. It is not an independent piece of information because it cannot be any other value. Estimating the parameter, the mean in this case, imposes a constraint on the freedom to vary. The last value and the mean are entirely dependent on each other. Consequently, after estimating the mean, we have only 9 independent pieces of information even though our sample size is 10.
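
To see that freedom to vary directly, here is a hypothetical Python sketch: the first nine values are drawn arbitrarily, yet once the mean is fixed, the tenth value is forced, and the sample mean always comes out the same.

```python
# Minimal sketch of "freedom to vary": the first nine values can be
# anything at all, but fixing the mean removes the tenth value's freedom.

import random

n = 10
mean = 6.9

free_values = [random.uniform(0, 12) for _ in range(n - 1)]  # 9 free choices
last_value = mean * n - sum(free_values)                     # forced value

sample = free_values + [last_value]
print(sum(sample) / n)  # always 6.9 (up to floating-point rounding)
```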

That’s the basic idea for degrees of freedom in statistics. In a general sense, DF are the number of observations in a sample that are free to vary while estimating statistical parameters. You can also think of it as the amount of independent data that you can use to estimate a parameter.
