What Are Inferential Statistics and the Confidence Level?


Inferential statistics deals with estimating population parameters from sample statistics, i.e. inferring something about the population based on a sample. From the sample mean (x-bar) we estimate the population mean (µ), and from the sample standard deviation (S) we estimate the population standard deviation (δ, more commonly written σ).


With a sample we can do two types of estimation: point estimation and interval estimation. In point estimation we report a single value, saying that our best estimate of the population mean is the sample mean. In interval estimation we use the sample data to calculate an interval of plausible values for the unknown population parameter.


So which estimate is more reliable?


Interval estimation is more informative than point estimation. A point estimate treats the sample mean as if it were the population mean, which is almost never exactly true: the population mean is a constant, whereas the sample mean is a random variable that changes from sample to sample. In interval estimation we frame an interval around the sample mean and state, with a chosen level of confidence, that the population mean falls within that interval.


For example, suppose I want to know the average global income of FRM charter holders. I take a sample of 20 FRM students and find their average income to be, say, 9 Lac p.a. In interval estimation we frame an interval around this mean, say 7 to 11 Lac, i.e. (9-2) to (9+2). This interval is constructed with the formula:





Confidence Interval = Sample Mean ± Margin of Error


Where Margin of Error = Z * S.E. = Z * δ/√n
           Z = z-value for the chosen confidence level
           δ = Population Standard Deviation
           n = Sample Size
           S.E. = Standard Error of the mean = δ/√n




Suppose we have a population from which we could draw many samples, say of 20 observations each. Each sample has its own sample mean. In practice we work with a single sample mean, say X1, and construct the interval around it. For that we need a z-value, assuming that the sample means are normally distributed, and we need the standard deviation of those sample means. We cannot draw every possible sample to measure that standard deviation directly, so to measure the spread of the distribution of sample means we calculate the Standard Error of the Mean, δ/√n, which needs only the standard deviation and the sample size n.


In statistics, a sample mean usually deviates from the actual mean of the population; the standard error is the standard deviation of the sample means, i.e. the typical size of that deviation. Put simply, it measures how widely the sample means are spread around the population mean. The sample mean is an unbiased estimator, E(X-bar) = µ, so a low standard error is good: it means the sample means lie close to the population mean.
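The idea that the standard error is the spread of the sample means can be checked with a small simulation. This is only a sketch: the function name `simulate_standard_error` is hypothetical, and the normal population with mean 175 and standard deviation 20 is an assumption made for illustration, not data from a real study.

```python
import random
import statistics


def simulate_standard_error(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw many samples of size n and compare the observed spread of
    their means with the theoretical standard error pop_sd / sqrt(n)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_means = []
    for _ in range(num_samples):
        # Assumption: the population is normal with the given mean and SD.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    observed = statistics.stdev(sample_means)   # SD of the sample means
    theoretical = pop_sd / n ** 0.5             # standard error formula
    return observed, theoretical


observed_se, theoretical_se = simulate_standard_error(
    pop_mean=175.0, pop_sd=20.0, n=40, num_samples=5000
)
print(f"SD of sample means: {observed_se:.3f}")
print(f"SD / sqrt(n):       {theoretical_se:.3f}")
```

The two printed numbers should be close to each other, which is why a single sample is enough to estimate how much sample means vary.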




The confidence interval is constructed with the above formula, and intuitively we say that the population mean will fall within that interval, but in statistics we need to say it with a level of confidence. The probability that we associate with an interval estimate is called the confidence level. It indicates how confident we are that the interval estimate includes the population parameter. Since a sample is a small subset of the larger population (or sampling frame), the inferences are necessarily error prone: we cannot say with 100% confidence that the characteristics of the sample accurately reflect those of the larger population. Hence only qualified inferences can be made, within a degree of certainty that is expressed as a probability (e.g. 90% confidence that the interval captures the population value). In estimation, the most commonly used confidence levels are 90 percent, 95 percent and 99 percent.


Interpretation at the 95% confidence level: if the same procedure of interval construction is used on many different random samples, about 95 percent of the resulting confidence intervals will include the population mean and about 5 percent won’t.
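This "repeat the procedure many times" reading can be illustrated with a simulation. The sketch below assumes a normal population with a known standard deviation (mean 175, SD 20, chosen only for illustration) and uses a hypothetical helper, `coverage_rate`, to count how often the interval contains the true mean.

```python
import random
import statistics


def coverage_rate(pop_mean, pop_sd, n, z, num_trials, seed=1):
    """Build a z-interval from each of num_trials random samples and
    return the share of intervals that contain the population mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z * pop_sd / n ** 0.5  # margin of error = z * standard error
    hits = 0
    for _ in range(num_trials):
        # Assumption: normal population with known standard deviation.
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        sample_mean = statistics.fmean(sample)
        if sample_mean - margin <= pop_mean <= sample_mean + margin:
            hits += 1
    return hits / num_trials


rate = coverage_rate(pop_mean=175.0, pop_sd=20.0, n=40, z=1.960, num_trials=10000)
print(f"Share of intervals containing the population mean: {rate:.3f}")
```

The printed share should land near 0.95. Any single interval either contains the population mean or it does not; the 95 percent describes the procedure, not one particular interval.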


Value of Z at different confidence levels:


Confidence Level    Z-Value
90%                 1.645
95%                 1.960
99%                 2.576
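These z-values do not have to be memorised; they come from the standard normal distribution. A minimal sketch using Python's standard library follows (the helper name `z_value` is hypothetical):

```python
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided critical value of the standard normal distribution:
    half of (1 - confidence_level) is left in each tail."""
    tail = (1.0 - confidence_level) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)


z_table = {level: z_value(level) for level in (0.90, 0.95, 0.99)}
for level, z in z_table.items():
    print(f"{level:.0%}  z = {z:.3f}")
```

Rounded to three decimals, this gives 1.645, 1.960 and 2.576.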





Example: we measure the heights of 40 randomly chosen men and get a mean height of 175 cm with a standard deviation of 20 cm. The population standard deviation (δ) is not available here, so we use the sample standard deviation in its place. The standard error is 20/√40 ≈ 3.16 cm, and the intervals at different confidence levels are:


Confidence Level    Confidence Interval (in cm)
90%                 169.80-180.20
95%                 168.80-181.20
99%                 166.85-183.15
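The arithmetic for the heights example can be sketched in a few lines. The function name `z_confidence_interval` is hypothetical, and the code assumes the z-based formula (mean ± z × SD/√n) with the sample standard deviation standing in for the population value.

```python
import math


def z_confidence_interval(sample_mean, sd, n, z):
    """Return (lower, upper) for mean ± z * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Heights example: n = 40, mean = 175 cm, standard deviation = 20 cm.
z_values = {"90%": 1.645, "95%": 1.960, "99%": 2.576}
intervals = {
    level: z_confidence_interval(175.0, 20.0, 40, z)
    for level, z in z_values.items()
}
for level, (low, high) in intervals.items():
    print(f"{level}: {low:.2f} to {high:.2f} cm")
```

With these z-values the margins of error are about 5.20, 6.20 and 8.15 cm. Because the standard deviation here is estimated from the sample, a t-based critical value with 39 degrees of freedom could be used instead of z, which would give slightly wider intervals.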


Thus the higher the level of confidence, the wider the confidence interval.


Confidence intervals are a concept that everyone learns in their first stats course, but I suspect few truly appreciate their importance. Confidence intervals are about risk. They take into account the sample size and the variation in the population and give us an estimate of the range in which the real answer lies. A confidence interval is a bright yellow caution sign telling you to take the sample result with a grain of salt, because you can’t be more specific than this range.


Author - Kunal Patel






