Inferential Statistics
deals with estimating population parameters from sample statistics, i.e.
inferring something about the population from a sample. From the sample
mean (x-bar) we estimate the population mean (µ), and from the sample standard
deviation (S) we estimate the population standard deviation (σ).
With a sample we can do two types of estimation: point estimation and interval
estimation. In point estimation we use a single value, the sample mean, as our
estimate of the population mean. In interval estimation we use the sample
data to calculate an interval of plausible values for the unknown
population parameter.
So which estimation is more reliable?
Interval estimation is more informative than point estimation, because a point
estimate says nothing about how far the sample mean might be from the population
mean. The population mean is a constant, whereas the sample mean is a random
variable that changes from one sample to the next, so it will almost never equal
the population mean exactly. In interval estimation we frame an interval around
the sample mean and state, with a chosen level of confidence, that my
population mean lies within that interval.
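The point that the sample mean moves from sample to sample can be seen in a small simulation. This is only a sketch: the population of incomes below is invented (normally distributed around 9 Lac with a spread of 4 Lac), and the seed is fixed just so the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 incomes in Lac p.a. (made-up numbers)
population = [rng.gauss(9.0, 4.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Draw five independent samples of 20 and record each sample mean
sample_means = [statistics.fmean(rng.sample(population, 20)) for _ in range(5)]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f}")
```

The population mean is printed once and never changes, while each of the five sample means comes out different.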
For example, I want to know the average global income of FRM charter holders.
I take a sample of 20 FRM students and find their average income to be, say,
9 Lac p.a. In interval estimation we frame an interval around this mean, say
7 to 11 Lac, i.e. (9 - 2) to (9 + 2). This interval is constructed with the formula:
Confidence Interval = Sample Mean ± Margin of Error
Where Margin of Error = Z × S.E.
and S.E. = σ/√n
Z = z-value for the chosen confidence level
σ = Population Standard Deviation
S.E. = Standard Error of the mean
n = Sample Size
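As a minimal sketch, the interval calculation (sample mean plus or minus the z-value times the standard error, where the standard error is the standard deviation divided by the square root of the sample size) can be written as one small function. The function name `confidence_interval` and the standard deviation of 4.56 Lac are assumptions made for illustration; the article does not give a standard deviation for the income example.

```python
import math

def confidence_interval(sample_mean, sd, n, z):
    """Return (lower, upper) for sample_mean ± z * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)   # spread of the sample mean
    margin = z * standard_error          # margin of error
    return sample_mean - margin, sample_mean + margin

# Illustrative only: sd = 4.56 Lac is an assumed value, z = 1.96 is the 95% z-value
lower, upper = confidence_interval(sample_mean=9.0, sd=4.56, n=20, z=1.96)
print(f"{lower:.2f} to {upper:.2f} Lac")
```

With these assumed inputs the margin of error is close to 2 Lac, so the interval lands near the 7 to 11 Lac range used in the example.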
Imagine a population from which we could draw many samples, say of 20 observations each. Each sample has its own sample mean. In practice I will be working with a single sample mean, say X1, and I will construct my interval around it. For that I need a z-value, which assumes that the sample means follow a normal distribution, and I need the standard deviation of those sample means. But it is not possible to draw every possible sample and measure that standard deviation directly. Instead, to measure the spread of the distribution of sample means, we calculate the Standard Error of the Mean from the one sample we have.
In statistics, a sample mean typically deviates from the actual mean of the
population; the standard error measures the typical size of this deviation. In
a simpler way, it tells us how widely the sample means are spread around the
population mean. A low standard error is good, because it means my sample means
cluster tightly around the population mean, which they target on average,
i.e. E(X-bar) = µ.
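One way to check the idea of a standard error is to simulate many samples and compare the spread of their means with the shortcut σ/√n. This is a sketch under assumed values: a normal population with mean 175, standard deviation 20, and samples of size 40, all chosen only for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)            # fixed seed so the run is repeatable
MU, SIGMA, N = 175.0, 20.0, 40    # assumed population mean, sd, and sample size

# Mean of each of 5,000 simulated samples of size N
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(5_000)
]

empirical_se = statistics.stdev(sample_means)   # spread of the sample means
theoretical_se = SIGMA / math.sqrt(N)           # sigma / sqrt(n)

print(f"average of sample means: {statistics.fmean(sample_means):.2f}")
print(f"empirical standard error: {empirical_se:.2f}")
print(f"sigma / sqrt(n): {theoretical_se:.2f}")
```

The two standard error figures should agree closely, and the average of the sample means should sit near the population mean, which is what E(X-bar) = µ says.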
Thus my confidence interval is constructed with the above formula, and
intuitively we say that my population mean will fall within that interval, but
in statistics we need to say it with a level of confidence. The probability that
we associate with an interval estimate is called the confidence level. It
indicates how confident we are that the interval estimate will include the
population parameter. Since a sample is a small subset of the larger population
(or sampling frame), the inferences are necessarily error prone. That is, we
cannot say with 100% confidence that the characteristics of the sample
accurately reflect the characteristics of the larger population. Hence only
qualified inferences can be made, with a degree of certainty that is expressed
as a probability (e.g. 90% confidence that the interval contains the population
parameter). In estimation, the most commonly used confidence levels are 90
percent, 95 percent and 99 percent.
Interpretation at 95%
Confidence level: It means that if the same procedure of interval construction
is used on many different random samples about 95 percent of the resulting
confidence intervals will include the population mean and 5 percent won’t.
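This interpretation can be checked by simulation: build the interval on many random samples and count how often it contains the true mean. The sketch below assumes a normal population with a known mean of 175 and standard deviation of 20, and samples of size 40; the function name `coverage` is made up for this example.

```python
import math
import random
import statistics

def coverage(trials, n, mu, sigma, z, seed=1):
    """Fraction of simulated intervals (mean ± z*sigma/sqrt(n)) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials

rate = coverage(trials=10_000, n=40, mu=175.0, sigma=20.0, z=1.96)
print(f"intervals containing the population mean: {rate:.1%}")
```

The printed share should come out close to 95 percent, with the remaining intervals missing the population mean.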
Value of Z at different confidence levels:
Confidence Level | Z-Value
90% | 1.645
95% | 1.960
99% | 2.576
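These z-values do not have to be memorised; they come from the standard normal distribution. A small sketch using Python's standard library is shown here. The helper name `z_value` is an assumption for this example, and the interval is taken to be two-sided, so half of the leftover probability goes in each tail.

```python
from statistics import NormalDist

def z_value(confidence_level):
    """Two-sided critical value: leaves (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} -> {z_value(level):.3f}")
```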
Suppose we measure the heights of 40 randomly chosen men and get a mean height of 175 cm with a standard deviation of 20 cm. When the population standard deviation (σ) is not available, as here, we use the sample standard deviation in its place. The standard error is then 20/√40 ≈ 3.16 cm, which gives the following intervals at different levels of confidence:
Confidence Level | Confidence Interval (in cm)
90% | 169.80-180.20
95% | 168.80-181.20
99% | 166.85-183.15
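The arithmetic for the height example (mean 175 cm, standard deviation 20 cm, 40 men) can be sketched as follows, using mean ± z × standard deviation / √n with the usual z-values of 1.645, 1.960 and 2.576. The function name `height_intervals` is made up for this example, and the sample standard deviation stands in for the unknown population value.

```python
import math

# z-values for the three common confidence levels
Z_VALUES = {"90%": 1.645, "95%": 1.960, "99%": 2.576}

def height_intervals(mean=175.0, sd=20.0, n=40):
    """Return {level: (lower, upper)} for mean ± z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return {level: (mean - z * se, mean + z * se) for level, z in Z_VALUES.items()}

for level, (lower, upper) in height_intervals().items():
    print(f"{level}: {lower:.2f}-{upper:.2f} cm")
```

Each interval is symmetric around 175 cm, and the interval widens as the confidence level rises.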
Thus we get different confidence intervals at different levels of confidence:
the higher the confidence level, the wider the interval.
Confidence intervals are a concept that everyone learns in their first stats
course, but I suspect few truly appreciate their importance. Confidence
intervals are about risk. They take into account the sample size and the
variation in the data and give us an estimate of the range in which the real
answer lies. Confidence intervals are a bright yellow caution sign telling you
to take that sample result with a grain of salt, because you can’t be more
specific than this range.
Author - Kunal Patel