Difference Between Two Sample Proportions

Consider an infinite (or very large) population, where each observation has a probability pX of being a success, and a probability (1-pX) of being a failure. Let the set of independent and identically distributed random variables X1, X2, ..., Xm represent the observations from a sample of size m, where

Xi = 1 if the ith observation is a success,
Xi = 0 if the ith observation is a failure.

Consider another similar population, with each observation having a probability pY of being a success, and a probability (1-pY) of being a failure. Let the set of independent and identically distributed random variables Y1, Y2, ..., Yn represent the observations from a sample of size n, where

Yi = 1 if the ith observation is a success,
Yi = 0 if the ith observation is a failure.

Let X = (X1 + X2 + ... + Xm) and Y = (Y1 + Y2 + ... + Yn). Then X is binomially distributed with parameters pX and m; similarly, Y is binomially distributed with parameters pY and n. Using the normal approximation to the binomial distribution,

X → N(mpX, mpX(1-pX)) as m → ∞
Y → N(npY, npY(1-pY)) as n → ∞

Therefore,

X/m → N(pX, pX(1-pX)/m) as m → ∞
Y/n → N(pY, pY(1-pY)/n) as n → ∞
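The normal approximation to the binomial used above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it compares the exact binomial distribution function of X with the distribution function of N(mpX, mpX(1-pX)). The value p = 0.3, the sample sizes, and the helper names are illustrative assumptions, and the half-unit continuity correction is a standard refinement added here, not something the text above relies on.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, m, p):
    """Exact P(X = k) for X ~ Binomial(m, p)."""
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def max_abs_error(m, p):
    """Largest gap, over k = 0..m, between the exact binomial CDF and the
    CDF of N(m*p, m*p*(1-p)) evaluated at k + 0.5 (continuity correction,
    an assumption added for this sketch)."""
    approx = NormalDist(mu=m * p, sigma=math.sqrt(m * p * (1 - p)))
    cumulative = 0.0
    worst = 0.0
    for k in range(m + 1):
        cumulative += binomial_pmf(k, m, p)
        worst = max(worst, abs(cumulative - approx.cdf(k + 0.5)))
    return worst


if __name__ == "__main__":
    p = 0.3  # illustrative success probability (assumed value)
    for m in (20, 100, 500):
        print(f"m = {m:3d}: max |exact - normal| = {max_abs_error(m, p):.4f}")
```

The printed gap should shrink as m grows, which is what "as m → ∞" expresses. Dividing X by m only rescales the distribution, so the same agreement carries over to the sample proportion X/m.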

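The sample proportions (X/m) and (Y/n) are each approximately normally distributed when m and n are large. Assuming the two samples are drawn independently of each other, the difference (X/m) - (Y/n) is also approximately normally distributed, with mean equal to the difference of the two means and variance equal to the sum of the two variances. In fact,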

(X/m) - (Y/n) → N(pX - pY, pX(1-pX)/m + pY(1-pY)/n) as m, n → ∞

and thus

[(X/m - Y/n) - (pX - pY)] / sqrt(pX(1-pX)/m + pY(1-pY)/n) → N(0, 1) as m, n → ∞
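The result for the difference of two sample proportions can be illustrated by simulation. The sketch below is a hedged example under stated assumptions: the two samples are independent, the parameter values (pX = 0.4, m = 200, pY = 0.25, n = 300), the seed, and all function names are hypothetical choices made for illustration. It standardizes (X/m) - (Y/n) using the mean pX - pY and variance pX(1-pX)/m + pY(1-pY)/n, then checks that the simulated values behave roughly like a standard normal variable.

```python
import math
import random
import statistics


def diff_mean_and_variance(p_x, m, p_y, n):
    """Mean and variance of X/m - Y/n for independent binomial X and Y."""
    mean = p_x - p_y
    variance = p_x * (1 - p_x) / m + p_y * (1 - p_y) / n
    return mean, variance


def standardized_difference(x, m, y, n, p_x, p_y):
    """[(x/m - y/n) - (p_x - p_y)] / sqrt(variance of the difference)."""
    mean, variance = diff_mean_and_variance(p_x, m, p_y, n)
    return ((x / m - y / n) - mean) / math.sqrt(variance)


def simulate_z(p_x, m, p_y, n, reps, seed=0):
    """Draw `reps` pairs of independent samples and standardize each difference."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        x = sum(rng.random() < p_x for _ in range(m))  # successes in sample 1
        y = sum(rng.random() < p_y for _ in range(n))  # successes in sample 2
        zs.append(standardized_difference(x, m, y, n, p_x, p_y))
    return zs


if __name__ == "__main__":
    zs = simulate_z(p_x=0.4, m=200, p_y=0.25, n=300, reps=2000)
    inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)
    print(f"mean of Z     = {statistics.fmean(zs):.3f}  (theory: 0)")
    print(f"variance of Z = {statistics.pvariance(zs):.3f}  (theory: 1)")
    print(f"share with |Z| <= 1.96 = {inside:.3f}  (theory: about 0.95)")
```

In practice pX and pY are unknown, so this check is only possible in a simulation where they are chosen in advance.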