Bootstrapping & Sampling with Replacement

In reality, surveying an entire population is usually not feasible. When surveying graduates to determine their starting salaries, for example, privacy laws would prohibit colleges and universities from giving researchers the graduates’ contact information. Even when a population could be surveyed, the cost of doing so would often be prohibitive. Instead, we base our best guesses about a population’s characteristics on a sample selected randomly from that population.

Because we do not expect a sample to match the population exactly, we estimate population characteristics with ranges of values (interval estimates) rather than single values (point estimates). One method for constructing interval estimates is the bootstrap method. Its name comes from the saying “pull yourself up by your bootstraps” (Cleophas, Zwinderman, Cleophas, & Cleophas, 2009), which refers to using one’s own efforts to get out of a difficult or seemingly impossible situation. In statistics, the bootstrap method lets us make estimates about the population through brute-force computation: no formulas necessary!
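As a rough illustration of what “brute force” means here, the Python sketch below repeatedly resamples a data set, records the mean of each resample, and keeps the middle 95% of those means as an interval estimate. This assumes the percentile method, which is one common way to turn bootstrap results into an interval; the text above does not say which method it uses. The function name `bootstrap_percentile_interval` and the salary values are hypothetical, made up purely for illustration.

```python
import random
import statistics


def bootstrap_percentile_interval(sample, num_resamples=10_000, confidence=0.95, seed=None):
    """Percentile bootstrap interval for the population mean (a sketch).

    Each bootstrap sample has the same size as `sample` and is drawn with
    replacement; the interval is the middle `confidence` share of the
    resulting bootstrap means.
    """
    rng = random.Random(seed)
    n = len(sample)
    # Brute force: compute the mean of many resamples instead of using a formula.
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    # Number of bootstrap means to trim from each tail (250 of 10,000 for 95%).
    tail = round(num_resamples * (1 - confidence) / 2)
    return boot_means[tail], boot_means[num_resamples - 1 - tail]


# Hypothetical starting salaries in dollars (made-up values for illustration only).
salaries = [82_000, 95_000, 97_000, 101_000, 88_000, 110_000, 93_000, 99_000, 104_000, 90_000]
low, high = bootstrap_percentile_interval(salaries, seed=2014)
print(f"Approximate 95% interval for the mean: ${low:,.0f} to ${high:,.0f}")
```

Fixing `seed` makes the result reproducible; with a different seed the endpoints shift slightly, which is expected for a simulation-based method.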

We are interested in estimating the actual mean starting salary of 2014 petroleum engineering graduates. We could select additional samples of engineers and calculate their mean starting salaries to form an interval estimate for the population mean. However, because sampling from the population can be expensive, we instead take our best estimate of the population, the sample itself, and use it as if it were the population. We then select samples, called bootstrap samples, from the data in our sample, a process called resampling. Because our sample contains only a finite number of values, we use sampling with replacement: after each salary is selected, it is recorded and returned to the collection before the next salary is selected. (If we instead drew as many salaries as the sample contains without replacement, every bootstrap sample would contain exactly the original salaries, just in a different order.)
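The difference between the two kinds of sampling can be sketched in a few lines of Python. The helper names `draw_bootstrap_sample` and `draw_without_replacement` and the salary values are hypothetical; the explicit loop is written to mirror the “record it and return it” description rather than to be the fastest approach.

```python
import random


def draw_bootstrap_sample(sample, rng):
    """Draw len(sample) values WITH replacement.

    Each pick chooses any position in the original sample and nothing is
    removed, so the chosen salary is effectively recorded and returned before
    the next pick; the same salary can be chosen more than once.
    """
    resample = []
    for _ in range(len(sample)):
        resample.append(sample[rng.randrange(len(sample))])
    return resample


def draw_without_replacement(sample, rng):
    """Draw len(sample) values WITHOUT replacement (for comparison)."""
    return rng.sample(sample, k=len(sample))


# Hypothetical starting salaries (illustrative values, not real survey data).
salaries = [82_000, 95_000, 97_000, 101_000, 88_000]
rng = random.Random(7)

# Without replacement, the "resample" is just the original salaries reordered.
print(sorted(draw_without_replacement(salaries, rng)) == sorted(salaries))  # True
# With replacement, repeats and omissions are possible, so resamples can vary.
print(draw_bootstrap_sample(salaries, rng))
```

In practice, `rng.choices(salaries, k=len(salaries))` from Python’s standard `random` module performs the same with-replacement draw in a single call.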