The basic idea is that instead of literally repeating the data acquisition process over and over again, we can simulate those repeated measurements using Python. In this post, we will see how to compute a bootstrap replicate from a bootstrap sample using Python.
Statistical Inference
Statistical inference is the process by which we go from measured data to probabilistic conclusions about what we might expect if we collected the same data again. An array of data resampled from our original data is called a bootstrap sample.
Bootstrapping is the use of resampled data to perform statistical inference.
For example, if we have an original dataset with l repeated measurements, a bootstrap sample is an array of the same length l drawn from the original dataset with replacement. A bootstrap replicate is a single value of a statistic computed from a bootstrap sample.
Let’s look at how we can generate a bootstrap sample and compute a bootstrap replicate from it using Python. The resampling engine we will use to generate the bootstrap sample is numpy.random.choice():
First, let’s see how it works on a NumPy array; later we can apply it to a dataset.
import numpy as np

array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# First argument: the array from which to draw the bootstrap sample.
# size: how many values to draw from that array (drawn with replacement by default).
sample = np.random.choice(array, size=len(array))
print(sample)

Output - [8 7 1 7 8 9 9 6 4]
The output above is the bootstrap sample drawn from the array. To compute a bootstrap replicate, pass the bootstrap sample to a statistical function such as np.mean().
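bootstrap_replicate = np.mean(sample)
print(bootstrap_replicate)

Output - 3.888888888888889

Note that the sample is random, so the result changes on every run. The value printed here comes from a different draw than the sample printed above, whose mean is 59/9 ≈ 6.56.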
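Because the draw is random, the numbers above cannot be reproduced exactly. Here is a minimal sketch of a reproducible version, assuming NumPy 1.17 or later, where np.random.default_rng is available. The seed value 42 and the variable names are arbitrary choices for illustration, not part of the original example.

```python
import numpy as np

# Seeded generator: the same seed reproduces the same draws (42 is arbitrary)
rng = np.random.default_rng(42)

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# One bootstrap sample: len(data) values drawn with replacement
sample = rng.choice(data, size=len(data), replace=True)

# One bootstrap replicate: a statistic (here the mean) of that sample
replicate = np.mean(sample)

print(sample)
print(replicate)
```

Re-running this script prints the same sample and replicate each time, which makes it easier to check a result against the text.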
To learn more about computing other statistics on the data, check the post.
If we need to compute this value again and again, it is better to write a function that simplifies the process. Below is a simple function for one-dimensional data that draws a bootstrap sample and computes a bootstrap replicate from it. You can write your own function as well:
def bootstrap_replicate(data, func_to_cal):
    """Generate a bootstrap sample and its bootstrap replicate."""
    bootstrap_sample = np.random.choice(data, len(data))
    return bootstrap_sample, func_to_cal(bootstrap_sample)

bootstrap_replicate(array, np.mean)

Output - (array([7, 6, 4, 7, 1, 8, 6, 7, 9]), 6.111111111111111)
Now suppose you want to compute 100, 200, or 500 bootstrap replicates. You can do so with a for loop, drawing a fresh bootstrap sample on each iteration. Here the function is rewritten to return only the bootstrap replicate.
def bootstrap_replicate(data, func_to_cal):
    """Generate a bootstrap replicate."""
    bootstrap_sample = np.random.choice(data, len(data))
    return func_to_cal(bootstrap_sample)

bootstrap_replicates = []
for i in range(200):
    bs_rep = bootstrap_replicate(array, np.mean)
    bootstrap_replicates.append(bs_rep)

print(bootstrap_replicates)
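The loop above can also be written without an explicit for loop. The following is a sketch of one possible vectorized variant, not the approach used in this post. The function name draw_bootstrap_replicates and its parameters are hypothetical, and it assumes the statistic accepts an axis argument, as np.mean and np.median do.

```python
import numpy as np

def draw_bootstrap_replicates(data, func, n_replicates=200, seed=None):
    """Return n_replicates bootstrap replicates of func over 1-D data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each row is one bootstrap sample with the same length as data
    samples = rng.choice(data, size=(n_replicates, len(data)), replace=True)
    # Apply the statistic to every row; func must accept an axis argument
    return func(samples, axis=1)

replicates = draw_bootstrap_replicates(np.arange(1, 10), np.mean, n_replicates=200, seed=0)
print(replicates.shape)  # (200,)
```

Drawing all samples in one call trades memory for speed: it holds an n_replicates by len(data) array at once, which is fine for small arrays like this one.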
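Plotting a histogram of bootstrap replicates
import matplotlib.pyplot as plt

# density=True normalizes the histogram so that its total area is 1
plot = plt.hist(bootstrap_replicates, bins=40, color='blue', density=True)
plt.show()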
You may also want a confidence interval for the statistic. A p% confidence interval can be read as follows: if we repeated the measurements over and over again, p% of the observed values of the statistic would lie within the p% confidence interval. To calculate a bootstrap confidence interval we use np.percentile() on the replicates; the 2.5th and 97.5th percentiles give a 95% interval:
confidence_int = np.percentile(bootstrap_replicates, [2.5, 97.5])
print(confidence_int)

Output - [3.11111111 6.55833333]
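The percentile calculation generalizes to any confidence level. Below is a minimal sketch under that assumption; the helper name bootstrap_ci, its level parameter, and the seed are hypothetical and chosen only for illustration. It also computes the standard deviation of the replicates, which is a common bootstrap estimate of the standard error.

```python
import numpy as np

def bootstrap_ci(replicates, level=95.0):
    """Percentile bootstrap confidence interval at the given level (in percent)."""
    tail = (100.0 - level) / 2.0
    lower, upper = np.percentile(replicates, [tail, 100.0 - tail])
    return lower, upper

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
data = np.arange(1, 10)

# 1000 bootstrap replicates of the mean, each from a fresh bootstrap sample
replicates = np.array(
    [np.mean(rng.choice(data, size=len(data))) for _ in range(1000)]
)

low, high = bootstrap_ci(replicates, level=95.0)
# Spread of the replicates: a bootstrap estimate of the standard error of the mean
std_error = np.std(replicates, ddof=1)
print(low, high, std_error)
```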
The main advantage of the bootstrap is its simplicity. It also provides a straightforward way to derive estimates of standard errors and confidence intervals for complex estimators of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients. The disadvantage is that, although bootstrapping is asymptotically consistent under some conditions, it does not provide general finite-sample guarantees.
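As an illustration of a more complex estimator, here is a sketch of a bootstrap confidence interval for a correlation coefficient. It resamples (x, y) pairs by index so that each x stays matched with its y, a technique often called the pairs bootstrap, which this post does not otherwise cover. The data are synthetic and generated only for this example, and the function name correlation_replicate is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, for reproducibility only

# Synthetic paired data, generated only for illustration
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)

def correlation_replicate(x, y, rng):
    """One bootstrap replicate of the Pearson correlation, resampling (x, y) pairs."""
    # Resample indices with replacement so that each x stays paired with its y
    idx = rng.integers(0, len(x), size=len(x))
    return np.corrcoef(x[idx], y[idx])[0, 1]

replicates = np.array([correlation_replicate(x, y, rng) for _ in range(1000)])

# 95% percentile interval for the correlation coefficient
ci = np.percentile(replicates, [2.5, 97.5])
print(np.corrcoef(x, y)[0, 1], ci)
```

Resampling x and y independently would break the pairing and destroy the correlation being estimated, which is why the indices are drawn once and applied to both arrays.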
There are always different ways to compute the things we have seen. In this post, I took a simple and precise approach to explain how you can compute a bootstrap replicate from a bootstrap sample using Python.
For more detailed information, check out the NumPy documentation.