Hypothesis Testing Using Simulation in R and Python
Hypothesis testing is a fundamental statistical concept applied across various disciplines. It provides a structured and scientific approach to making educated guesses and drawing conclusions from data. Researchers and academics in quantitative fields are well-acquainted with the principles of hypothesis testing.
Traditionally, hypothesis tests are conducted using methods such as Z-tests or t-tests for continuous data, or by calculating binomial probabilities for discrete data. In this blog post, I want to introduce you to an alternative approach: hypothesis testing through simulation for continuous data.
Simulating hypothesis tests can offer several advantages. It not only provides a visual representation of results, which can enhance understanding for many, but it also has the flexibility to be applied to data with various distributions. This flexibility contrasts with traditional Z-tests, which assume a normal distribution — a topic I’ll delve into in a future blog post.
In this article, I’ll illustrate a simple means of comparison for hypothesis testing using simulation. However, to validate our findings, I’ll also perform a Z-test later in the post.
For this article, I took an example from an A-Levels Statistics 2 textbook and tweaked the values a little. The example is as follows:
“The blood pressure of a group of hospital patients with a certain type of heart disease has mean 85.6. A random sample of 25 of these patients volunteered to be treated with a new drug and a week later their mean blood pressure was found to be 60. Assuming a normal distribution with standard deviation 15.5 for blood pressures, and using a 5% significance level, test whether the mean blood pressure for all patients treated with the new drug is less than 85.6.”
As you can see, this is clearly a one-tailed test. The objective is to determine whether the drug treatment lowers blood pressure across all treated patients. To be precise, the sample mean stands at 60. The question at hand is whether this sample mean provides sufficient evidence to conclude that the treatment has had a blood pressure-lowering effect.
Comparing sample means to the population mean is one of the most commonly employed methods in hypothesis testing. To carry out this testing, you must have both a sample and its corresponding mean.
We may state the hypotheses like this:
H0: The mean blood pressure of patients with this heart disease is 85.6 after the drug (μ = 85.6).
H1: The mean blood pressure of patients with this heart disease is less than 85.6 after the drug (μ < 85.6).
Now, let’s move ahead with the simulation.
The very first thing we notice is that the distribution is assumed to be normal, and the population mean and standard deviation are provided. With this information, we can build a proxy for the population. So the first thing I do is generate a population distribution from the given information:
First, I plug in all the information given. Then I generate a population of size 100,000 with random blood pressures that follow the normal distribution.
#R Code
N <- 100000 #size of the simulated population
u <- 85.6 #population mean
sd <- 15.5 #population standard deviation
n <- 25 #sample size (named n rather than "sample", which would shadow base R's sample())
#se <- sd / sqrt(n) #standard error; we won't use this in today's example
X <- 60 #sample mean
sig <- 0.05 #significance level
pop <- rnorm(N, u, sd) #simulated population of blood pressures
#pop <- rnorm(N, u, se) #again, ignoring se so as not to go deeper into this
#Python Code
import numpy as np
N = 100000 #size of the simulated population
u = 85.6 #population mean
sd = 15.5 #population standard deviation
n = 25 #sample size
#se = sd / np.sqrt(n) #standard error; we won't use this in today's example
X = 60 #sample mean
sig = 0.05 #significance level
pop = np.random.normal(u, sd, N) #simulated population of blood pressures
#pop = np.random.normal(u, se, N) #again, ignoring se so as not to go deeper into this
(Note: in reality, whenever we work with a sample, we should use the standard error in place of the standard deviation. The standard error is the standard deviation divided by the square root of the sample size; here that would be 15.5 / √25 = 3.1. I have ignored it for the sake of simplicity, but you should not do so in practice.)
What I created is the pop variable. Consider it the blood pressure records of 100,000 people with heart disease, as given in the problem. None of these simulated patients has received the drug. (In a more advanced setup, we would use a Randomized Controlled Trial, but for now, this will suffice.)
Let me show you what the distribution looks like.
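If you want to reproduce the plot, here is a minimal Python sketch that reuses the variables defined above (the bin count, colors, and labels are my own choices, not from the original post):
#Python Code
import matplotlib.pyplot as plt
plt.hist(pop, bins=100, color="steelblue", edgecolor="white")
plt.axvline(X, color="red", linestyle="--", label="sample mean = 60")
plt.axvline(u, color="black", linestyle=":", label="population mean = 85.6")
plt.xlabel("Blood pressure")
plt.ylabel("Count")
plt.legend()
plt.show()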
The distribution appears to be relatively normal, with a noticeable center around 86. Moving forward, we have data from a sample of 25 patients who received the drug, and their mean blood pressure is reported as 60. Now, if the drug indeed had a significant effect, the sample mean should fall in a region of the graph where it’s unlikely to occur solely by random chance.
Just by looking at the graph, does it seem probable that the drugs have significantly lowered blood pressure? How likely is it for us to observe a value as low as 60, or even lower, in this context? At first glance, it doesn’t appear to be a frequent occurrence, does it?
So I run the code and check what proportion of the simulated population falls at or below 60.
#R Code
mean(pop <= 60) #this computes the proportion of values at or below 60
#0.04995
#Python Code
np.mean(pop <=60)
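#approximately 0.05; the exact value varies slightly from run to run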
The simulation returned a striking value: 0.04995. Why is it striking? It's just shy of 0.05, the rejection level.
In the 100,000 simulated values, only 4.995% of the data falls at or below 60. Since this is below the 5% significance level, we can reject the null hypothesis and conclude that the drug was effective.
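One caveat: since the proportion comes from random draws, the exact figure will change slightly every time you run the code. To make the result reproducible, set a seed before generating the population. Here is a small Python sketch (the seed value 42 is an arbitrary choice of mine; set.seed() plays the same role in R):
#Python Code
np.random.seed(42) #any fixed seed makes the draws repeatable
pop = np.random.normal(u, sd, N)
print(np.mean(pop <= X)) #the same proportion on every run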
This is how you can get the p-value from the simulation. No Z-scores or Z-tables, no mathematics, just a simple chunk of code. Moreover, it gives you the same answer you would have gotten with the Z-score method. I will verify this next with the Z-test.
Compared with the Z-test
Let me run the Z-test. The Z-score is calculated by subtracting the population mean from the sample mean and dividing the result by the standard deviation. (Again, skipping the standard error.)
Let me calculate the Z-score right away. Then, using the built-in distribution functions, I also calculate the p-value associated with the Z-score.
#R Code
Z <- (X-u) / sd
Z
#prints -1.65161290322581
pnorm(Z)
#prints 0.049306743964906
#Python Code
from scipy.stats import norm
Z = (X-u)/sd
print(Z)
#Z = -1.65161290322581
print(norm.cdf(Z))
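#prints approximately 0.0493, matching the R result above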
I obtained a p-value of 0.0493. How close is this to the simulated 0.04995? Remarkably close, isn't it? Both methods yield essentially the same answer. While Z-scores and p-values might appear more straightforward to those experienced in statistics, they can be somewhat cryptic for those less familiar with the subject. In contrast, the simulation approach stays intuitive throughout the process, aided by its visual representation, making it much more accessible to comprehend.
So, there you have it. I’ve demonstrated how the hypothesis testing you learned in high school can be effortlessly accomplished using statistical tools. Perhaps in my next article, I’ll present a case where simulation becomes a crucial method.
Using the Standard Error
I deliberately omitted the standard error earlier to avoid delving too deeply into the mathematical details. Introducing it would change the distribution's shape significantly: instead of simulating individual patients, we would simulate sample means, which cluster far more tightly around the population mean (standard error 3.1 rather than standard deviation 15.5). Nevertheless, the key takeaway holds: whether I run a simulation or a Z-test, the two approaches produce probability values that agree remarkably well. This consistency remains when the standard error replaces the standard deviation; the final p-value will change in both tests, but the two tests will again be very close to each other.
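For the curious, here is a minimal Python sketch of the standard-error version (my own addition, reusing the values from above). With the standard error the evidence becomes overwhelming: the Z-test p-value is on the order of 10^-17, and with only 100,000 draws the simulated proportion will almost certainly come out as exactly 0; both point to the same conclusion.
#Python Code
import numpy as np
from scipy.stats import norm
u, sd, n, X, N = 85.6, 15.5, 25, 60, 100000
se = sd / np.sqrt(n) #standard error = 15.5 / 5 = 3.1
#Simulation: this time we simulate sample means, not individual patients
sample_means = np.random.normal(u, se, N)
p_sim = np.mean(sample_means <= X)
#Z-test using the standard error
Z = (X - u) / se #about -8.26
p_z = norm.cdf(Z)
print(p_sim, p_z) #both are effectively zero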