Most of the time we work with samples rather than populations, since collecting population data is usually too time consuming and expensive. A sample is a subset of the population, and if the sample is drawn at random it is reasonable to assume that it is representative of the population. Sometimes we need to compare the means of the populations from which two samples were drawn. For example, let’s say we want to compare the average height of all men and all women in a country. We draw a random sample of men and a random sample of women from that country, calculate the sample average for each group, and compare them. However, this alone does not answer the question about the population, since the next random sample you draw will almost certainly give different numbers. So the question is: what conclusions can we draw about the average height of men and women in the entire population? To answer this type of question, we need hypothesis testing. Hypothesis testing belongs to a class of statistical techniques called inferential statistics, because we infer something about the population from the sample data.
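The point that every new random sample gives different numbers can be seen in a small simulation. The sketch below is illustrative only: the population of heights (mean 170 cm, standard deviation 8 cm), the sample size of 50 and the helper name sample_mean_heights are all assumptions made for this example, not real data.

```python
import random
import statistics

def sample_mean_heights(population, sample_size, rng):
    """Draw a simple random sample without replacement and return its mean."""
    return statistics.mean(rng.sample(population, sample_size))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 heights in cm (assumed values, not real data).
population = [rng.gauss(170.0, 8.0) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Five independent random samples of 50 people each.
sample_means = [sample_mean_heights(population, 50, rng) for _ in range(5)]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean: {m:.2f}")
```

Each sample mean lands near the population mean but none of them matches it exactly, and no two samples agree. This sample-to-sample variation is what hypothesis testing accounts for.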
When to use hypothesis testing?
Whenever we need to make decisions based on data and we are working with sample data rather than population data, we need to use hypothesis testing. Hypothesis testing takes the variation in the data set into account before extrapolating and drawing conclusions about the population. Of course, we can always make a mistake when we extrapolate from a sample to a population, but hypothesis testing lets us control the risk of drawing the wrong conclusion. In one of our subsequent modules, we will cover how to control the errors during hypothesis testing. If you skip hypothesis testing and draw conclusions directly from the sample results, you may draw the wrong conclusions.
The following methodology needs to be used anytime you want to perform hypothesis testing:
What is the practical question you are trying to answer?
Convert the practical question to a statistical question (formulate your hypothesis statements)
Determine the hypothesis test that needs to be used
Determine the sample size required to control your alpha and beta errors
Collect a random sample of the required data
Perform the test and get the confidence intervals and P-values
Draw statistical conclusions
Convert the statistical conclusions to a practical conclusion to obtain an answer to the question
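The steps above can be sketched end to end for the height example. This is a minimal sketch under stated assumptions: the height values are made up, the significance level of 0.05 is a conventional choice, the sample-size step is skipped, and a permutation test (chosen here because it needs only the standard library) stands in for whichever test the methodology would actually select. The function name permutation_p_value is hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the share of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_permutations + 1)

# Steps 1-2: "Do men and women differ in average height?" becomes
#            Ho: mu_men = mu_women, Ha: mu_men != mu_women.
# Step 3:    test chosen for this sketch (an assumption): permutation test.
# Step 4:    sample-size planning for alpha and beta is omitted here.
ALPHA = 0.05

# Step 5: hypothetical sample data in cm (made up for illustration).
men = [178, 172, 181, 169, 175, 183, 177, 171, 180, 174]
women = [163, 167, 158, 170, 165, 161, 168, 159, 166, 162]

# Step 6: perform the test and get the p-value.
p_value = permutation_p_value(men, women)

# Steps 7-8: statistical conclusion, then the practical conclusion.
if p_value < ALPHA:
    conclusion = "Reject Ho: the data indicate that the average heights differ."
else:
    conclusion = "Fail to reject Ho: the data do not show a difference in average height."

print(f"p-value = {p_value:.4f}")
print(conclusion)
```

The structure matters more than the particular test: the hypothesis statements and the significance level are fixed before the data are examined, and the practical conclusion is read off the statistical one.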
In this article, we will discuss how to formulate the hypothesis statements. In general, there are two hypothesis statements. One of them is called the null hypothesis (Ho) and the other is called the alternative hypothesis (Ha). When we initially formulate these statements, we usually don’t know what the answer will be. The data we collect later in the process will help us decide whether to retain Ho or conclude Ha.
There are three rules to consider when you are writing the hypothesis statements:
RULE 1: The first rule is that the hypothesis statements are always about population parameters and not about the sample. We know the exact statistics of the sample data we collect, so there is no need to hypothesize about them. What we do not know for a fact are the corresponding values for the population. Hence, we always make hypotheses about the population. For example, suppose we are interested in the average height of a person in a country (say Brazil) and our hypothesis is that the average is equal to 6 feet. We can represent the null hypothesis as follows:
Ho: μ = 6 feet
Note that we use the Greek letter μ (mu) to represent the population average. If we are working with a sample of, say, 20 data points, we use the symbol x̄ (x-bar) to represent the sample average. The wrong way to write the hypothesis statement is the following:
Ho: x̄ = 6 feet
RULE 2: The second rule is that the equality sign belongs to the null hypothesis. We create two hypotheses each time, the null (Ho) and the alternative (Ha), and only one of them can be true at any given time: if Ho is true then Ha is false, and vice versa. When you create the hypothesis statements, make sure the equality sign is always allocated to the null hypothesis. For example, if we want to test whether the average height is 6 feet, one possibility for the hypothesis statements is: average height is equal to 6 feet, and average height is not equal to 6 feet. This can be written mathematically as follows:
Ho: μ = 6 feet
Ha: μ ≠ 6 feet
The wrong way to write these hypothesis statements is:
Ho: μ ≠ 6 feet
Ha: μ = 6 feet
Of course, there are other ways of writing these statements depending on what you want to prove or disprove. For example, if you want to show that the average height of a person in Brazil is less than six feet, then the corresponding hypothesis statements are as follows:
Ho: μ ≥ 6 feet
Ha: μ < 6 feet
In all these statements, you will find that the equality sign (on its own or as part of ≥ or ≤) has been allocated to Ho.
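Rules 1 and 2 are mechanical enough to be checked in code. The sketch below is one hypothetical way to do so for a single-parameter hypothesis. The class name HypothesisPair, its fields and the two lookup tables are assumptions introduced for this example. The tables allow a plain "=" in Ho to be paired with a one-sided Ha, since some texts write one-sided tests that way.

```python
from dataclasses import dataclass

# Rule 1: hypotheses are about population parameters (assumed symbol set).
POPULATION_PARAMETERS = {"μ", "σ", "p"}

# Rule 2: Ho carries the equality sign; each Ho operator maps to the
# Ha operators that may accompany it.
ALLOWED_ALTERNATIVES = {
    "=": {"≠", "<", ">"},
    "≤": {">"},
    "≥": {"<"},
}

@dataclass(frozen=True)
class HypothesisPair:
    parameter: str   # population parameter symbol, e.g. "μ"
    null_op: str     # operator used in Ho
    alt_op: str      # operator used in Ha
    value: float     # hypothesized value
    unit: str = ""

    def __post_init__(self):
        if self.parameter not in POPULATION_PARAMETERS:
            raise ValueError("Rule 1: hypothesize about a population parameter")
        if self.null_op not in ALLOWED_ALTERNATIVES:
            raise ValueError("Rule 2: Ho must contain the equality sign")
        if self.alt_op not in ALLOWED_ALTERNATIVES[self.null_op]:
            raise ValueError("Rule 2: Ha must contradict Ho")

    def __str__(self):
        tail = f"{self.value:g} {self.unit}".strip()
        return (f"Ho: {self.parameter} {self.null_op} {tail}\n"
                f"Ha: {self.parameter} {self.alt_op} {tail}")

two_sided = HypothesisPair("μ", "=", "≠", 6, "feet")
one_sided = HypothesisPair("μ", "≥", "<", 6, "feet")
print(two_sided)
print(one_sided)
```

Swapping the operators, as in HypothesisPair("μ", "≠", "=", 6, "feet"), or using the sample symbol x̄ as the parameter raises a ValueError, mirroring the two wrong ways of writing the statements.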
RULE 3: The third rule is that we continue to believe in the null hypothesis (Ho) if we don’t have enough facts and data. Only sufficient data can discredit Ho, in which case we reject the null hypothesis and conclude the alternative hypothesis. Hence, what we want to prove is usually placed in the alternative hypothesis, and the null hypothesis contains the status quo: no change. The null hypothesis can be thought of as a statement of no difference or zero difference. For example, suppose we want to prove that providing training improves the average productivity of an organization. The null hypothesis would be that the average productivity is the same whether or not we provide training. The alternative hypothesis would be that training improves productivity. If we don’t have sufficient data, our conclusion would be that we cannot show that training has an impact on productivity; this is not the same as proving that training has no impact.
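The asymmetry in Rule 3 can be written as a small decision function. This is a sketch, assuming the test has already produced a p-value and that a significance level of 0.05 is acceptable. The function name decide and the two p-values in the usage lines are made up for illustration.

```python
def decide(p_value, alpha=0.05):
    """Return the statistical conclusion for a given p-value.

    Ho is the default: it is abandoned only when the evidence against
    it is strong enough (p_value < alpha). Otherwise we merely fail to
    reject it, which is not a proof that Ho is true.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject Ho in favor of Ha"
    return "fail to reject Ho (insufficient evidence)"

# Hypothetical p-values for the training example.
print(decide(0.01))  # strong evidence that training changes productivity
print(decide(0.30))  # not enough evidence either way
```

Note that the function never returns "accept Ho". With insufficient data the only honest conclusion is that the claim in Ha has not been demonstrated.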
Note that if we are hypothesizing about the average value of a property, we use the Greek letter μ (mu) to denote the population average. If we are hypothesizing about the variation of a value, we usually use the Greek letter σ (sigma), which stands for the population standard deviation. Finally, if we are hypothesizing about discrete data (say, the number of defects), we usually use proportions (denoted by p). A proportion of defects or defectives is the number of defects (or defective items) divided by the total sample size. For example, if we have 50 items, of which 4 are defective, then the proportion is 4/50 = 0.08.
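The proportion calculation is simple arithmetic, sketched here with basic input checks. The function name defect_proportion is a hypothetical helper introduced for this example.

```python
def defect_proportion(defects, sample_size):
    """Proportion of defects: number of defects divided by the sample size."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= defects <= sample_size:
        raise ValueError("defects must be between 0 and sample_size")
    return defects / sample_size

p_hat = defect_proportion(4, 50)
print(p_hat)  # 0.08
```

The value computed from the sample is a sample proportion. The hypothesis statements themselves are still written about the population proportion p, in line with Rule 1.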
The following examples help illustrate how to write the right hypothesis statements. Try to write out the hypothesis statements for yourself first and then compare your answers to the ones provided below.
Prove that the average salary in company A for entry level employees is greater than the average salary in company B.
Prove that the average salinity in the sea water is greater than 35 g/L.
Show that the variation in delivery times for supplier A is different from variation in delivery times for supplier B.
Show that the average numbers of footfalls for four department stores, A, B, C, and D, are different.
Show that the manual process of recording exam marks has more defects compared to the automated process for recording exam marks.
Solutions to Examples
Here are some sample solutions to the examples shown above.
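Applying the three rules to the five examples, in the order listed, gives one reasonable set of statements, sketched below in LaTeX (the align* environment assumes the amsmath package). Here H_0 and H_a correspond to Ho and Ha in the text. The subscripts A, B, C, D, manual and auto are labels introduced for this sketch. For the one-sided cases, some texts write Ho with a plain equality sign instead of ≤.

```latex
\begin{align*}
&\text{1. Entry-level salary:} & H_0 &: \mu_A \le \mu_B & H_a &: \mu_A > \mu_B \\
&\text{2. Salinity (g/L):} & H_0 &: \mu \le 35 & H_a &: \mu > 35 \\
&\text{3. Delivery-time variation:} & H_0 &: \sigma_A = \sigma_B & H_a &: \sigma_A \ne \sigma_B \\
&\text{4. Footfalls:} & H_0 &: \mu_A = \mu_B = \mu_C = \mu_D & H_a &: \text{at least one mean differs} \\
&\text{5. Recording defects:} & H_0 &: p_{\text{manual}} \le p_{\text{auto}} & H_a &: p_{\text{manual}} > p_{\text{auto}}
\end{align*}
```

In every case the statements are about population parameters, the equality sign sits in the null hypothesis, and the claim to be shown sits in the alternative hypothesis.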
Try to come up with the right hypothesis statements for the following problems.
An engineer comes up with a recommendation to improve the productivity of a generator. Data was collected for 10 days with the old way of working and for 10 days with the modification to the generator. Based on this analysis, we want to prove that the modification in fact increases the productivity of the generator.
It was hypothesized that employees who work for longer than 10 hours per day make more quality defects compared to employees who work less than 10 hours per day.
We want to show that the breaking strength of a material supplied by the old supplier is significantly lower than the breaking strength of a similar material supplied by a new supplier.