21.1 Hypothesis Test Basics
Hypothesis tests cover three broad categories:
• Testing whether the data you have fits a data model. In Chapter 18, we conducted a hypothesis
test to determine whether data fit the normal curve. We used the Chi-Squared Goodness-of-Fit
Test, but ultimately, it was a hypothesis test.
• Comparing a statistic to a hypothesis about the data or population.
• Answering the question whether something changed within the data, often after a team has
modified an input or other part of the process. In the case of most Six Sigma projects, the team
probably wants to find out whether the process or outcome is improved.
While the type of hypothesis test you use depends on the answers you are seeking and the type of data you have, all of the tests follow essentially the same guidelines.
• You begin with a statistic or criterion that you usually compute from your sample data.
• You create a null hypothesis and an alternative hypothesis, in keeping with the type of test you are dealing with.
o Remember, in Chapter 18 when we ran the Chi-Squared Goodness-of-Fit Test:
   - The null hypothesis was that the data was normal
   - The alternative hypothesis was that the data was not normal
• The statistic or criterion is compared against a reference criterion or distribution.
• How the calculated statistic compares to the reference criterion determines whether you fail to reject the null hypothesis (often loosely described as accepting it) or reject the null hypothesis in favor of the alternative hypothesis.
Hypothesis tests are a large part of inferential statistics, where we draw conclusions about the overall process or population by analyzing the sample data and measurements. When stating hypotheses, we are not making statements about the sample. We are making statements about the population or entire process. An example of a hypothesis is “the population mean is 5.” We don’t need to make a hypothesis about the sample mean, because we can calculate the sample mean directly.
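As a minimal sketch of the guidelines above, the following Python example tests the hypothesis “the population mean is 5” with a two-sided, one-sample z-test. The measurements, the assumed known standard deviation (sigma = 0.2), and the function name one_sample_z_test are hypothetical and chosen only for illustration. The z-test is one possible test among many, not necessarily the one you would pick for a given project.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0.

    Assumes the population standard deviation (sigma) is known.
    """
    n = len(sample)
    x_bar = mean(sample)                              # statistic computed from the sample
    z = (x_bar - mu0) / (sigma / sqrt(n))             # standardized test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # reference value from the standard normal
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    return {
        "x_bar": x_bar,
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "reject_h0": abs(z) > z_crit,                 # decision: reject or fail to reject H0
    }


# Hypothetical measurements, for illustration only
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.2, 5.1, 5.3]
result = one_sample_z_test(sample, mu0=5.0, sigma=0.2)
print(result)
```

Note that the hypothesis (mu0 = 5.0) is a statement about the population, while the sample mean is simply computed from the data. With these illustrative numbers the test statistic falls beyond the reference value, so the sketch rejects the null hypothesis at alpha = 0.05.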
Null Versus Alternative
Hypothesis tests have two main parts: the null hypothesis and the alternative hypothesis.
The null hypothesis is abbreviated as H0 and is usually a statement about the data that reflects no effect or no difference. In Chapter 18, we hypothesized that our data was normal. In effect, we were saying “there is no statistical difference between the distribution of our data and the distribution of data on a normal curve.”
The alternative hypothesis is abbreviated as Ha and is usually a statement that is likely to be true if the null hypothesis is not true. In Chapter 18, the alternative hypothesis was “there is a statistical difference between the distribution of our data and the distribution of data on a normal curve.” In short, if we reject the null hypothesis, we accept the alternative hypothesis; in this case, that our data is not normal.
Typically, the null hypothesis is an equality statement of some type. The mean of the new process is equal to the mean of the old process. The distribution of the data is equal to the normal curve.
The alternative hypothesis is typically written as a not equals, a greater than, or a less than statement. The mean of the new process is greater than the mean of the old process. The distribution of the data is not equal to the normal curve. How you write the alternative hypothesis depends on the question you are asking and the type of hypothesis test you are running.
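To illustrate why the form of the alternative hypothesis matters, here is a small sketch showing how the same test statistic leads to different p-values under a not equals, a greater than, or a less than alternative. It assumes a z statistic compared against the standard normal distribution; the function name p_value and the labels "not_equal", "greater", and "less" are hypothetical names used only for this example.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """p-value of a z statistic for a given form of the alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "not_equal":     # Ha: parameter != hypothesized value (both tails)
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "greater":       # Ha: parameter > hypothesized value (upper tail)
        return 1 - std_normal.cdf(z)
    if alternative == "less":          # Ha: parameter < hypothesized value (lower tail)
        return std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.8  # illustrative test statistic
for alt in ("not_equal", "greater", "less"):
    print(f"{alt:>9}: p = {p_value(z, alt):.4f}")
```

For this illustrative statistic, the greater than alternative gives a p-value below 0.05 while the not equals alternative does not, which is why the alternative should be chosen from the question being asked before looking at the data.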
The Risk of Hypothesis Testing Error
Anytime you draw inferences about a population from sample data, there is at least some likelihood of error. With hypothesis testing, errors come in two types.
• Type I Error: The null hypothesis is rejected when it is actually true.
o Also called producer risk
o The probability of the risk is measured by alpha, where 𝛼 is a probability between 0 and 1.
• Type II Error: The null hypothesis is not rejected when it is actually false.
o Also called the consumer risk
o The probability of the risk is measured by beta, where 𝛽 is a probability between 0 and 1.
To describe the risk, and to set up our hypothesis test for what we deem to be an acceptable risk, a confidence level must be picked. The most common confidence level used is 95 percent, which corresponds to 𝛼 = 0.05 (the confidence level is 1 − 𝛼). Typically, the confidence level is set with the Type I error in mind, so alpha determines the confidence level. The value of 𝛽 then contributes to the sample size requirements and the power of the test, which is 1 − 𝛽; sample size will be covered in the next chapter. Additional information about alpha and beta values is covered in the section in this chapter on running individual hypothesis tests.
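The relationship between alpha, beta, power, and sample size can be sketched numerically. The example below assumes a one-sided (upper-tail) z-test with a known population standard deviation; the function name beta_and_power and the values delta = 0.1 and sigma = 0.2 are hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def beta_and_power(delta, sigma, n, alpha=0.05):
    """Type II error probability (beta) and power for an upper-tail z-test.

    delta: true shift of the population mean above the hypothesized mean
    sigma: population standard deviation, assumed known
    n:     sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # rejection cutoff set by alpha
    shift = delta * sqrt(n) / sigma           # true shift in standard-error units
    beta = std_normal.cdf(z_crit - shift)     # P(fail to reject H0 when H0 is false)
    return beta, 1 - beta                     # power = 1 - beta


# Illustrative values only: beta shrinks (power grows) as the sample size increases
for n in (10, 20, 40):
    beta, power = beta_and_power(delta=0.1, sigma=0.2, n=n)
    print(f"n={n:>2}  beta={beta:.3f}  power={power:.3f}")
```

Holding alpha fixed, a larger sample reduces beta, which is why the value of 𝛽 you are willing to tolerate feeds into the sample size requirement.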