Chapter 15
INFERENTIAL STATISTICS
Inferential statistics deal with, of all things, inferences. Inferences about what? Inferences about populations based on the results of samples. Inferential statistics allow researchers to generalize to a population of individuals based on information obtained from a limited number of research participants. Most educational research studies deal with samples from larger populations. The more representative a sample is, the more generalizable its results will be to the population from which the sample was selected. Results that are representative only of that particular sample are of very limited research use. Consequently, random samples are preferred. Inferential statistics are concerned with determining whether results obtained from a sample or samples are the same as would have been obtained for the entire population.
Inferential statistics are statistical procedures used to make inferences or generalizations about a population from a set of data. Statistical inference is based on probability theory (Jack C. Richard, 2002:256).
The Null Hypothesis
Hypothesis testing is a process of decision making about the results of a study. If the experimental group’s mean is 35 and the control group’s mean is 27, the researcher has to decide whether the difference reflects a real difference between the treatments or simply sampling error. When we talk about the difference between two sample means being a true or real difference, we mean that the difference was caused by the treatment (the independent variable) and not by chance. In other words, an observed difference is either caused by the treatment, as stated in the research hypothesis, or is the result of chance (random sampling error). The chance explanation for the difference is called the null hypothesis. The null hypothesis states that there is no true difference or relationship between parameters in the populations, and that any difference or relationship found for the samples is the result of sampling error. A null hypothesis might state:
There is no significant difference between the mean reading comprehension of first-grade students who receive whole language reading instruction and first-grade students who receive basal reading instruction.
This hypothesis says that there really is not any difference between two methods, and if you find one in your study, it is not a true difference, but a chance difference resulting from sampling error.
The null hypothesis for a study is usually (although not necessarily) different from the research hypothesis. The research hypothesis typically states that one method is expected to be more effective than another, while the null hypothesis states that there is no difference between the methods.
In a research study, the test of significance selected to determine whether a difference between means is a true difference provides a test of the null hypothesis. As a result, the null hypothesis is either rejected as being probably false, or not rejected as being probably true. Notice the word probably. We never know with total certainty that we are making the correct decision; what we can do is estimate the probability of our being wrong. After we make the decision to reject or not reject the null hypothesis, we make an inference back to our research hypothesis. If, for example, our research hypothesis states that A is better than B, and if we reject the null hypothesis (that there is no difference between A and B), and if the mean for A is greater than the mean for B, then we conclude that our research hypothesis was supported, not proven! If we do not reject the null hypothesis (A is not different from B), then we conclude that our research hypothesis was not supported.
In order to test a null hypothesis we need a test of significance and we need to select a probability level that indicates how much risk we are willing to take that the decision we make is wrong.
Test of Significance
The test of significance is usually carried out using a preselected probability level that serves as a criterion to determine whether we reject or fail to reject the null hypothesis. The usual preselected probability level is either 5 out of 100 or 1 out of 100 chances that the observed difference occurred by chance. If a difference as large as the one observed between two means would be expected to occur by chance fewer than 5 times in 100 (or 1 time in 100), it is very unlikely to be the result of chance, that is, sampling error. Thus, there is a high (but not perfect) probability that the difference between the means did not occur by chance, and the most likely explanation for the difference is that the two treatments were differentially effective. That is, there was a real difference between the means. Obviously, if we can say we would expect such a difference by chance only 1 time in 100, we are more confident in our decision than if we say we would expect such a chance difference 5 times in 100. How confident we are depends on the level of significance, or probability level, at which we perform our test of significance.
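The idea of asking how often a difference would arise by chance can be sketched in code. The example below uses an exact permutation test, which is not a procedure this chapter describes; it is swapped in here only because it makes the "chance" reasoning visible. The ten scores are hypothetical, chosen so that the group means are 35 and 27, and the function name is an assumption for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Share of all possible regroupings whose mean difference is at least
    as large (in absolute value) as the observed one."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical scores (not from the chapter): means are 35 and 27.
experimental = [31, 38, 33, 36, 37]
control = [25, 29, 24, 30, 27]

p = permutation_p_value(experimental, control)
print(f"{p:.4f}")  # 0.0079, i.e. 2 of the 252 possible regroupings
```

With these made-up scores a difference of 8 points or more arises in fewer than 1 regrouping in 100, so the null hypothesis would be rejected at either the .05 or the .01 level.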
Degrees of Freedom
Degrees of freedom (df) are dependent upon the number of participants and the number of groups. Suppose I ask you to name any five numbers. You agree and say “1, 2, 3, 4, 5.” In this case N is equal to 5: you had 5 choices, or 5 degrees of freedom, to select the numbers. Now suppose I tell you to name 5 numbers and you say “1, 2, 3, 4, …,” and I say, “Wait! The mean of the five numbers you choose must be 4.” Now you have no choice: your last number must be 10, because 1 + 2 + 3 + 4 + 10 = 20 and 20 divided by 5 = 4. You lost one degree of freedom because of the restriction (lack of freedom) that the mean must be 4. In other words, instead of having N = 5 degrees of freedom, you only had N – 1 = 5 – 1 = 4 degrees of freedom.
Each test of significance has its own formula for determining degrees of freedom. For the correlation coefficient, r, the formula is N – 2. The number 2 is a constant, requiring that degrees of freedom for r are always determined by subtracting 2 from N, the number of participants.
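The two ideas above (a fixed mean forces the last value, and df for r is N – 2) can be checked with a few lines of code. This is a minimal sketch; the function names are assumptions made for illustration.

```python
def forced_last_value(free_values, required_mean, n):
    """Once n - 1 values are chosen and the mean is fixed, the remaining
    value is no longer free: it must make the total equal mean * n."""
    return required_mean * n - sum(free_values)


def df_correlation(n_participants):
    """Degrees of freedom for the correlation coefficient r: N - 2."""
    return n_participants - 2


last = forced_last_value([1, 2, 3, 4], required_mean=4, n=5)
print(last)                # 10
print(df_correlation(30))  # 28
```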
The t Test
The t test is used to determine whether two means are significantly different at a selected probability level. In determining significance, the t test makes adjustments for the fact that the distribution of scores for small samples becomes increasingly different from the normal distribution as sample sizes become smaller. For example, distributions for smaller samples tend to be lower at the mean and higher at the two ends of the distribution. Because of this, the t values required to reject a null hypothesis are higher for small samples. As the size of the samples becomes larger, the score distribution approaches normality. There are two different types of t test: the t test for independent samples and the t test for nonindependent samples.
Independent samples are two samples that are randomly formed without any type of matching. The members of one sample are not related to members of the other sample in any systematic way, other than that they are selected from the same population. If two groups are randomly formed, the expectation is that at the beginning of a study they are essentially the same with respect to performance on the dependent variable. Therefore, if they are also essentially the same at the end of the study (their means are close), the null hypothesis is probably true. If, on the other hand, their means are not close at the end of the study, the null hypothesis is probably false and should be rejected. The key word is essentially.
The t test for nonindependent samples is used to compare groups that are formed by some type of matching or to compare a single group’s performance on a pretest and posttest or on two different treatments. When samples are not independent, the members of one group are systematically related to the members of a second group (especially if it is the same group at two different times). If samples are nonindependent, scores on the dependent variable are expected to be correlated with each other, and a special t test for correlated, or nonindependent, means is used. When samples are nonindependent, the error term of the t test tends to be smaller, and therefore, there is a higher probability that the null hypothesis will be rejected. Thus, the t test for nonindependent samples is used to determine whether there is probably a significant difference between the means of two matched, or nonindependent, samples or between the means for one sample at two different times.
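To make the contrast between the two t tests concrete, here is a small sketch. The chapter does not give the formulas, so the code assumes the usual textbook versions: the pooled-variance t for independent samples and the difference-score t for nonindependent samples. The scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_independent(a, b):
    """Pooled-variance t for two independent samples; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_error, n_a + n_b - 2


def t_nonindependent(first, second):
    """t for paired scores, computed on the differences; returns (t, df)."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    std_error = sqrt(variance(diffs) / n)
    return mean(diffs) / std_error, n - 1


# Hypothetical pretest and posttest scores for the same five students.
pretest = [10, 12, 14, 16, 18]
posttest = [12, 13, 17, 18, 20]

t_ind, df_ind = t_independent(posttest, pretest)
t_pair, df_pair = t_nonindependent(pretest, posttest)
print(round(t_ind, 2), df_ind)    # about 0.96 with 8 df
print(round(t_pair, 2), df_pair)  # about 6.32 with 4 df
```

The same 2-point mean difference yields a much larger t when the scores are treated as paired, because the error term built from the difference scores is smaller. Treating paired data as independent, as in the first call, is shown only for comparison.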
Simple Analysis of Variance
Simple, or one-way, analysis of variance (ANOVA) is used to determine whether there is a significant difference between two or more means at a selected probability level. Thus, for a study involving three groups, ANOVA is the appropriate analysis technique. Like two posttest means in the t test, three (or more) posttest means in ANOVA are unlikely to be identical, so the key question is whether the differences among the means represent true, significant differences or chance differences due to sampling error. To answer this question ANOVA is used and an F ratio is computed. You may be wondering why you cannot just compute a bunch of t tests, one for each pair of means. Aside from the statistical problem that running several tests distorts your probability level, it is more convenient to perform one ANOVA than to perform several t tests. For example, to analyze four means, six separate t tests would be required (X_{1} – X_{2}, X_{1} – X_{3}, X_{1} – X_{4}, X_{2} – X_{3}, X_{2} – X_{4}, X_{3} – X_{4}). ANOVA is much more efficient and keeps the error rate under control.
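The count of six comparisons, and the distortion of the probability level, can be illustrated as follows. The error-rate formula 1 – (1 – α)^m assumes the m tests are independent of one another, which is a simplification not stated in the chapter; the function names are hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(n_means):
    """List every pair of means that would need its own t test."""
    labels = [f"X{i}" for i in range(1, n_means + 1)]
    return list(combinations(labels, 2))


def familywise_error(alpha, n_tests):
    """Chance of at least one false rejection across n_tests tests,
    assuming (as a simplification) that the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


pairs = pairwise_comparisons(4)
print(len(pairs))  # 6
print(pairs[0], pairs[-1])  # ('X1', 'X2') ('X3', 'X4')
print(round(familywise_error(0.05, len(pairs)), 3))  # 0.265
```

Under that assumption, six tests each run at the .05 level carry roughly a 26 in 100 chance of at least one spurious "significant" difference, which is the distortion a single ANOVA avoids.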
The concept underlying ANOVA is that the total variation, or variance, of scores can be divided into two sources: treatment variance (variance between groups, caused by the treatment groups) and error variance (variance within groups). A ratio (the F ratio) is formed with treatment variance (variance between groups) as the numerator and error variance (variance within groups) as the denominator. It is assumed that randomly formed groups of participants are chosen and are essentially the same at the beginning of a study on a measure of the dependent variable. At the end of the study, we determine whether the variance between groups differs from the error variance by more than what would be expected by chance. In other words, if the treatment variance is sufficiently larger than the error variance, a significant F ratio results; the null hypothesis is rejected, and it is concluded that the treatment had a significant effect on the dependent variable. If, on the other hand, the treatment variance and error variance do not differ by more than what would be expected by chance, the resulting F ratio is not significant and the null hypothesis is not rejected. The greater the difference, the larger the F ratio. To determine whether the F ratio is significant, consult an F table. Find the place corresponding to the selected probability level and the appropriate degrees of freedom. The degrees of freedom for the F ratio are a function of the number of groups and the number of participants.
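The partition of total variation into between-group and within-group parts can be sketched directly from raw scores. The three small groups below are hypothetical, and the function name is an assumption; the sketch uses the deviation-score definitions of the sums of squares.

```python
from statistics import mean


def partition_variance(groups):
    """Split total variation into between-group and within-group sums of
    squares and form the F ratio (between mean square / within mean square)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    k, n = len(groups), len(all_scores)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_total, ss_between, ss_within, f_ratio


# Hypothetical scores for three treatment groups.
groups = [[2, 4, 6], [5, 7, 9], [10, 12, 14]]
ss_total, ss_between, ss_within, f_ratio = partition_variance(groups)
print(round(ss_total, 2), round(ss_between, 2), round(ss_within, 2))  # 122.0 98.0 24.0
print(round(f_ratio, 2))  # 12.25
```

The between and within parts add up to the total (98 + 24 = 122), and the F ratio grows as the group means move further apart relative to the spread inside each group.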
Suppose we have the following set of posttest scores for three different posttests from the same group.
________________________________________________________________________
No  Code of     Sample 1            Sample 2            Sample 3
    Students    X_{1}     X_{1}²    X_{2}     X_{2}²    X_{3}     X_{3}²
________________________________________________________________________
1   S1          55.7      3102.5    63.7      4057.69   68.7      4719.69
2   S2          68.3      4664.9    76.3      5821.69   83        6889
3   S3          66        4356      69.7      4858.09   75.3      5670.09
4   S4          60.3      3636.1    62        3844      81        6561
5   S5          49.3      2430.5    56.7      3214.89   71.3      5083.69
6   S6          47.7      2275.3    59        3481      78.7      6193.69
7   S7          59.3      3516.5    69.7      4858.09   82        6724
8   S8          47.7      2275.3    51.3      2631.69   73.7      5431.69
9   S9          61        3721      65        4225      78.7      6193.69
10  S10         56        3136      61.7      3806.89   74.3      5520.49
11  S11         49        2401      56.3      3169.69   69.7      4858.09
12  S12         46        2116      59        3481      81        6561
13  S13         63.7      4057.7    78        6084      84.7      7174.09
14  S14         64        4096      67.7      4583.29   71.7      5140.89
15  S15         60.3      3636.1    71.3      5083.69   75        5625
16  S16         55.3      3058.1    65.3      4264.09   75.3      5670.09
17  S17         48        2304      61.3      3757.69   78        6084
18  S18         45.3      2052.1    63        3969      66.7      4448.89
19  S19         63.3      4006.9    68.7      4719.69   76.7      5882.89
20  S20         62.3      3881.3    74        5476      85.3      7276.09
21  S21         68        4624      76.7      5882.89   83.3      6938.89
22  S22         67.7      4583.3    72.3      5227.29   86.3      7447.69
23  S23         56.3      3169.7    75        5625      87.3      7621.29
24  S24         49.7      2470.1    60.3      3636.09   72.3      5227.29
25  S25         51        2601      52.7      2777.29   73        5329
26  S26         51.3      2631.7    70        4900      73.3      5372.89
27  S27         51.3      2631.7    74.7      5580.09   77.3      5975.29
28  S28         49        2401      62.3      3881.29   73.7      5431.69
29  S29         53.3      2840.9    60.7      3684.49   73.3      5372.89
30  S30         52.3      2735.3    60.3      3636.09   74        5476
31  S31         59.3      3516.5    75.7      5730.49   81        6561
32  S32         68.3      4664.9    72        5184      82        6724
33  S33         50        2500      71.3      5083.69   82        6724
34  S34         60        3600      66.3      4395.69   74.3      5520.49
35  S35         61.3      3757.7    65.7      4316.49   77.7      6037.29
36  S36         63.7      4057.7    66.3      4395.69   78.7      6193.69
37  S37         46.7      2180.9    57.7      3329.29   70.7      4998.49
38  S38         61.7      3806.9    70.7      4998.49   82.7      6839.29
39  S39         56.7      3214.9    68.7      4719.69   75        5625
40  S40         67        4489      80        6400      81.7      6674.89
41  S41         61        3721      73.3      5372.89   78.3      6130.89
42  S42         52        2704      67.3      4529.29   72.3      5227.29
________________________________________________________________________
    ∑           2386.1    137625    2799.7    188673    3241      251157
                n_{1} = 42          n_{2} = 42          n_{3} = 42
________________________________________________________________________

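\sum X = \sum X_{1} + \sum X_{2} + \sum X_{3} = 2386.1 + 2799.7 + 3241 = 8426.8
\sum X^{2} = \sum X_{1}^{2} + \sum X_{2}^{2} + \sum X_{3}^{2} = 137625.2 + 188673.4 + 251157.3 = 577455.9 (the column totals in the table are shown rounded to whole numbers)
N = n_{1} + n_{2} + n_{3} = 42 + 42 + 42 = 126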
First, find the SS_{total}.
SS_{total} = \sum X^{2} - \frac{(\sum X)^{2}}{N}
= 577455.9 - \frac{(8426.8)^{2}}{126}
= 577455.9 - 563579.0
SS_{total} = 13876.9
Next, do SS_{between}.
SS_{between} = \frac{(\sum X_{1})^{2}}{n_{1}} + \frac{(\sum X_{2})^{2}}{n_{2}} + \frac{(\sum X_{3})^{2}}{n_{3}} - \frac{(\sum X)^{2}}{N}
= \frac{(2386.1)^{2}}{42} + \frac{(2799.7)^{2}}{42} + \frac{(3241)^{2}}{42} - \frac{(8426.8)^{2}}{126}
= 135558.9 + 186626.7 + 250097.2 - 563579.0
= 572282.8 - 563579.0
SS_{between} = 8703.8
Now how are we going to get SS_{within}? We subtract SS_{between} from SS_{total}, where SS_{total} = \sum X^{2} - (\sum X)^{2}/N = 577455.9 - 563579.0 = 13876.9:
SS_{within} = SS_{total} - SS_{between}
= 13876.9 - 8703.8
SS_{within} = 5173.1
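The three sums of squares can be recomputed from the column totals as a check on the hand arithmetic. This is a minimal sketch using the raw-score formulas above; the function name is an assumption, and the inputs are the totals from the score table (∑X for each sample, n = 42 per sample, and ∑X² = 577455.9).

```python
def sums_of_squares(group_sums, group_sizes, total_sum_of_squares):
    """Raw-score formulas: each SS is a sum minus the correction term
    (sum of all scores) squared divided by N."""
    n_total = sum(group_sizes)
    correction = sum(group_sums) ** 2 / n_total
    ss_total = total_sum_of_squares - correction
    ss_between = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes)) - correction
    ss_within = ss_total - ss_between
    return ss_total, ss_between, ss_within


ss_total, ss_between, ss_within = sums_of_squares(
    group_sums=[2386.1, 2799.7, 3241],
    group_sizes=[42, 42, 42],
    total_sum_of_squares=577455.9,
)
print(round(ss_total, 1), round(ss_between, 1), round(ss_within, 1))
```

The results agree with the hand calculation to within about 0.1; the small differences come from rounding intermediate values by hand.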
Now we have everything we need to begin! Seriously, we have all the pieces, but we are not quite there yet. Let us fill in a summary table with what we have and you will see what is missing:
__________________________________________________
Source of    Sum of                  Mean
Variation    Squares    df           Square    F
__________________________________________________
Between      8703.8     (K – 1)
Within       5173.1     (N – K)
Total        13876.9    (N – 1)
__________________________________________________
The first thing you probably noticed is that each term has its own formula for degrees of freedom. The formula for the between term is K – 1, where K is the number of treatment groups; thus, the degrees of freedom are K – 1 = 3 – 1 = 2. The formula for the within term is N – K, where N is the total sample size and K is still the number of treatment groups; thus, the degrees of freedom for the within term are N – K = 126 – 3 = 123. We do not need them, but for the total term, df = N – 1 = 126 – 1 = 125. Now what about mean squares? Mean squares are found by dividing each sum of squares by its degrees of freedom. Mean squares are abbreviated as MS, using the subscript B for between and W for within. Thus, we have the equation:
\text{Mean square} = \frac{\text{Sum of squares}}{\text{Degrees of freedom}}
MS = \frac{SS}{df}
For between, MS_{B}, we get:
MS_{B} = \frac{SS_{B}}{df_{B}} = \frac{8703.8}{2} = 4351.9 \approx 4352
For within, MS_{W}, we get:
MS_{W} = \frac{SS_{W}}{df_{W}} = \frac{5173.1}{123} \approx 42
Now all we need is our F ratio. The F ratio is the ratio of MS_{B} to MS_{W}:
F = \frac{MS_{B}}{MS_{W}} = \frac{4352}{42} = 103.6
Filling in the rest of our summary table we have:
______________________________________________________________
Source of    Sum of                      Mean
Variation    Squares    df               Square    F
______________________________________________________________
Between      8703.8     (K – 1) = 2      4352      103.6
Within       5173.1     (N – K) = 123    42
Total        13876.9    (N – 1) = 125
______________________________________________________________
Note that we simply divided across (8703.8 ÷ 2 = 4352 and 5173.1 ÷ 123 = 42) and then down (4352 ÷ 42 = 103.6). Thus, F = 103.6 with 2 and 123 degrees of freedom.
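The "divide across, then down" steps can be written as a short function. This is a sketch with a hypothetical function name. SS_between is taken as 8703.8 and SS_within as 5173.1, the latter being SS_total minus SS_between when SS_total is computed from ∑X² = 577455.9.

```python
def anova_summary(ss_between, ss_within, k, n_total):
    """Fill in the df, mean square, and F columns of the summary table."""
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # divide across
    ms_within = ss_within / df_within     # divide across
    f_ratio = ms_between / ms_within      # then divide down
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": f_ratio,
    }


summary = anova_summary(ss_between=8703.8, ss_within=5173.1, k=3, n_total=126)
print(summary["df_between"], summary["df_within"], summary["df_total"])  # 2 123 125
print(round(summary["f"], 1))  # about 103.5
```

Without rounding the mean squares to 4352 and 42 first, F comes out near 103.5 rather than 103.6. The difference is due only to rounding and does not affect the decision.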
Assuming that α = .05, we are now ready to go to our F table (available in most statistics textbooks). For 2 and 123 degrees of freedom (using the nearest tabled row, 120), the value of F required for statistical significance, that is, required in order to reject the null hypothesis, is approximately 3.07 if α = .05; the value 4.79 is the tabled value for the stricter level α = .01. The question is whether our F value, 103.6, is greater than the tabled value. Obviously it is, at either level. Therefore, we reject the null hypothesis and conclude that there is a significant difference among the three group means.
Thu, 12 May 2011 @12:45