Evaluative Summary of Articles on
Two-way ANOVA CRF-pq Design
Submitted by Kathleen S. Burger
1. Background Information
Authors: Justen, III, J., Waldrop, P., and Adams, II, T.
Title: Effects of Paired versus Individual user Computer-Assisted Instruction
and type of feedback on student achievement.
Source: Educational Technology
Year: July 1990, 51-53.
2. Abstract
Building on previous research concerning Computer-Assisted Instruction (CAI),
this study was designed to determine if two students using one computer at the
same time would influence the relative effectiveness of various Feedback conditions.
The authors used a 2 (type of CAI - Individual / Paired) X 2 (type of Feedback
- Extended / Minimal) fixed-effect factorial design in this investigation. They
hypothesized that there would be no significant difference in student performance
between Paired and Individual use of CAI or between Minimal and Extended Feedback
conditions, and no significant interaction between the factors type of Feedback
and type of CAI in a Computer-Assisted Instructional tutorial. An identical
twenty-question multiple-choice test was administered to each of the 68 subjects.
Using these scores, Analysis of variance (ANOVA) procedures yielded a significant
main effect for type of Feedback (Extended/Minimal) [F (1,64) = 5.43, p= .02],
but no significant main effect for type of CAI [F (1,64)= .28, p>.05] or
for interaction [F (1,64)=3.36, p>.05]. Students in the Minimal Feedback
condition answered more test questions correctly (M=13) than did students in
the Extended Feedback condition (M=11.42). This finding favoring Minimal Feedback
condition was surprising and inconsistent with previous research literature
findings favoring Extended Feedback conditions.
3. Null hypothesis, alpha (or p) - level, and sample size per group.
The null hypotheses to be tested were stated as follows:
(1) There is no significant difference in student performance between Paired
and Individual use of Computer-Assisted Instruction. (Ho1: μp = μi)
(2) There is no significant difference in student performance between Minimal
and Extended Feedback conditions on a Computer-Assisted Instruction task. (Ho2:
μm = μe)
(3) There is no significant interaction between type of Feedback and number
of users in a Computer-Assisted Instruction task. (Ho3: μf = μn)
The information provided about the alpha and p levels is scant. The authors
report only that an alpha level of .05 was used to determine significance. In
the first table, which reports the results of the Analysis of Variance, the
p value (.02) of the significant finding is given. This indicates, to the
credit of the researchers, that they distinguish between the alpha level
and the p value.
Table 1
Results of Analysis of Variance
Source                   SS   df      MS       F
Type of CAI            2.22    1    2.22     .28
Type of Feedback      43.05    1   43.05    5.43*
CAI X Feedback        26.59    1   26.59    3.36
Error                507.06   64    7.92
*p=.02
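As a check on Table 1, the mean squares, F ratios, and p values can be recomputed from the reported SS and df. The Python sketch below is an illustration, not the authors' procedure. It takes the reported SS at face value and obtains each p value by numerically integrating the t density, which is valid here only because every effect has 1 numerator df (so F equals t squared). The helper names t_pdf and p_value_f_1df are hypothetical and were introduced for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def p_value_f_1df(f_ratio, df_error, steps=2000):
    """P(F > f_ratio) for an F(1, df_error) variable.

    With one numerator df, F = t**2, so the p value equals the two-tailed
    t probability. The area from 0 to t is found with Simpson's rule
    (steps must be even).
    """
    t = math.sqrt(f_ratio)
    h = t / steps
    area = t_pdf(0.0, df_error) + t_pdf(t, df_error)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    area *= h / 3
    return 1.0 - 2.0 * area

# SS and df as reported in Table 1
table1 = {
    "Type of CAI": (2.22, 1),
    "Type of Feedback": (43.05, 1),
    "CAI X Feedback": (26.59, 1),
}
ss_error, df_error = 507.06, 64
ms_error = ss_error / df_error

results = {}
for source, (ss, df) in table1.items():
    f_ratio = (ss / df) / ms_error
    p = p_value_f_1df(f_ratio, df_error)
    results[source] = (f_ratio, p)
    print(f"{source:18s} F = {f_ratio:5.2f}  p = {p:.3f}")
```

The recomputed F ratios should agree with Table 1 to two decimals, and only the Type of Feedback effect should have a p value below .05.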
It would have been informative if Table 1 included a column containing all
the p values in addition to a row containing Totals for df, SS, and MS. Total
sample size, N=68, was reported in the narration, while sample size per cell,
test score means, and standard deviations were reported in the second table.
Table 2
Group Means and Standard Deviations for Criterion Measures
Group                            N       M     SD
Paired/Minimal Feedback         19   13.79   2.92
Paired/Extended Feedback        14   10.93   1.98
Individual/Minimal Feedback     18   12.17   3.00
Individual/Extended Feedback    17   11.82   3.07
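The marginal means cited in the abstract (M=13 and M=11.42) do not appear in Table 2, but they can be recovered as weighted averages of the cell means. The sketch below, written for this summary rather than taken from the article, does so and also pools the cell variances for comparison with the error mean square in Table 1. The function name marginal_mean is a hypothetical helper.

```python
# Cell sizes, means, and SDs as reported in Table 2,
# keyed by (type of CAI, type of Feedback)
cells = {
    ("Paired", "Minimal"): (19, 13.79, 2.92),
    ("Paired", "Extended"): (14, 10.93, 1.98),
    ("Individual", "Minimal"): (18, 12.17, 3.00),
    ("Individual", "Extended"): (17, 11.82, 3.07),
}

def marginal_mean(factor_index, level):
    """Weighted mean of the cells at one level of one factor."""
    picked = [v for k, v in cells.items() if k[factor_index] == level]
    n_level = sum(n for n, _, _ in picked)
    return sum(n * m for n, m, _ in picked) / n_level

total_n = sum(n for n, _, _ in cells.values())
minimal = marginal_mean(1, "Minimal")
extended = marginal_mean(1, "Extended")
paired = marginal_mean(0, "Paired")
individual = marginal_mean(0, "Individual")

# Pooled within-cell variance, comparable to MS error in Table 1
df_within = sum(n - 1 for n, _, _ in cells.values())
pooled_var = sum((n - 1) * sd ** 2 for n, _, sd in cells.values()) / df_within

print(f"N = {total_n}, df within = {df_within}")
print(f"Minimal = {minimal:.2f}, Extended = {extended:.2f}")
print(f"Paired = {paired:.2f}, Individual = {individual:.2f}")
print(f"Pooled variance = {pooled_var:.2f}")
```

The cell sizes sum to the reported N=68, and the pooled variance comes out close to the MS error of 7.92, so the two tables appear to be internally consistent.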
4. Independent and dependent variables
There were two fixed independent variables (factors): Type of Feedback and Type
of Computer-Aided Instruction. Each factor had two levels. Type of Feedback
levels were Minimal and Extended Feedback, while Type of Computer-Aided Instruction
levels were Individual and Paired Approach.
The dependent variable (the criterion measure) was scores from a 20 item multiple-choice
test. Each item contained four choices and all subjects received the same test.
It may be that equating the dependent variable with scores is not "best
practice." Rather, the authors could have named the characteristic (such as
student achievement) that the data collection was meant to target.
5. Instrument, briefly comment on its reliability and validity
The authors explain that the test consisted of 20 multiple-choice questions
(4 options each) and that it was administered to all subjects. The authors did
not mention if the company that developed the CAI tutorial also constructed
this test or if the researchers constructed the test. Unfortunately, there is
no information provided concerning the psychometric properties of this test
and therefore, the validity and reliability of this instrument are impossible
to determine. This serious omission must be kept in mind when considering the
overall validity of the conclusions obtained as a result of these scores.
6. Experimental Procedure
Participants were students enrolled in four upper division education courses.
Written consent was obtained after subjects were informed of the nature of the
study. There was no further information provided concerning the participants
(gender, SES, etc.), which seriously limits generalizability and the possibility
of replicating the study. This article also does not address the degree to which
subjects were or were not willing volunteers. It would be interesting to note
if there were any students in the classes who did not participate in the experiment,
and have information about them as well.
The authors write, "two classes were randomly assigned to the Paired use
treatment and the remaining two classes were assigned to the Individual use
treatment" (p. 51). The wording in this passage is imprecise. It is unclear
whether the classes were randomly assigned or whether the participants in the
classes were randomly assigned. Further, the reader is left to wonder whether
the other two classes (or participants in the classes) were or were not
randomly assigned since the authors simply write that they "were assigned."
The four groups (Paired Approach with Minimal Feedback, Paired Approach with
Extended Feedback, Individual Approach with Minimal Feedback, and Individual
Approach with Extended Feedback) participated in Computer-Assisted Instruction
tutorials. The CAI content consisted of six lessons related to hypotheses testing
research. McGraw Hill Courseware Authoring System designed the tutorial. Unfortunately,
no information was provided concerning this courseware. The authors do write,
however, that the modules were identical except for the Feedback frames. The
modules either contained Extended Feedback or Minimal Feedback.
All subjects were tested using the same multiple-choice test. There is no information
concerning the psychometric properties of the test. There is no specific information
indicating whether subjects were tested after each module or after the six lessons
were completed, although the text implies one test was used one time. Further,
no mention was made concerning the length of time this process lasted; whether
all modules were presented during one session or over a period of several weeks
or months. There was no mention of a debriefing process.
7. Statistical Analysis and conclusion
The authors used a 2 (type of CAI - Individual / Paired) X 2 (type of Feedback
- Extended / Minimal) fixed-effect factorial design in this investigation. They
hypothesized that there would be no significant difference in student performance
between Paired and Individual use of Computer-Assisted Instruction or between
Minimal and Extended Feedback conditions, and no significant interaction between
type of Feedback and number of users in a Computer-Assisted Instructional tutorial.
An identical twenty-question multiple-choice test was administered to each
of the 68 subjects. Using these scores, Analysis of variance (ANOVA) procedures
yielded a significant main effect for type of Feedback (Extended/Minimal) [F
(1,64) = 5.43, p= .02], but no significant main effect for type of CAI [F (1,64)=
.28, p>.05] or for interaction [F (1,64)=3.36, p>.05]. Students in the
Minimal Feedback condition answered more test questions correctly (M=13) than
did students in the Extended Feedback condition (M=11.42). No further analyses
were reported.
According to the authors, the finding favoring the Minimal Feedback condition was
surprising and inconsistent with previous research literature findings favoring
Extended Feedback conditions. In an effort to explain this inconsistent finding,
the authors propose that the difficulty level of the material learned may account
for these results. From there, they speculate that perhaps the difficulty level
plays a role in the effectiveness of the Feedback. This possibility has face
validity and while it provides a future direction for research, it may serve
as a threat to the validity of this study since it was not controlled for.
The authors' conclusion to this study, that "the results suggest that group
instruction with computer-assisted tasks is a viable means of providing CAI"
(pp. 52-53), is interesting because this idea does not happen
to be one of the hypotheses tested. The tested hypothesis that comes closest
to pertaining to this suggested result was "There is no significant difference
in student performance between Paired and Individual use of Computer-Assisted
Instruction. (Ho1: μp = μi)." It seems, however, as if the authors overlooked
the results of their statistical procedures, as this null hypothesis was not
rejected. In other words, based on their data and statistical procedures, there
was not a significant difference between the mean scores from the tests they
administered when comparing the Paired vs. Individual use of CAI. These results
imply that test scores of those who worked on the tutorial individually were
quite similar to those who worked on the tutorial in pairs. The results do not
directly pertain to their assertion that "group instruction is a viable
means of providing CAI." The only precise conclusions the authors can draw,
and even these are suspect given the limitations of this study, are that the
difference between the score means for the Type of Feedback (Minimal vs. Extended)
conditions was statistically significant and that the other hypotheses tested did not
yield statistically significant results.
8. If you were the researcher, how would you improve the study?
To begin, the purpose of an experiment is to obtain an answer to or insight
about a specific research question or questions. To accomplish this, the research
question(s) must be precise. It may be that the authors did generate a precise
research question, but their formally stated conclusions do not pertain to the
question they asked. The question they proposed was "does the paired use
of computers influence the relative effectiveness of various feedback conditions?"
(p. 51). Their conclusion was "the results suggest that group instruction
with computer-assisted tasks is a viable means of providing CAI" (pp. 52-53).
They further wrote that this study "found students performed better
under Minimal Feedback conditions" (p. 53). The second conclusion is appropriate,
in terms of it being a finding from this study. It does not, however, answer
the proposed research question. While analysis of variance (ANOVA) suggested
that there was a significant difference in test scores between Minimal and Extended
Feedback conditions, it also suggested that there was no significant main effect
for Paired vs. Individual use of CAI and no significant interaction effect between
Type of Feedback and Number of Users in a CAI task. To improve this study, the
research question(s), hypotheses, and conclusions must address the same issues.
Further, all findings must be addressed.
There are many threats to the validity and reliability of this study. The supporting,
contextual information that could have clarified important issues was omitted,
leaving critical issues unanswered. Without further information on these issues,
the entire study must be viewed with great caution. First, the conclusions drawn
in a research study are no better than the data on which they were based. The
test instrument used to obtain the scores remains a mystery. We do not know
if the researchers designed the test or whether it was designed by the McGraw
Hill Courseware Authoring System. The only information provided is that it is
a 20-item multiple-choice test (4 options) that was administered to all of the
participants in the study. The reliability of this instrument could have been
evaluated. Cronbach's alpha, for example, could be used to assess the internal
consistency of the instrument. Since random sampling was not conducted, and
information regarding the random assignment is vague, it is difficult to know
if the four groups are "equivalent." It is therefore
difficult to know whether test scores (and resulting conclusions) are due to
true differences resulting from the treatments, differing groups,
an unreliable test, or a combination of these factors. There is no mention
of the instrument's content validity, criterion-related validity, or construct
validity. Construct validity can be assessed through factor analysis. Using multiple
approaches to assess instrument quality increases confidence that the results obtained
accurately represent what the researchers want to measure.
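To make the Cronbach's alpha suggestion concrete, the following Python sketch computes alpha from an item-by-respondent score matrix using the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). Because the article reports no item-level data, the responses below are invented purely for illustration and say nothing about the actual test. The function name cronbach_alpha is a hypothetical helper.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(scores[0])              # number of items
    items = list(zip(*scores))      # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical 0/1 (wrong/right) responses: 6 respondents x 4 items.
# These values are invented for illustration and are NOT from the study.
toy_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
alpha = cronbach_alpha(toy_scores)
print(f"alpha = {alpha:.2f}")
```

With the real 20-item responses from the 68 subjects in place of the toy matrix, the same function would give an estimate of the internal consistency of the criterion test.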
Similarly, the authors provided no information regarding the Computer-Aided
Instructional tutorial designed by McGraw Hill Courseware Authoring System.
No information is provided concerning the level of instruction or previous results
obtained by using this tutorial. Rich and thorough description concerning the
tutorial and the test must be provided.
Rich description regarding the subjects was neglected. All that was included
is that they were enrolled in one of four upper level education courses. Again,
contextual information helps the reader to understand the population from which
the samples were drawn and to which the results may be generalized. Demographics,
previous educational experience, gender, and similar information are pertinent.
In addition, there needs to be clarification regarding the method of group assignment.
It is unclear whether the students in the classes were randomly assigned or
whether the classes (as stated by the authors) were randomly assigned.
This is a particularly important distinction due to the assumptions of ANOVA:
randomness, independence, normality, and homogeneity of variance. As pointed
out in Huck, "the randomness and independence assumptions can ruin a study
if they are violated" (p. 417). Having unequal numbers of subjects in each
cell leads to loss of statistical power. Issues such as these require planning
during the design phase of the experiment. It would be advisable for the researchers
to concern themselves with the normality and homogeneity of variance assumptions.
Hartley's F-max test for equal population variances could be used to check the latter.
In the case of this study, the F-test may very well be biased, causing the computed
F-value to be either too large or too small. If the F-value is too large, the
p-value associated with it will be too small. When this occurs,
the amount that the data deviate from the null is exaggerated and the nominal alpha
level will understate the probability of a Type I error. If the bias is negative,
the p-values associated with the F-values will be too large, and the researcher
may not reject a null hypothesis that would have been rejected if the p-value
were unbiased (Huck, 419).
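The F-max statistic itself is simple to obtain from Table 2: it is the ratio of the largest cell variance to the smallest. The sketch below computes only that ratio. It is offered as an illustration under two caveats: Hartley's test assumes equal cell sizes, which does not hold here, and the ratio must still be compared with a tabled critical value, which is not reproduced in this summary.

```python
# Cell sizes and standard deviations as reported in Table 2
cells = {
    "Paired/Minimal": (19, 2.92),
    "Paired/Extended": (14, 1.98),
    "Individual/Minimal": (18, 3.00),
    "Individual/Extended": (17, 3.07),
}

# Variance is the square of the reported standard deviation
variances = {name: sd ** 2 for name, (n, sd) in cells.items()}

largest = max(variances, key=variances.get)
smallest = min(variances, key=variances.get)
f_max = variances[largest] / variances[smallest]

print(f"Largest variance:  {largest} ({variances[largest]:.2f})")
print(f"Smallest variance: {smallest} ({variances[smallest]:.2f})")
print(f"F-max = {f_max:.2f}")
```

Because the cell sizes range from 14 to 17 to 19, any comparison with a critical value would be approximate at best, and a test that tolerates unequal n may be preferable.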
The researchers did not perform any statistical analyses other than the ANOVA.
Since it is possible for results to be statistically significant but not practically
significant, the possibility of this occurring should be considered. Computing
the effect size indices can provide information concerning the practical significance
of the results. Power analysis is another procedure that bears on practical
significance. Power analysis can be conducted during the design phase to help
determine if the experiment is worthwhile to conduct, or it can be conducted
after data is collected to see if there was sufficient power associated with
the completed statistical test(s). Finally, strength of association measures
such as eta squared and omega squared can be computed. Post hoc comparisons were
not necessary since each factor in this 2 X 2 analysis of variance had only two levels.
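The strength of association measures mentioned above can be estimated from the values in Table 1. The Python sketch below is an illustration only. It assumes that SS total equals the sum of the four reported SS, which need not hold exactly with unequal cell sizes, and it uses the standard fixed-effects formulas for eta squared, partial eta squared, and omega squared. The function names are hypothetical helpers.

```python
# SS as reported in Table 1; every effect has 1 df
ss = {
    "Type of CAI": 2.22,
    "Type of Feedback": 43.05,
    "CAI X Feedback": 26.59,
}
df_effect = 1
ss_error, df_error = 507.06, 64
ms_error = ss_error / df_error

# Assumption: SS total is taken as the sum of the reported SS
ss_total = sum(ss.values()) + ss_error

def eta_squared(ss_effect):
    """Proportion of total variability associated with the effect."""
    return ss_effect / ss_total

def partial_eta_squared(ss_effect):
    """Effect variability relative to effect plus error variability."""
    return ss_effect / (ss_effect + ss_error)

def omega_squared(ss_effect):
    """Less biased estimate of the population proportion of variance."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

for source, ss_effect in ss.items():
    print(f"{source:18s} eta2 = {eta_squared(ss_effect):.3f}  "
          f"partial eta2 = {partial_eta_squared(ss_effect):.3f}  "
          f"omega2 = {omega_squared(ss_effect):.3f}")
```

A negative omega squared, which arises whenever F is less than 1 (as for Type of CAI here), is conventionally reported as zero.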
It must be noted that the rejection of one null hypothesis and the failure
to reject the other two may themselves be the result of Type I or
Type II error, respectively. Small sample sizes, unreliable measuring instruments, or too
much within-group variability are some of the factors that could have affected
the results. Because of the fundamental, numerous, and serious problems with
this study, this study must be ignored or redesigned with attention to issues
of planning, sampling, instrumentation, statistical analysis, and clarity of
presentation.