Evaluative Summary of Article on
A Generalized Randomized Block (GRB-p) Design
Submitted by Kathleen S. Burger
1. Background Information
Authors: Stockwell, S. & Dye, A.
Title: Effects of Counselor Touch on Counseling Outcome
Source: Journal of Counseling Psychology
Year: 1980. Vol. 27, No. 5. 443-446
2. Abstract
Building on previous research concerning Counselor Touch, Stockwell and Dye
(1980) examined the effects of Counselor Touch on resulting client evaluation
of the counseling session and on levels of self-exploration, as reported by
the clients.
The authors report using a 2 (nonverbal treatment) X 2 (sex of counselor) X
2 (sex of client) Randomized Block design for this evaluation. A three-way ANOVA
was used to test for the significance of nonverbal treatment, counselor sex,
and client sex on client self-exploration and evaluation of counseling.
Participants in the study were 56 male clients and 44 female clients who were
enrolled in an undergraduate education course. The authors state that the clients
(subjects) were randomly selected and then asked to participate in vocational
counseling sessions. Graduate students from the counseling department acted
as vocational counselors in this study. Twenty-five counselors (14 male and 11
female) were trained in one of two detailed interviewing procedures, which differed
only in the touch / no touch condition. Counselors attended at least one 60-90
minute training session, and independent raters evaluated their proficiency.
Contrary to previous research finding a significant Touch X Counselor Sex X
Client Sex interaction (Alagna, et al, 1979), there were no statistically significant
findings in this research. Possible explanations for the different results are
addressed. Although this study did not produce significant findings, it is notable
that the authors implemented a design allowing for experimental control in a
naturalistic setting.
3. Null hypothesis, alpha (or p) - level, and sample size per group.
Because the authors did not state their hypotheses explicitly, and only implied
that client sex was treated as a nuisance variable, the hypotheses stated below
are reconstructed by conjecture. In an attempt to reconstruct this study, the
most comprehensive interpretation is offered in this critique. Assuming a Randomized
Block design, indications within the study suggest that the null and alternative
hypotheses may have been as follows:
(1) There is no difference in population means on the CEI between the nonverbal
conditions (touch or no touch). ($H_0\colon \mu_t = \mu_n$)
There is a difference in population means on the CEI between the nonverbal
conditions (touch or no touch). ($H_1\colon \mu_t \neq \mu_n$)
(2) There is no difference in population means on the CEI between counselor
sexes (male or female). ($H_0\colon \mu_m = \mu_f$)
There is a difference in population means on the CEI between counselor sexes
(male or female). ($H_1\colon \mu_m \neq \mu_f$)
(3) The population means on the CEI for male and female clients do not differ.
($H_0\colon \sigma^2_\pi = 0$)
The population means on the CEI for male and female clients differ.
($H_1\colon \sigma^2_\pi \neq 0$)
(4) *There is no interaction among these variables on the CEI. ($H_0\colon \sigma^2_{\alpha\zeta} = 0$)
*There is an interaction among these variables on the CEI. ($H_1\colon \sigma^2_{\alpha\zeta} \neq 0$)
(5) There is no difference in population means on the SES between the nonverbal
conditions (touch or no touch). ($H_0\colon \mu_t = \mu_n$)
There is a difference in population means on the SES between the nonverbal
conditions (touch or no touch). ($H_1\colon \mu_t \neq \mu_n$)
(6) There is no difference in population means on the SES between counselor
sexes (male or female). ($H_0\colon \mu_m = \mu_f$)
There is a difference in population means on the SES between counselor sexes
(male or female). ($H_1\colon \mu_m \neq \mu_f$)
(7) The population means on the SES for male and female clients do not differ.
($H_0\colon \sigma^2_\pi = 0$)
The population means on the SES for male and female clients differ.
($H_1\colon \sigma^2_\pi \neq 0$)
(8) *There is no interaction among these variables on the SES. ($H_0\colon \sigma^2_{\alpha\zeta} = 0$)
*There is an interaction among these variables on the SES. ($H_1\colon \sigma^2_{\alpha\zeta} \neq 0$)
* It must be noted that in a Randomized Block Design with one observation per
treatment-block cell, interactions involving the blocking variable serve as the
error term and cannot be tested. A Generalized Randomized Block Design, with more
than one unit per cell, provides a within-cell error term and so allows these
interactions to be tested. Because the authors referred to "no significant
interaction effects," it must be assumed that they actually used a Generalized
Randomized Block Design, even though they report using a Randomized Block Design.
The hypotheses appropriate to the reported Randomized Block Design are therefore
listed, and, because the authors more than likely used a Generalized Randomized
Block Design, the interaction hypotheses for that design are included as well.
The code key for these hypotheses is:
t = touch condition; n = no touch condition
m = male counselor; f = female counselor
The authors did not include detailed information regarding the alpha level(s),
p-levels, or sample size(s) per cell. Just as readers are left to speculate
about the hypotheses, so are they left to speculate about these details. That
an overall alpha level of .05 (and a GRB design) was used can reasonably be inferred
from statements such as, "No significant main effect or interaction
(p<.05) was evidenced because of non-verbal treatment, counselor sex, or
client sex" (p. 445).
That p-values (and trend analysis) were considered can be deduced from statements
such as, "A possible trend (p<.078) toward higher levels of self-reported
satisfaction by clients who were not touched, in comparison with clients who
were touched, was found" (p. 445).
Sample size was equally mysterious. While the authors did report overall sample
sizes, cell sizes were not addressed. The overall sample configuration reported
included:
Male Subjects 56 Male Counselors 14
Female Subjects 44 Female Counselors 11
Total Subjects 100 Total Counselors 25
Based on the following sentence, it may be possible to reconstruct part of the
working design: "Because each counselor saw a client of each sex in the
touch condition and a client of each sex in the no touch condition, counselors
acted as their own controls" (p. 444). For this reason, the authors did
not consider it necessary to further control for associated counselor variables
such as personality characteristics, etc. A possible sample design configuration
may have been:
Client Sex                 Non-Verbal Treatment
(Blocking Variable)        Touch                            No Touch
                           Male Couns.     Female Couns.    Male Couns.     Female Couns.
Male Clients               n Co = 4        n Co = 3         n Co = 4        n Co = 3
(n = 56; 14 per cell)      n Cl = 14       n Cl = 14        n Cl = 14       n Cl = 14
Female Clients             n Co = 3        n Co = 3         n Co = 3        n Co = 2
(n = 44; 11 per cell)      n Cl = 11       n Cl = 11        n Cl = 11       n Cl = 11
(n Co = number of counselors; n Cl = number of clients in the cell)
Total Male Subjects 56          Total Male Counselors 14
Total Female Subjects 44        Total Female Counselors 11
Total Subjects 100              Total Counselors 25
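The problem with this reconstruction (or any rearrangement of it) is the statement
quoted above: "each counselor saw a client of each sex in the touch condition
and a client of each sex in the no touch condition" (p. 444). The numbers don't
work out. If each of the 25 counselors saw one client of each sex in each condition,
every counselor would have seen four clients, two male and two female. That
would produce 50 male and 50 female clients rather than the 56 and 44 reported.
As the chart above illustrates, the restriction that "each treatment level
contains n units" (Kirk, p. 302) is also violated.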
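The arithmetic behind this objection can be checked in a few lines. The Python sketch below uses only the counts reported in the article and encodes the quoted rule: every counselor sees one client of each sex per condition. The function name `implied_client_counts` is an illustrative label, not anything from the study.

```python
# Counts reported in the article.
N_COUNSELORS = 25
REPORTED = {"male": 56, "female": 44}

# Quoted rule (p. 444): every counselor saw one client of each sex
# in the touch condition and one client of each sex in the no touch condition.
CONDITIONS = ("touch", "no touch")


def implied_client_counts(n_counselors: int) -> dict:
    """Clients of each sex implied if every counselor followed the quoted rule."""
    clients_per_sex_per_counselor = len(CONDITIONS)  # one per condition
    return {
        sex: n_counselors * clients_per_sex_per_counselor
        for sex in ("male", "female")
    }


implied = implied_client_counts(N_COUNSELORS)
print("implied: ", implied)    # {'male': 50, 'female': 50}
print("reported:", REPORTED)   # {'male': 56, 'female': 44}
print("totals match:", sum(implied.values()) == sum(REPORTED.values()))  # True
print("split by client sex matches:", implied == REPORTED)              # False
```

The totals agree at 100 clients. However, the rule forces a 50/50 split by client sex, so the reported 56/44 split cannot have come from it as stated.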
While the ratio of counselors to clients was 1 to 4, the authors did not explain
how a particular client was assigned to a particular counselor. They did not
explain the uneven client to counselor ratio if the distribution was similar
to that above. Finally, they did not mention if particular counselors interviewed
more than one client, and if so, they did not address the increased possibility
of nonindependence along with potential problems of random assignment. These
possible violations of the assumptions tend to have a cumulative effect. Since
the authors do not address these issues, it is impossible to know just how the
authors approached, and / or adjusted for these inconsistencies. Further, it
is doubtful that this would have been a planned arrangement, and so the question
posed is, "what happened?"
Further, there is no mention that the subjects were randomly assigned to the
cells. Since there were no charts or tables within the article and very little
data was provided, it is impossible to replicate the study, or even to confidently
evaluate the authors' concept of planning, choices of design, methodology, or
interpretation.
4. Independent and dependent variables
There were two fixed independent variables with two levels each: Non-Verbal
treatment (touch and no touch) and Counselor Sex (male and female). Client Sex
is assumed to have been treated as the blocking variable.
"A self-report measure (Counseling Evaluation Inventory - CEI) and a behavioral
measure (Depth of Self-Exploration Scale - SES) were the major dependent measures
in this study" (p. 445). The researchers evaluated domain factors from
this self-report measure (CEI), later referring to client ratings on Counseling
Climate, Counselor Comfort, Client Satisfaction, etc. They clarified that the
CEI was a measure of counselor effectiveness as judged by client ratings. No
further information was provided.
The authors mentioned that the Depth of Self-Exploration Scale (SES) was an
interval scale, but they provided no further details concerning the instrument
or possible domain factors. They provided scant information concerning results
(addressed later in this critique), although they did mention that all subjects
received the same tests and that the tests were administered promptly after
the counseling sessions.
It may be that equating the dependent variable with scores is not "best
practices." Rather, the authors could have identified the characteristic that
was to serve as the target of the data collection efforts.
5. Instrument, briefly comment on its reliability and validity
The authors provide information concerning a reliability coefficient (.83) for
the CEI. They mention that this figure resulted from a total score test-retest
procedure. The authors state that the reliability of the Self-Exploration Scale
was reported, by the scale creators, to range from .59 to .88, based on 12 studies.
Independent reports of reliability would have been more convincing. The authors
reported the source of these reliability coefficients, but no further information.
Content validity for the CEI was reported to be "high" and the authors
report research indicating that the CEI had high congruent or discriminative
validity for "practicum grades to be significant at or beyond the .05 level
for the total score on the CEI" (p. 445). What practicum grades had to
do with this research study was not addressed. The authors reported that those
who created the Depth of Self-Exploration Scale claimed that it had face validity.
This supporting information was not convincing. There were no definitions supported
by the literature indicating a consensus regarding the terms rated on the various
evaluations.
Two further instruments were administered to the clients. The first was the
Strong-Campbell Interest Inventory and the second was a questionnaire designed
to determine the extent of client knowledge of the purpose and parameters of
the study. No further information was provided.
Finally, a procedure compliance questionnaire was completed by each counselor
immediately following the counseling session. This questionnaire was designed
to determine the extent of counselor compliance with experimental procedures.
Unfortunately, there was no information provided concerning the psychometric
properties of these measures and therefore, the validity and reliability of
these instruments must be considered with caution. This serious omission must
be kept in mind when considering the overall validity of the conclusions obtained
as a result of these scores.
6. Experimental Procedure
It is in the area of describing the experimental procedure that these authors
were quite thorough. The majority of the article is dedicated to this description.
This is important because these authors note that although their experiment
did not result in significant findings, they were able to apply experimental
procedures in a naturalistic environment. Indeed, this is a difficult task,
but one that should, in this case, have researchers from both laboratory and
naturalistic positions cringing!
First, the authors describe the term "touch." For this study, touch
was defined as a "squeeze" (at least 4-5 seconds of firm contact)
between the hands and wrists of a counselor and the hands, arms, shoulders,
and upper back area of a client.
Next, the authors describe the subjects. As mentioned, there were 100 clients
(56 male and 44 female) and 25 counselors (14 male and 11 female). The clients
were enrolled in an undergraduate education course. Already, the convenience
sample sets the stage for violation of random selection. The authors write that
the "subjects were selected randomly" and then asked to participate.
No further information was provided. This omission leaves the reader questioning
whom the students represent. The counselors were either enrolled in or had completed
a counseling practicum. (Could this be the same practicum whose grades were reportedly
correlated with total scores on the CEI?)
There was no mention of method for selecting the counselors - again potentially
violating random selection assumptions for these participants in the study.
Counselors were trained in the administration of the experimental procedures
in one of 3 separate training sessions that were 60-90 minutes long. They then
demonstrated their proficiency to implement the experimental procedure in a
quasi-experimental counseling session. Two independent raters who had previously
been trained in the procedures rated each session. Counselors repeated the training
until they passed.
After a 50-minute, audiotaped session with the counselor, the receptionist administered
the instruments (CEI & Depth of Self Exploration Scales) to the clients.
During the sessions, the interviews were highly structured, with a detailed
agenda, including time to establish rapport, interpretation of the Strong-Campbell
Interest Inventory (that clients had taken at some unmentioned point), opportunity
for client self-exploration and integration, integration of session, and termination.
Counselors in the touch condition were given specific instructions. A sample
of the very detailed instructions is, "On entering the reception area,
walk over to the client and introduce yourself, extending your hand for a handshake.
Maintain the handshake, eye contact, and a slight smile as you unhesitatingly
complete your introduction (4-5 seconds). Be sure to maintain the distance of
one arms length between yourself and the client" (p. 445). The remainder
of the session procedure was equally detailed, and the instructions for the
counselors in the no touch condition were identical, but with the "touch"
omitted.
No mention was made of how long data collection lasted (whether all counseling
sessions were conducted during one school term or over several terms), and there
was no mention of a debriefing process.
In addition to ratings by independent raters, three additional methods for checking
internal validity were used. They were designed to ensure that counselors administered
the experimental procedures correctly and in a "natural / spontaneous" manner.
First, the experimenter listened to the second half of all tapes and the first
half of randomly selected tapes to ensure that the verbal procedures and timing
were correctly carried out. A videotape would have made much more sense, however,
because touch cannot be seen on an audiotape, and touch is the main point of this
experiment!
Second, all clients completed an awareness questionnaire. While there was no
mention of the psychometric properties of this questionnaire, the authors did
state that its purpose was to determine the extent of
client knowledge of the purpose and parameters of the study. Finally, the third
procedure designed to ensure internal validity was a procedure compliance questionnaire
that was completed by each counselor immediately following the counseling session.
This questionnaire was designed to determine the extent of counselor compliance
with experimental procedures.
7. Statistical Analysis and conclusion
The authors report using a fixed effects model based on a 2 (nonverbal treatment)
X 2 (sex of counselor) X 2 (sex of client) Randomized Block Factorial (RBF)
design for this evaluation. Two ANOVA analyses were conducted on the dependent
variables, the CEI and the Depth of Self-Exploration Scale.
There is little information presented concerning the statistical analysis, therefore,
the reader has little to use when evaluating the study. There is only partial
information provided concerning alpha levels, p-levels, trend analyses, ANOVA
results, etc. Results on the CEI indicated that there was no significant main
effect or interaction effect (p>.05). They report a possible trend (p<.078)
toward higher levels of self-reported satisfaction by clients who were not touched,
but not how this was determined. The authors wrote that they "weighted"
scores on the CEI. With no details provided, this is scary.
Results on the Depth of Self-Exploration Scale revealed that interrater reliability
on ratings assigned to audiotaped segments was high (.967). Female clients were
judged to be significantly more self-exploratory than were the male clients
(p<.05). Other effects and interactions were not significant (p>.05).
Interestingly, when describing results of the procedure compliance and awareness
questionnaires, the authors commented that if the results indicated that the
procedures were not followed or the clients were aware of what was going on,
the data was "not subject to further evaluation". The above scant
information is all that was reported.
Finally, there was no mention of the order in which the dependent measures were
completed. A step could have been added to the design to counterbalance the order
of administration within each subject group. Without such counterbalancing, a
possible order (carry-over) effect leaves uncertainty regarding the source of
any effect detected.
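Counterbalancing the order of the two dependent measures would be a small addition to the procedure. Below is a minimal Python sketch. It assumes the clients in each cell are split at random into two halves, one completing the CEI first and the other the SES first. The function name `counterbalance` and the client IDs are hypothetical.

```python
import random
from collections import Counter

ORDERS = (("CEI", "SES"), ("SES", "CEI"))


def counterbalance(client_ids, seed=None):
    """Randomly split one cell's clients between the two orders of administration."""
    rng = random.Random(seed)
    ids = list(client_ids)
    rng.shuffle(ids)
    # Alternate orders down the shuffled list: halves differ by at most one client.
    return {cid: ORDERS[k % len(ORDERS)] for k, cid in enumerate(ids)}


# One hypothetical cell of 14 clients.
cell = [f"M{i:02d}" for i in range(14)]
orders = counterbalance(cell, seed=7)
print(Counter(orders.values()))
```

Order could then be entered as an additional factor, which would show whether the sequence of administration affected scores.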
The authors write that contrary to previous research finding a significant Touch
X Counselor Sex X Client Sex interaction (Alagna et al., 1979), there were no
statistically significant findings in this research. They write that Counselor
Touch was not found to affect scores obtained on the CEI or the Depth of
Self-Exploration Scale. Although this study did not produce significant findings,
(other than the one concerning female clients being more self-exploratory than
male clients) it is notable that the authors attempted to implement a design
allowing for experimental control in a naturalistic setting.
The authors suggest that these conflicting results may stem from the small sample
size used in previous research (n=20). Perhaps, they write, the reason was that
the subjects in previous research were all female or that only one counselor
of each sex participated in the study. They further speculate that the puzzling
results may be due to differences in the degree to which the experiment
was controlled. They continue to offer several other possibilities, but it is
impossible to evaluate these claims because the reader really has no clear idea
how this study was planned, designed or analyzed, and therefore, the authors'
interpretation for data generated in this study must be accepted with great
caution, if at all. In all fairness, the authors did present thorough information
regarding the procedure. And from this information, it would almost be possible
to replicate the procedure.
8. If you were the researcher, how would you improve the study?
To begin, the purpose of an experiment is to obtain an answer to or insight
about a specific research question or questions. To accomplish this, the research
question(s) must be precise. It may be that the authors did generate a precise
research question, but their reported design and strategy for analysis were severely
deficient. It was impossible to determine their design beyond the statement that
it was based on a randomized block design. A clue that it was not simply a randomized
block design, as reported, is that the researchers reported testing interaction effects.
Another purpose for thoroughly reporting a study is to enable another researcher
to replicate the study. This study could not be replicated in its entirety.
While the authors did not address their rationale for using a Randomized Block
Design (RBD), the RBD provides a more powerful test (than the Completely Randomized
Design) because it removes the effects of client sex from the estimate of the
error variance. More specifically, the RBD partitions the total sum of squares
into three parts: SSA, SSBLOCKS, and SSRESIDUALS. In general, the F-statistic
for the RBD is greater than the F-statistic for the CR design and therefore
results in a more powerful test of a false null hypothesis (Kirk, p. 252).
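The following Python sketch makes the partition concrete for the simplest RB-p layout, with one observation per block-treatment cell. It computes SSA, SSBLOCKS, and SSRESIDUALS, then compares the resulting F with the F that a Completely Randomized analysis of the same scores would give. The scores are invented for illustration and do not come from Stockwell and Dye. The function name `rb_vs_cr` is hypothetical.

```python
from statistics import mean


def rb_vs_cr(scores):
    """Partition sums of squares for an RB-p design and compare F with a CR analysis.

    scores[i][j] is the single observation for block i under treatment level j.
    """
    n = len(scores)       # number of blocks
    p = len(scores[0])    # number of treatment levels
    grand = mean(y for row in scores for y in row)
    treatment_means = [mean(row[j] for row in scores) for j in range(p)]
    block_means = [mean(row) for row in scores]

    ss_total = sum((y - grand) ** 2 for row in scores for y in row)
    ss_a = n * sum((m - grand) ** 2 for m in treatment_means)
    ss_blocks = p * sum((m - grand) ** 2 for m in block_means)
    ss_residual = ss_total - ss_a - ss_blocks

    ms_a = ss_a / (p - 1)
    # RB-p: error term is the residual, df = (n - 1)(p - 1)
    f_rb = ms_a / (ss_residual / ((n - 1) * (p - 1)))
    # CR: block variation stays in the within-groups error term, df = p(n - 1)
    f_cr = ms_a / ((ss_total - ss_a) / (p * (n - 1)))
    return {"ss_a": ss_a, "ss_blocks": ss_blocks,
            "ss_residual": ss_residual, "f_rb": f_rb, "f_cr": f_cr}


# Hypothetical scores: 4 blocks (rows) x 2 treatment levels (columns).
SCORES = [
    [3, 5],
    [5, 8],
    [7, 8],
    [9, 11],
]

result = rb_vs_cr(SCORES)
print({k: round(v, 2) for k, v in result.items()})
```

With these made-up scores, the blocks absorb 37 of the 38 units of error variation that a CR analysis would leave in the denominator. As a result, F rises from about 1.26 to 24 despite the loss of error degrees of freedom. Had the block means been equal, SSBLOCKS would be zero and the RB test would simply lose degrees of freedom.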
An important point to consider involves the relationship between a blocking
variable and the dependent variable. Whenever a Randomized Block design is used,
the blocking variable must be highly related to the dependent variable. Theoretical
considerations must be given to the relationships. For this study, it seems
sensible that Client Sex is more theoretically related to the two dependent
variables than is Counselor Sex although the authors do not specifically state,
or justify, the use of Client Sex as a nuisance variable.
Having mentioned this, it must be noted that the criteria for blocking were potentially
violated. Kirk (p. 255) explains that when forming blocks, the object is to
assign experimental units to blocks so that those in a given block are as similar
as possible with respect to the dependent variable. There was no mention of
this. Further, there was no mention of subject matching, repeated measures,
homogeneity, or mutual selection matching. While it is agreed that the procedure used to
form blocks has no effect on the computational procedures, the interpretation
is affected. There was no mention of random assignment, and while the authors
claim to have randomly selected the subjects, they were actually selected from
a convenience sample of students who had no true need of vocational counseling.
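Random assignment within blocks is the step whose absence is noted above. The Python sketch below is a minimal illustration that makes two assumptions: the four treatment combinations are nonverbal treatment x counselor sex, and clients are blocked on client sex. Within each block, clients are shuffled and dealt out to the combinations in turn. This produces equal cell sizes whenever a block's size is a multiple of four. The client IDs and the function name `assign_within_blocks` are hypothetical. Matching clients to individual counselors would be a further step that is not shown.

```python
import random
from collections import Counter
from itertools import product

# Treatment combinations: nonverbal treatment x counselor sex.
TREATMENTS = list(product(("touch", "no touch"), ("male counselor", "female counselor")))


def assign_within_blocks(clients_by_block, seed=None):
    """Randomly assign clients to treatment combinations separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, clients in clients_by_block.items():
        shuffled = list(clients)
        rng.shuffle(shuffled)  # random order within this block only
        for k, client in enumerate(shuffled):
            # Deal clients out in turn so cell sizes differ by at most one.
            assignment[client] = (block,) + TREATMENTS[k % len(TREATMENTS)]
    return assignment


# Hypothetical client IDs, using the block sizes reported in the article.
clients = {
    "male client": [f"M{i:02d}" for i in range(56)],
    "female client": [f"F{i:02d}" for i in range(44)],
}
cells = Counter(assign_within_blocks(clients, seed=1980).values())
for cell, count in sorted(cells.items()):
    print(cell, count)
```

With 56 and 44 clients, each block divides evenly among the four combinations, giving 14 and 11 clients per cell.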
It would have been helpful for the authors to explain how they determined the
required sample size. Estimating power ($1 - \beta$), the probability of rejecting
a false null hypothesis, is helpful both for assessing the sensitivity of a
statistical test and for determining the sample size to use. The sample size
might have been estimated from a pilot study (or from the one previous study
on this topic), although the authors supply no information suggesting that it
was. Since there was a significant finding (females are more self-exploratory
than males), the strength of association between the independent and dependent
variables should have been reported. The proportion of variance in the dependent
variable accounted for could have been estimated using partial omega squared,
and effect size could have been expressed as Cohen's f computed from the partial
omega squared. If Tang's charts had been used with a target power of .80, the
required sample size would have been evident.
Because researchers tend to underestimate the sample size required to detect
effects of practical significance, Kirk recommends a simple approach: "The use
of $\omega^2$ or $f$ combined with Cohen's guidelines for interpreting values
of $\omega^2$ or $f$ requires the least amount of information and is the simplest"
(p. 187). Kirk continues, "An estimate of sample size is necessary to detect
effects that are practically significant and should always be made before an
experiment is performed." If a sample size is too small, the chance of detecting
treatment effects of practical significance is reduced. In this research this
was a serious omission, and a confusing one, because the authors suggested that
the difference between their findings and those of the previous study (Alagna
et al., 1979) may have been due to small sample sizes. If small sample size was
an issue, steps to ensure an adequate sample size should have been a priority.
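The Python sketch below works through the chain of calculations described above: partial omega squared, then Cohen's f, then sample size. Several assumptions should be flagged. First, partial omega squared is estimated from an F ratio as df(F - 1) / [df(F - 1) + N]. Second, f = sqrt(omega squared / (1 - omega squared)). Third, in place of Tang's charts, the sample size per group for a two-level factor uses the normal approximation n = 2(z for 1 - alpha/2 + z for power)^2 / d^2, with d = 2f. The F value of 5 is hypothetical, not a result from the article, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def partial_omega_squared(f_value, df_effect, n_total):
    """Estimate partial omega squared from an F ratio: df(F - 1) / [df(F - 1) + N]."""
    numerator = df_effect * (f_value - 1)
    return max(0.0, numerator / (numerator + n_total))  # negative estimates set to 0


def cohens_f(omega_sq):
    """Convert (partial) omega squared to Cohen's f."""
    return math.sqrt(omega_sq / (1 - omega_sq))


def n_per_group_two_levels(f_effect, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-level factor.

    For two equal groups, Cohen's d = 2f, and n = 2(z_(1-alpha/2) + z_power)^2 / d^2.
    """
    z = NormalDist()
    d = 2 * f_effect
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical effect: F = 5 on 1 df with 100 clients in total.
omega_sq = partial_omega_squared(5.0, df_effect=1, n_total=100)
f_hat = cohens_f(omega_sq)
print(round(omega_sq, 4), round(f_hat, 3), n_per_group_two_levels(f_hat))

# Cohen's "medium" effect, f = .25
print(n_per_group_two_levels(0.25))
```

Under these assumptions, an effect of the size implied by F = 5 would need roughly 99 clients per group for power of .80 at alpha = .05. Cohen's medium effect (f = .25) would need about 63 per group. Exact noncentral-F or t-based calculations give slightly larger numbers.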
The authors, much to their credit, did refer to a "trend" in the data.
However, they did not describe how the trend was assessed, and no visual
representations were provided. Further, they did not specify whether the trend
was linear, quadratic, or cubic, nor did they refer to testing the "goodness of fit."
There are many threats to the validity and reliability of this study. The supporting,
contextual information that could have clarified important issues was omitted,
leaving critical issues unanswered. Without further information on these issues,
the entire study must be viewed with great caution. First, the conclusions drawn
in a research study are no better than the data on which they were based. The
validity of the test instruments used to obtain the scores remains largely a
mystery. There was no reference to either the reliability or validity of several
of the administered instruments. Cronbach's alpha could be used to assess the
internal consistency of the instruments. Since the random sampling was nested
in a convenience sample, and since there was no information concerning possible
random assignment, it is difficult to know whether the cell groups are truly
homogeneous apart from sex, which is but one variable to consider. It is therefore
difficult to know whether test scores (and the resulting conclusions) reflect
true differences resulting from the treatments, differences between groups,
unreliable or invalid tests, or a combination of these factors. There is no
mention of instrument content validity, criterion-related validity, or construct
validity. Construct validity can be assessed through factor analysis. Using
multiple approaches to assess instrument quality increases confidence that the
results accurately represent what the researchers want to measure.
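Cronbach's alpha is straightforward to compute once item-level scores are available, but the article does not report them. Below is a minimal Python sketch with invented responses that uses sample variances throughout. The function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents x items matrix of scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])                      # number of items
    items = list(zip(*item_scores))              # one tuple of scores per item
    totals = [sum(row) for row in item_scores]   # each respondent's total score
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses: 5 respondents x 3 items (not data from the article).
RESPONSES = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 5, 5],
    [1, 2, 2],
]
print(round(cronbach_alpha(RESPONSES), 3))  # 0.975
```

Here the item variances sum to 5.5 and the total-score variance is 15.7, so alpha = 1.5 x (1 - 5.5 / 15.7), or about .97.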
Rich description regarding the subjects was neglected. All that was included
is that they were enrolled in an undergraduate education course, were "randomly"
selected and then agreed to participate. Again, contextual information helps
the reader to understand the population from which the samples were drawn and
to which the results may be generalized. Demographics, previous educational
experience, gender, and similar information are pertinent.
Random assignment is a particularly important distinction due to the assumptions
of ANOVA: randomness, independence, normality, and homogeneity of variance.
As pointed out in Huck, "the randomness and independence assumptions can
ruin a study if they are violated" (p. 417). Having unequal numbers of
subjects in each cell leads to loss of statistical power. Issues such as these
require planning during the design phase of the experiment. It would be advisable
for the researchers to concern themselves with the normality and homogeneity
of variance assumptions. Hartley's F-max test for equal population variance
could be used for this purpose. In the case of this study, the F test may very
well be biased, making the F statistic either too large or too small. If F is
too large, the p-value associated with the calculated F will be too small. When
this occurs, the degree to which the data deviate from the null hypothesis is
exaggerated, and the alpha level will understate the probability of a Type I
error. If the bias is negative, the p-values associated with the F-values will
be too large, and the researcher may fail to reject a null hypothesis that would
have been rejected if the p-value were unbiased (Huck, p. 419).
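Hartley's F-max is simply the largest cell variance divided by the smallest. The Python sketch below computes it for four cells of invented, equal-sized data. The cell labels and the function name `hartley_fmax` are hypothetical. Critical values must come from a published F-max table for k groups and n - 1 degrees of freedom, which are not reproduced here. The test assumes equal cell sizes, a condition this study apparently did not meet, and it is sensitive to departures from normality.

```python
from statistics import variance


def hartley_fmax(groups):
    """Hartley's F-max: largest sample variance divided by the smallest.

    Intended for groups of equal size; compare the result with a tabled
    critical value for k groups and n - 1 degrees of freedom.
    """
    variances = {name: variance(scores) for name, scores in groups.items()}
    return max(variances.values()) / min(variances.values()), variances


# Hypothetical scores for the four treatment cells (equal n = 5).
CELLS = {
    "touch / male counselor": [4, 6, 5, 7, 8],
    "touch / female counselor": [3, 7, 5, 9, 1],
    "no touch / male counselor": [5, 6, 6, 7, 6],
    "no touch / female counselor": [2, 4, 6, 8, 5],
}
f_max, cell_variances = hartley_fmax(CELLS)
print(cell_variances)
print("F-max =", f_max)  # 10.0 / 0.5 = 20.0
```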
Uneven sample sizes, potentially unreliable measuring instruments, or the likelihood
of too much within block (group) variability are some of the factors that could
have affected the results. The authors did not address the restrictive assumption
of sphericity; if the sphericity condition is not satisfied, conventional F tests
will be positively biased. Because each treatment had only two levels, however,
each treatment effect has a single degree of freedom, so the sphericity condition
is automatically met for those tests. For the same reason, there was no need
to perform post-hoc comparisons. Because of the fundamental, numerous, and serious
problems with this study, it must be ignored or redone with attention to planning,
sampling, instrumentation, statistical analysis, and thoroughness of descriptive
verbiage and visuals.
A final note
One lesson learned is that when we publish research, it will still be available
for others to read 20 years later. While this reason
is but one for conducting excellent research, it serves as an "alert"
to the importance of very carefully planned and implemented work. It is probably
better to do no research than to be associated with work that is poorly planned,
conducted, or reported. While it is possible that this study was very thoughtfully
designed and carefully implemented, it is impossible to make that determination
due to the sparse information provided through the article. Therefore, beyond
the authors' innovative attempt to implement an experimental design in a naturalistic
environment, this article is found to have little practical value.