Evaluative Summary of Article on
A Generalized Randomized Block (GRB-p) Design
Submitted by Kathleen S. Burger

 

1. Background Information
Authors: Stockwell, S. & Dye, A.
Title: Effects of Counselor Touch on Counseling Outcome
Source: Journal of Counseling Psychology
Year: 1980, Vol. 27, No. 5, pp. 443-446

2. Abstract
Building on previous research concerning Counselor Touch, Stockwell and Dye (1980) examined the effects of Counselor Touch on resulting client evaluation of the counseling session and on levels of self-exploration, as reported by the clients.
The authors report using a 2 (nonverbal treatment) X 2 (sex of counselor) X 2 (sex of client) Randomized Block design for this evaluation. A three-way ANOVA was used to test for the significance of nonverbal treatment, counselor sex, and client sex on client self-exploration and evaluation of counseling.
Participants in the study were 56 male and 44 female clients enrolled in an undergraduate education course. The authors state that the clients (subjects) were randomly selected and then asked to participate in vocational counseling sessions. Graduate students from the counseling department acted as vocational counselors. Fourteen male and 11 female counselors were trained in one of two detailed interviewing procedures, which differed only in the touch / no touch condition. Counselors attended at least one 60-90 minute training session, and independent raters evaluated their proficiency.
Contrary to previous research that found a significant Touch X Counselor Sex X Client Sex interaction (Alagna et al., 1979), no statistically significant effects of touch were found in this research. Possible explanations for the different results are addressed. Although this study did not produce significant findings for touch, it is notable that the authors implemented a design allowing for experimental control in a naturalistic setting.

3. Null hypotheses, alpha (or p) level, and sample size per group
Because the authors did not state their hypotheses explicitly, and only implied that client sex was treated as a nuisance variable, the hypotheses below are reconstructed by conjecture. In an attempt to reconstruct this study, the most comprehensive interpretation is offered in this critique. Assuming a Randomized Block design, indications within the study suggest that the null hypotheses may have been as follows:

(1) There is no difference between the population mean CEI scores for the two nonverbal conditions (touch and no touch). ($H_0: \mu_t = \mu_n$)
There is a difference between the population mean CEI scores for the two nonverbal conditions (touch and no touch). ($H_1: \mu_t \neq \mu_n$)
(2) There is no difference between the population mean CEI scores for male and female counselors. ($H_0: \mu_m = \mu_f$)
There is a difference between the population mean CEI scores for male and female counselors. ($H_1: \mu_m \neq \mu_f$)
(3) There is no difference between the population mean CEI scores for male and female clients. ($H_0: \sigma^2_\pi = 0$)
There is a difference between the population mean CEI scores for male and female clients. ($H_1: \sigma^2_\pi \neq 0$)
(4) *There is no interaction among these variables. ($H_0: \sigma^2_{\alpha\zeta} = 0$)
*There is an interaction among these variables. ($H_1: \sigma^2_{\alpha\zeta} \neq 0$)
(5) There is no difference between the population mean SES scores for the two nonverbal conditions (touch and no touch). ($H_0: \mu_t = \mu_n$)
There is a difference between the population mean SES scores for the two nonverbal conditions (touch and no touch). ($H_1: \mu_t \neq \mu_n$)
(6) There is no difference between the population mean SES scores for male and female counselors. ($H_0: \mu_m = \mu_f$)
There is a difference between the population mean SES scores for male and female counselors. ($H_1: \mu_m \neq \mu_f$)
(7) There is no difference between the population mean SES scores for male and female clients. ($H_0: \sigma^2_\pi = 0$)
There is a difference between the population mean SES scores for male and female clients. ($H_1: \sigma^2_\pi \neq 0$)
(8) *There is no interaction among these variables. ($H_0: \sigma^2_{\alpha\zeta} = 0$)
*There is an interaction among these variables. ($H_1: \sigma^2_{\alpha\zeta} \neq 0$)
* In a Generalized Randomized Block Design, it is possible to test for interaction. Because the authors referred to "no significant interaction effects," it must be assumed that they actually used a Generalized Randomized Block Design, even though they report using a Randomized Block Design. The hypotheses appropriate to the reported Randomized Block Design are therefore listed, together with the interaction hypotheses (marked *) that the more likely Generalized Randomized Block Design would add.
The code key for these hypotheses is:
t = touch condition; n = no touch condition
m = male counselor; f = female counselor
The authors did not include detailed information regarding the alpha level(s), p-values, or sample size(s) per cell. Just as readers are left to speculate about the hypotheses, so are they left to speculate about these details. That an overall alpha level of .05 (and a GRB design) was used can reasonably be inferred from statements such as, "No significant main effect or interaction (p<.05) was evidenced because of non-verbal treatment, counselor sex, or client sex" (p. 445).
That p-values (and trend analysis) were considered can be deduced from statements such as, "A possible trend (p<.078) toward higher levels of self-reported satisfaction by clients who were not touched, in comparison with clients who were touched, was found" (p. 445).
Sample size was equally mysterious. While the authors did report overall sample sizes, cell sizes were not addressed. The overall sample configuration reported was:
               Subjects   Counselors
Male              56          14
Female            44          11
Total            100          25
Based on the following sentence, it may be possible to reconstruct part of the working design: "Because each counselor saw a client of each sex in the touch condition and a client of each sex in the no touch condition, counselors acted as their own controls" (p. 444). For this reason, the authors did not consider it necessary to further control for associated counselor variables such as personality characteristics, etc. A possible sample design configuration may have been:

(n Co = number of counselors; n Cl = number of clients)

Client Sex               Touch                           No Touch
(Blocking Variable)      Male Couns.     Female Couns.   Male Couns.     Female Couns.
Male Clients             n Co = 4        n Co = 3        n Co = 4        n Co = 3
(n = 56; 14 per cell)    n Cl = 14       n Cl = 14       n Cl = 14       n Cl = 14
Female Clients           n Co = 3        n Co = 3        n Co = 3        n Co = 2
(n = 44; 11 per cell)    n Cl = 11       n Cl = 11       n Cl = 11       n Cl = 11

Total Male Subjects      56     Total Male Counselors      14
Total Female Subjects    44     Total Female Counselors    11
Total Subjects          100     Total Counselors           25
The problem with this reconstruction (or any rearrangement of it) is the statement, mentioned above, that "…each counselor saw a client of each sex in the touch condition and a client of each sex in the no touch condition…" (p. 444). The numbers do not work out: if each of the 25 counselors saw one male and one female client in each of the two conditions, the sample would contain 50 male and 50 female clients, not 56 and 44. As the chart above illustrates, the restriction that "each treatment level contains n units" (Kirk, p. 302) is also violated. While the overall ratio of counselors to clients was 1 to 4, the authors did not explain how a particular client was assigned to a particular counselor, nor did they explain the uneven client-to-counselor ratio if the distribution resembled the one above. Finally, they did not state whether particular counselors interviewed more than one client; if so, they did not address the increased possibility of nonindependence or the associated problems with random assignment. These possible violations of assumptions tend to have a cumulative effect. Because the authors do not address these issues, it is impossible to know how they approached or adjusted for these inconsistencies. Further, it is doubtful that this was a planned arrangement, so the question becomes: "What happened?"
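The arithmetic behind this objection can be checked in a few lines. The sketch below uses only the counts reported in the article and assumes, as the quoted sentence states, that each counselor saw one client of each sex in each nonverbal condition; the variable names are illustrative.

```python
# Counts reported in the article.
COUNSELORS = 25          # 14 male + 11 female counselors
CONDITIONS = 2           # touch, no touch
CLIENT_SEXES = 2         # one client of each sex per condition (assumed from p. 444)

# What the quoted design statement implies.
clients_per_counselor = CONDITIONS * CLIENT_SEXES        # 4 clients each
implied_total = COUNSELORS * clients_per_counselor       # 100 clients
implied_male_clients = COUNSELORS * CONDITIONS           # one male client per condition
implied_female_clients = COUNSELORS * CONDITIONS         # one female client per condition

reported = {"total": 100, "male": 56, "female": 44}
implied = {"total": implied_total, "male": implied_male_clients,
           "female": implied_female_clients}

for key in reported:
    status = "consistent" if reported[key] == implied[key] else "MISMATCH"
    print(f"{key:>6}: reported {reported[key]:>3}, implied {implied[key]:>3} -> {status}")
```

The overall total agrees (100), which is why the 1-to-4 ratio looks plausible at first glance; only the breakdown by client sex (50/50 implied versus 56/44 reported) exposes the inconsistency.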
Further, there is no mention that the subjects were randomly assigned to the cells. Since there were no charts or tables within the article and very little data was provided, it is impossible to replicate the study, or even to confidently evaluate the authors' concept of planning, choices of design, methodology, or interpretation.

4. Independent and dependent variables
There were two fixed independent variables with two levels each: Non-Verbal Treatment (touch and no touch) and Counselor Sex (male and female). Client Sex is assumed to have been treated as the blocking variable.
"A self-report measure (Counseling Evaluation Inventory - CEI) and a behavioral measure (Depth of Self-Exploration Scale - SES) were the major dependent measures in this study" (p. 445). The researchers evaluated domain factors from this self-report measure (CEI), later referring to client ratings on Counseling Climate, Counselor Comfort, Client Satisfaction, etc. They clarified that the CEI was a measure of counselor effectiveness as judged by client ratings. No further information was provided.
The authors mentioned that the Depth of Self-Exploration Scale (SES) was an interval scale, but they provided no further details about the instrument or possible domain factors. They provided scant information concerning results (addressed later in this critique), although they did mention that all subjects received the same tests and that the tests were administered promptly after the counseling sessions.
Equating the dependent variable with instrument scores may not be "best practice." Rather, the authors could have specified a characteristic (a construct) to serve as the target of the data collection efforts.

5. Instrument, briefly comment on its reliability and validity
The authors report a reliability coefficient (.83) for the CEI, obtained from a total-score test-retest procedure. They state that the reliability of the Self-Exploration Scale was reported by the scale's creators to range from .59 to .88 across 12 studies. Independent reports of reliability would have been more convincing. The authors cited the source of these reliability coefficients but gave no further information.
Content validity for the CEI was reported to be "high" and the authors report research indicating that the CEI had high congruent or discriminative validity for "practicum grades to be significant at or beyond the .05 level for the total score on the CEI" (p. 445). What practicum grades had to do with this research study was not addressed. The authors reported that those who created the Depth of Self-Exploration Scale claimed that it had face validity. This supporting information was not convincing. There were no definitions supported by the literature indicating a consensus regarding the terms rated on the various evaluations.
Two further instruments were administered to the clients. The first was the Strong-Campbell Interest Inventory and the second was a questionnaire designed to determine the extent of client knowledge of the purpose and parameters of the study. No further information was provided.
Finally, a procedure compliance questionnaire was completed by each counselor immediately following the counseling session. This questionnaire was designed to determine the extent of counselor compliance with experimental procedures.
Unfortunately, there was no information provided concerning the psychometric properties of these measures and therefore, the validity and reliability of these instruments must be considered with caution. This serious omission must be kept in mind when considering the overall validity of the conclusions obtained as a result of these scores.

6. Experimental Procedure
It is in the area of describing the experimental procedure that these authors were quite thorough. The majority of the article is dedicated to this description. This is important because these authors note that although their experiment did not result in significant findings, they were able to apply experimental procedures in a naturalistic environment. Indeed, this is a difficult task, but one that should, in this case, have researchers from both laboratory and naturalistic positions cringing!
First, the authors describe the term "touch." For this study, touch was defined as a "squeeze" (at least 4-5 seconds of firm contact) between the hands and wrists of a counselor and the hands, arms, shoulders, and upper back area of a client.
Next, the authors describe the subjects. As mentioned, there were 100 clients (56 male and 44 female) and 25 counselors (14 male and 11 female). The clients were enrolled in an undergraduate education course; already, this convenience sample sets the stage for a violation of random selection. The authors write that the "subjects were selected randomly" and then asked to participate, but they provide no further information. This omission leaves the reader questioning whom the students represent. The counselors were either enrolled in or had completed a counseling practicum. (Could this be the same practicum whose grades were reportedly correlated with CEI scores as validity evidence?) There was no mention of the method for selecting the counselors, again potentially violating random selection assumptions for these participants.
Counselors were trained in the experimental procedures in one of three separate training sessions, each 60-90 minutes long. They then demonstrated their proficiency in implementing the procedure in a quasi-experimental counseling session, which was rated by two independent raters who had previously been trained in the procedures. Counselors repeated training until they passed.
After a 50-minute audiotaped session with the counselor, the receptionist administered the instruments (CEI and Depth of Self-Exploration Scale) to the clients. The interviews were highly structured and followed a detailed agenda: time to establish rapport, interpretation of the Strong-Campbell Interest Inventory (which clients had taken at some unmentioned point), opportunity for client self-exploration and integration, integration of the session, and termination.
Counselors in the touch condition were given specific instructions. A sample of these very detailed instructions is: "On entering the reception area, walk over to the client and introduce yourself, extending your hand for a handshake. Maintain the handshake, eye contact, and a slight smile as you unhesitatingly complete your introduction (4-5 seconds). Be sure to maintain the distance of one arms length between yourself and the client" (p. 445). The remainder of the session procedure was equally detailed, and the instructions for counselors in the no touch condition were identical except that the touch was omitted.
No mention was made of how long data collection lasted or whether all counseling sessions took place during one school term or over several terms, and no debriefing process was mentioned.
In addition to the ratings by independent raters, three methods for checking internal validity were used, designed to ensure that counselors could correctly administer the experimental procedures in a "natural / spontaneous" manner. First, the experimenter listened to the second half of all tapes and the first half of randomly selected tapes to ensure that verbal procedures and timing were carried out correctly. A videotape would have made much more sense, since touch, the main point of this experiment, cannot be observed on an audiotape! Second, all clients completed an awareness questionnaire. While there was no mention of its psychometric properties, the authors stated that its purpose was to determine the extent of client knowledge of the purpose and parameters of the study. Third, each counselor completed the procedure compliance questionnaire immediately after the counseling session to document compliance with experimental procedures.

7. Statistical Analysis and conclusion
The authors report using a fixed-effects model based on a 2 (nonverbal treatment) X 2 (sex of counselor) X 2 (sex of client) Randomized Block Factorial (RBF) design. Two ANOVAs were conducted, one for each dependent variable: the CEI and the Depth of Self-Exploration Scale.
Little information is presented concerning the statistical analysis, so the reader has little to use when evaluating the study. Information concerning alpha levels, p-values, trend analyses, ANOVA results, and so on is only partial. Results on the CEI indicated no significant main effect or interaction effect (p>.05). The authors report a possible trend (p<.078) toward higher levels of self-reported satisfaction among clients who were not touched, but not how this was determined. They also wrote that they "weighted" scores on the CEI; with no details provided, this is worrisome.
Results on the Depth of Self-Exploration Scale revealed that interrater reliability for ratings assigned to audiotaped segments was high (.967). Female clients were judged to be significantly more self-exploratory than male clients (p<.05). Other main effects and interactions were not significant (p>.05).
Interestingly, when describing results of the procedure compliance and awareness questionnaires, the authors commented that if the results indicated that the procedures were not followed or the clients were aware of what was going on, the data was "not subject to further evaluation". The above scant information is all that was reported.
Finally, there was no mention of the order in which the dependent measures were completed. A step could have been added to the design to counterbalance the order of administration within each subject group. Without such counterbalancing, possible order (carry-over) effects leave uncertainty about the source of any effect detected.
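As a hedged illustration of the counterbalancing step suggested above, the sketch below shuffles the clients in each client-sex group and then alternates the two administration orders, so each order is used equally often within each group. The client IDs and the helper `counterbalance` are hypothetical, not a procedure from the article.

```python
import random

# The two possible administration orders for the post-session measures.
ORDERS = [("CEI", "SES"), ("SES", "CEI")]

def counterbalance(client_ids, seed=None):
    """Return {client_id: order}, splitting the two orders as evenly as possible."""
    rng = random.Random(seed)
    ids = list(client_ids)
    rng.shuffle(ids)                       # randomize which clients get which order
    # Alternate the two orders down the shuffled list.
    return {cid: ORDERS[i % 2] for i, cid in enumerate(ids)}

# Hypothetical IDs matching the reported group sizes (56 male, 44 female).
male_orders = counterbalance([f"M{i}" for i in range(56)], seed=1)
female_orders = counterbalance([f"F{i}" for i in range(44)], seed=2)
```

With even group sizes, as here, each order is given to exactly half of each group (28 and 22 clients, respectively).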
The authors write that, contrary to previous research finding a significant Touch X Counselor Sex X Client Sex interaction (Alagna et al., 1979), there were no statistically significant findings in this research: Counselor Touch was not found to affect scores on the CEI or the Depth of Self-Exploration Scale. Although this study did not produce significant findings (other than female clients being more self-exploratory than male clients), it is notable that the authors attempted to implement a design allowing for experimental control in a naturalistic setting.
The authors suggest that the conflicting results may stem from the small sample size used in previous research (n=20). Perhaps, they write, the reason was that the subjects in previous research were all female, or that only one counselor of each sex participated. They further speculate that the puzzling results may be due to differences in the degree to which the experiments were controlled. They offer several other possibilities, but it is impossible to evaluate these claims because the reader has no clear idea how this study was planned, designed, or analyzed; therefore, the authors' interpretation of the data generated in this study must be accepted with great caution, if at all. In all fairness, the authors did present thorough information regarding the procedure, and from this information it would almost be possible to replicate the procedure.

8. If you were the researcher, how would you improve the study?
To begin, the purpose of an experiment is to obtain an answer to, or insight about, a specific research question or questions. To accomplish this, the research question(s) must be precise. The authors may have generated a precise research question, but their reported design and strategy for analysis were severely deficient. It was impossible to determine their design beyond the statement that it was based on a randomized block design. A clue that it was not simply a randomized block design, as reported, is that the researchers reported testing for interaction effects. Another purpose of thoroughly reporting a study is to enable other researchers to replicate it; this study could not be replicated in its entirety.
While the authors did not address their rationale for using a Randomized Block Design (RBD), the RBD provides a more powerful test than the Completely Randomized (CR) Design, provided client sex is related to the dependent variables, because it removes the effects of client sex from the estimate of the error variance. More specifically, the RBD partitions the total sum of squares into three parts: SSA, SSBLOCKS, and SSRESIDUALS. In general, the F statistic for the RBD is then greater than the F statistic for the CR design, resulting in a more powerful test of a false null hypothesis (Kirk, p. 252).
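To make the partition concrete, the sketch below computes SSA, SSBLOCKS, and SSRESIDUALS for a small, entirely hypothetical data set (four blocks, two treatment levels, one score per block-treatment cell, i.e., a classic RB-p layout rather than the study's own design) and compares the resulting F ratio with a CR analysis of the same scores. The function name `rb_vs_cr` is illustrative.

```python
def rb_vs_cr(scores):
    """scores[i][j] = score in block i under treatment level j (one score per cell)."""
    n = len(scores)                # number of blocks
    p = len(scores[0])             # number of treatment levels
    all_x = [x for row in scores for x in row]
    grand = sum(all_x) / (n * p)

    treat_means = [sum(row[j] for row in scores) / n for j in range(p)]
    block_means = [sum(row) / p for row in scores]

    ss_total = sum((x - grand) ** 2 for x in all_x)
    ss_a = n * sum((m - grand) ** 2 for m in treat_means)
    ss_blocks = p * sum((m - grand) ** 2 for m in block_means)
    ss_res = ss_total - ss_a - ss_blocks          # treatment x block residual

    # RB design: error term is the residual, df = (n - 1)(p - 1).
    f_rb = (ss_a / (p - 1)) / (ss_res / ((n - 1) * (p - 1)))
    # CR design: block variation stays in the within-groups error, df = p(n - 1).
    ss_within = ss_blocks + ss_res
    f_cr = (ss_a / (p - 1)) / (ss_within / (p * (n - 1)))
    return {"SSA": ss_a, "SSBLOCKS": ss_blocks, "SSRES": ss_res,
            "F_RB": f_rb, "F_CR": f_cr}

# Four hypothetical blocks that differ strongly, two treatment levels.
example = [[10, 12], [14, 17], [18, 20], [22, 25]]
print(rb_vs_cr(example))
```

With these numbers, SSA is 12.5 under both analyses, but F is 75 (df = 1, 3) for the RB partition and about 0.44 (df = 1, 6) for the CR partition, because the large block differences (SSBLOCKS = 168.5) are removed from the error term. The RB analysis gives up error degrees of freedom, so the advantage shrinks or reverses when blocks differ little, which is why the blocking variable must be related to the dependent variable.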
An important point to consider involves the relationship between the blocking variable and the dependent variable. Whenever a Randomized Block design is used, the blocking variable should be highly related to the dependent variable, and theoretical consideration must be given to this relationship. For this study, it seems sensible that Client Sex is more theoretically related to the two dependent variables than is Counselor Sex, although the authors do not specifically state, or justify, the use of Client Sex as a nuisance variable.
Having mentioned this, it must be noted that the criteria for blocking were potentially violated. Kirk (p. 255) explains that when forming blocks, the object is to assign experimental units to blocks so that the units in a given block are as similar as possible with respect to the dependent variable. There was no mention of this. Further, there was no mention of subject matching, repeated measures, homogeneity, or mutual-selection matching. While it is agreed that the procedure used to form blocks has no effect on the computational procedures, it does affect interpretation. There was no mention of random assignment, and while the authors claim to have randomly selected the subjects, the subjects were actually drawn from a convenience sample of students who had no true need of vocational counseling.
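A minimal sketch of what random assignment within blocks could have looked like, assuming client sex as the block and the four Touch X Counselor Sex combinations as treatments: clients are shuffled within each block and dealt out to the treatment combinations in turn. The helper `assign_within_blocks` and the client IDs are hypothetical.

```python
import itertools
import random

# The four treatment combinations (nonverbal condition x counselor sex).
TREATMENTS = list(itertools.product(["touch", "no touch"],
                                    ["male counselor", "female counselor"]))

def assign_within_blocks(blocks, seed=None):
    """blocks: {block_name: [unit ids]} -> {unit id: (block, treatment)}"""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)              # random order within this block only
        for i, unit in enumerate(shuffled):
            # Deal units out to the treatment combinations in rotation.
            assignment[unit] = (block, TREATMENTS[i % len(TREATMENTS)])
    return assignment

clients = {"male": [f"M{i}" for i in range(56)],
           "female": [f"F{i}" for i in range(44)]}
plan = assign_within_blocks(clients, seed=42)
```

With the reported 56 male and 44 female clients, this yields 14 and 11 clients per cell, the cell sizes assumed in the reconstruction earlier in this critique; it does not, by itself, resolve how clients were paired with individual counselors.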
It would have been helpful for the authors to explain how they determined the required sample size. Estimating power (1 - β), the probability of rejecting a false null hypothesis, is helpful both when assessing the sensitivity of a statistical test and when determining the sample size to use. The sample size might have been estimated from a pilot study (or from the one previous study on this topic), although the authors supply no information suggesting that it was. Since there was one significant finding (female clients were more self-exploratory than male clients), the strength of association between the independent and dependent variables should have been reported. The proportion of variance in the dependent variable accounted for could have been estimated using partial omega squared, and the effect size could have been expressed as Cohen's measure of effect size, f, computed from the partial omega squared. Had Tang's charts been used with a target power of .80, the required sample size would have been evident.
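The sketch below shows this computation under stated assumptions: a fixed-effects design, the common estimator partial ω² = df_effect(F - 1) / [df_effect(F - 1) + N], and Cohen's f = √(ω² / (1 - ω²)). The F value used is hypothetical, since the article reports no F statistics; negative estimates (F < 1) are set to zero by convention.

```python
import math

def partial_omega_squared(f_value, df_effect, n_total):
    """Partial omega squared for a fixed effect, estimated from its F ratio."""
    num = df_effect * (f_value - 1)
    return max(0.0, num / (num + n_total))   # negative estimates set to 0

def cohens_f(omega_sq):
    """Cohen's measure of effect size f, from (partial) omega squared."""
    return math.sqrt(omega_sq / (1 - omega_sq))

# Hypothetical: F(1, ...) = 4.0 for client sex, N = 100 clients.
w2 = partial_omega_squared(f_value=4.0, df_effect=1, n_total=100)
f = cohens_f(w2)
print(round(w2, 4), round(f, 4))   # prints 0.0291 0.1732
```

By Cohen's guidelines (f = .10 small, .25 medium, .40 large), a value near .17 would lie between a small and a medium effect.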
Since researchers tend to underestimate the sample size required to detect effects of practical significance, it is worth noting Kirk's advice: "The use of ω² or f combined with Cohen's guidelines for interpreting values of ω² or f requires the least amount of information and is the simplest" (p. 187). Kirk continues, "An estimate of sample size is necessary to detect effects that are practically significant and should always be made before an experiment is performed." A sample that is too small reduces the chance of detecting treatment effects of practical significance. In this research, the absence of a sample size estimate was a serious omission, and a confusing one, because the authors suggested that the difference between their findings and those of the previous study (Alagna et al., 1979) may have been due to small sample sizes. If small sample size was an issue, ensuring an adequate sample size should have been a priority.
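Tang's charts are not reproduced here. As a rough stand-in, the sketch below uses the familiar normal approximation for a two-sided comparison of two groups, n per group ≈ 2[(z₁₋α/₂ + z₁₋β) / d]², with d = 2f for a two-level factor. This approximation ignores the blocking and the other factors in the design, so it is only an order-of-magnitude check, not the sample size the authors should have used.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison
    (normal approximation to the t test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# For a two-level factor, Cohen's f corresponds to d = 2f.
for label, f in [("small", 0.10), ("medium", 0.25), ("large", 0.40)]:
    print(label, n_per_group(d=2 * f))   # small 393, medium 63, large 25
```

Against these rough figures, the study's 100 clients (about 50 per level of a balanced two-level factor) would suffice for a large effect but fall short for a medium one.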
The authors, much to their credit, did refer to a "trend" in the data. However, they described no procedure for identifying it and provided no visual representations. Further, they did not specify whether the trend was linear, quadratic, or cubic (although with only two levels per factor, only a linear component could be estimated), nor did they refer to testing the "goodness of fit."
There are many threats to the validity and reliability of this study. The supporting, contextual information that could have clarified important issues was omitted, leaving critical questions unanswered; without it, the entire study must be viewed with great caution. First, the conclusions drawn in a research study are no better than the data on which they are based. The validity of the instruments used to obtain the scores remains largely a mystery, and there was no reference to either the reliability or the validity of several of the administered instruments. Cronbach's alpha could be used to assess the internal consistency of these instruments. Because the random sampling was nested in a convenience sample, and because there was no information about random assignment, it is difficult to know whether the cell groups were truly homogeneous on anything other than sex, which is but one variable to consider. It is therefore difficult to know whether test scores (and the resulting conclusions) reflect true differences produced by the treatments, differences between groups, unreliable or invalid tests, or a combination of these factors. For several of the instruments, there is no mention of content validity, criterion-related validity, or construct validity. Construct validity can be assessed through factor analysis. Using multiple approaches to assess instrument quality increases confidence that the results accurately represent what the researchers intend to measure.
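As a sketch of the Cronbach's alpha computation suggested above, alpha = k/(k - 1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The item scores below are hypothetical; the article reports no item-level data.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, each covering the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical: 3 items answered by 5 respondents.
example_items = [[2, 3, 3, 4, 5],
                 [3, 3, 4, 4, 5],
                 [2, 2, 3, 5, 4]]
print(round(cronbach_alpha(example_items), 3))   # prints 0.903
```

Population variances are used for both the items and the totals; the ratio, and hence alpha, is the same if sample variances are used consistently instead.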
Rich description of the subjects was neglected. All that was included is that they were enrolled in an undergraduate education course, were "randomly" selected, and then agreed to participate. Contextual information helps the reader understand the population from which the samples were drawn and to which the results may be generalized; demographics, previous educational experience, gender, and similar information are pertinent.
Random assignment is particularly important because of the assumptions of ANOVA: randomness, independence, normality, and homogeneity of variance. As Huck points out, "the randomness and independence assumptions can ruin a study if they are violated" (p. 417). Unequal numbers of subjects per cell reduce statistical power, and issues such as these require planning during the design phase of an experiment. The researchers would also have been well advised to check the normality and homogeneity of variance assumptions; Hartley's F-max test for equal population variances could be used for this purpose. In this study, the F test may well be biased, producing F values that are either too large or too small. If F is too large, the p-value associated with it will be too small: the amount by which the data deviate from the null is exaggerated, and the alpha level understates the probability of a Type I error. If the bias is negative, the p-values will be too large, and the researcher may fail to reject a null hypothesis that would have been rejected had the p-value been unbiased (Huck, p. 419).
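Hartley's F-max statistic is simply the largest cell variance divided by the smallest. The sketch below computes it for hypothetical cells; the result would then be compared with a critical value from Hartley's table, which depends on the number of groups and the per-group degrees of freedom and is not reproduced here. The test assumes equal cell sizes and normality, both of which are questionable in this study.

```python
from statistics import variance

def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]   # sample (n - 1) variances
    return max(variances) / min(variances)

# Hypothetical scores for three equal-sized cells.
cells = [[1, 2, 3, 4, 5],
         [2, 4, 6, 8, 10],
         [3, 3, 4, 4, 5]]
print(round(hartley_fmax(cells), 2))   # prints 14.29 (10 / 0.7)
```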
Uneven sample sizes, potentially unreliable measuring instruments, and possibly excessive within-block (group) variability are some of the factors that could have affected the results. The authors also did not address the restrictive assumption of sphericity; if the sphericity condition is not satisfied, conventional F tests will be positively biased. Since the independent variables had only two levels each, there was no need for post-hoc comparisons. Because of the fundamental, numerous, and serious problems with this study, it must either be disregarded or be redone with attention to planning, sampling, instrumentation, statistical analysis, and thorough description and visuals.
A final note: one lesson learned is that when we publish research, it will still be available for others to read 20 years later. While this is but one reason for conducting excellent research, it serves as an "alert" to the importance of carefully planned and implemented work. It is probably better to do no research than to be associated with work that is poorly planned, conducted, or reported. While it is possible that this study was thoughtfully designed and carefully implemented, it is impossible to make that determination from the sparse information provided in the article. Therefore, beyond the authors' innovative attempt to implement an experimental design in a naturalistic environment, this article is found to have little practical value.