Respond to the following in a minimum of 175 words:
A researcher is investigating verbal behavior among introverts and extroverts. The researcher first tests participants and classifies them as either introverted or extroverted. She then randomly assigns participants to a high anxiety or a low anxiety social group and observes the number of words spoken by participants. The researcher wants to learn more about the effects of both personality style—introversion or extroversion—and social situation—high or low anxiety—on verbal behavior. In this case, verbal behavior is defined as the number of verbal communications uttered within a 5-minute observation period.
What type of design is this? Provide an example of a possible main effect that might emerge from this study. Additionally, provide an example of a possible interaction that might emerge.
PART 2 - PLEASE SEE ATTACHMENT!!! Select a research article of interest to you, preferably related to your Research Proposal, and use the Research Evaluation Worksheet to analyze the article. You can use this information to help form the literature review section of your research proposal.
PART 3 - PLEASE SEE PART 3 ATTACHMENT!!! Complete the Week Four Homework Exercise.
PART 4 - I AM ONLY RESPONSIBLE FOR ONE PART. I WILL POST ALL DETAILS ONCE THE TEAM LEADER NOTIFIES US ON WHAT PART WE ARE RESPONSIBLE FOR AND THE ARTICLES…
As a team, locate at least two surveys (you can use any survey that you find on the internet). Try to find one that is relatively brief: 10 questions or fewer. Analyze the questions in the survey. Construct a table and evaluate each survey question on the following points:
- negative wording
- complexity (note: good questions are simple and straightforward)
- grammatically incorrect
In your team, discuss the importance of writing good survey questions. How can poorly written questions bias results? Submit both the table that you constructed as well as a copy of the survey you analyzed to your instructor.
- Define confounding variable, and describe how confounding variables are related to internal validity.
- Describe the posttest-only design and the pretest-posttest design, including the advantages and disadvantages of each design.
- Contrast an independent groups (between-subjects) design with a repeated measures (within-subjects) design.
- Summarize the advantages and disadvantages of using a repeated measures design.
- Describe a matched pairs design, including reasons to use this design.
In the experimental method, the researcher attempts to control all extraneous variables. Suppose you want to test the hypothesis that exercise affects mood. To do this, you might put one group of people through a 1-hour aerobics workout and put another group in a room where they are asked to watch a video of people exercising for an hour. All participants would then complete the same mood assessment. Now suppose that the people in the aerobics class rate themselves as happier than those in the video viewing condition. Can the difference in mood be attributed to the difference in the exercise? Yes, if there is no other difference between the groups. However, what if the aerobics group was given the mood assessment in a room with windows but the video-only group was tested in a room without windows? In that case, it would be impossible to know whether the better mood of the participants in the aerobics group was due to the exercise or to the presence of windows.
CONFOUNDING AND INTERNAL VALIDITY
Recall from Chapter 4 that the experimental method has the advantage of allowing a relatively unambiguous interpretation of results. The researcher manipulates the independent variable to create groups and then compares the groups on the dependent variable. All other variables are kept constant, either through direct experimental control or through randomization. If the groups are different, the researcher can conclude that the independent variable caused the results because the only difference between the groups is the manipulated variable.
Although the task of designing an experiment is logically elegant and exquisitely simple, you should be aware of possible pitfalls. In the hypothetical exercise experiment just described, the variables of exercise and window presence are confounded. The window variable was not kept constant. A confounding variable is a variable that varies along with the independent variable; confounding occurs when the effects of the independent variable and an uncontrolled variable are intertwined so you cannot determine which of the variables is responsible for the observed effect. If the window variable had been held constant, both the exercise and the video condition would have taken place in identical rooms. That way, the effect of windows would not be a factor to consider when interpreting the difference between the groups.
In short, both rooms in the exercise experiment should have had windows or both should have been windowless. Because one room had windows and one room did not, any difference in the dependent variable (mood) cannot be attributed solely to the independent variable (exercise). An alternative explanation can be offered: The difference in mood may have been caused, at least in part, by the window variable.
Good experimental design requires eliminating possible confounding variables that could result in alternative explanations. A researcher can claim that the independent variable caused the results only by eliminating competing, alternative explanations. When the results of an experiment can confidently be attributed to the effect of the independent variable, the experiment is said to have internal validity (remember that internal validity refers to the ability to draw conclusions about causal relationships from our data; see Chapter 4). To achieve good internal validity, the researcher must design and conduct the experiment so that only the independent variable can be the cause of the results (Campbell & Stanley, 1966).
This chapter will focus on true experimental designs, which provide the highest degree of internal validity. In Chapter 11, we will turn to an examination of quasi-experimental designs, which lack the crucial element of random assignment while attempting to infer that an independent variable had an effect on a dependent variable. Internal validity is discussed further in Chapter 11 and external validity, the extent to which findings may be generalized, is the focus of Chapter 14.
The simplest possible experimental design has two variables: the independent variable and the dependent variable. The independent variable has a minimum of two levels, an experimental group and a control group. Researchers must make every effort to ensure that the only difference between the two groups is the manipulated (independent) variable.
Remember, the experimental method involves control over extraneous variables, through either keeping such variables constant (experimental control) or using randomization to make sure that any extraneous variables will affect both groups equally. The basic, simple experimental design can take one of two forms: a posttest-only design or a pretest-posttest design.
A researcher using a posttest-only design must (1) obtain two equivalent groups of participants, (2) introduce the independent variable, and (3) measure the effect of the independent variable on the dependent variable. The design looks like this:
Thus, the first step is to choose the participants and assign them to the two groups. The procedures used must achieve equivalent groups to eliminate any potential selection differences: The people selected to be in the conditions cannot differ in any systematic way. For example, you cannot select high-income individuals to participate in one condition and low-income individuals for the other. The groups can be made equivalent by randomly assigning participants to the two conditions or by having the same participants participate in both conditions. Recall from Chapter 4 that random assignment is done in such a way that each participant is assigned to a condition randomly without regard to any personal characteristic of the individual. The R in the diagram means that participants were randomly assigned to the two groups.
Next, the researcher must choose two levels of the independent variable, such as an experimental group that receives a treatment and a control group that does not. Thus, a researcher might study the effect of reward on motivation by offering a reward to one group of children before they play a game and offering no reward to children in the control group. A study testing the effect of a treatment method for reducing smoking could compare a group that receives the treatment with a control group that does not. Another approach would be to use two different amounts of the independent variable—that is, to use more reward in one group than the other or to compare the effects of different amounts of relaxation training designed to help people quit smoking (e.g., 1 hour of training compared with 10 hours). Another approach would be to include two qualitatively different conditions; for example, one group of test-anxious students might write about their anxiety and the other group could participate in a meditation exercise prior to a test. All of these approaches would provide a basis for comparison of the two groups. (Of course, experiments may include more than two groups; for example, we might compare two different smoking cessation treatments along with a no-treatment control group—these types of experimental designs will be described in Chapter 10).
Finally, the effect of the independent variable is measured. The same measurement procedure is used for both groups, so that comparison of the two groups is possible. Because the groups were equivalent prior to the introduction of the independent variable and there were no confounding variables, any difference between the groups on the dependent variable must be attributed to the effect of the independent variable. This elegant experimental design has a high degree of internal validity. That is, we can confidently conclude that the independent variable caused the difference observed on the dependent variable. In actuality, a statistical significance test would be used to assess the difference between the groups. However, we do not need to be concerned with statistics at this point. An experiment must be well designed, and confounding variables must be eliminated before we can draw conclusions from statistical analyses.
The only difference between the posttest-only design and the pretest-posttest design is that in the latter a pretest is given before the experimental manipulation is introduced:
This design makes it possible to ascertain that the groups were, in fact, equivalent at the beginning of the experiment. However, this precaution is usually not necessary if participants have been randomly assigned to the two groups. With a sufficiently large sample of participants, random assignment will produce groups that are virtually identical in all respects.
You are probably wondering how many participants are needed in each group to make sure that random assignment has made the groups equivalent. The larger the sample, the less likelihood there is that the groups will differ in any systematic way prior to the manipulation of the independent variable. In addition, as sample size increases, so does the likelihood that any difference between the groups on the dependent variable is due to the effect of the independent variable. There are formal procedures for determining the sample size needed to detect a statistically significant effect, but as a rule of thumb you will probably need a minimum of 20 to 30 participants per condition. In some areas of research, many more participants may be necessary. Further issues in determining the number of participants needed for an experiment are described in Chapter 13.
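The relationship between sample size and group equivalence can be illustrated with a small simulation. This is only a sketch under stated assumptions: the trait scores are hypothetical (drawn from a normal distribution with mean 100 and standard deviation 15), and the function name `mean_group_gap` is invented for illustration. It shows how far apart two randomly formed groups tend to be on a pre-existing characteristic before any manipulation takes place.

```python
import random
import statistics


def mean_group_gap(n_per_group, trials=1000, seed=0):
    """Average absolute pre-existing difference between two randomly
    formed groups, taken over many simulated experiments."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Hypothetical trait scores for 2 * n participants (assumed values).
        scores = [rng.gauss(100, 15) for _ in range(2 * n_per_group)]
        rng.shuffle(scores)  # random assignment: first half vs. second half
        group_a = scores[:n_per_group]
        group_b = scores[n_per_group:]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


for n in (5, 20, 30, 100):
    print(n, round(mean_group_gap(n), 2))
```

Under these assumptions, the average gap between the groups shrinks as the number of participants per condition grows, which is the reasoning behind the rule of thumb above.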
Comparing Posttest-Only and Pretest-Posttest Designs
Each of these two experimental designs has advantages and disadvantages that influence the decision whether to include or omit a pretest. The first decision factor concerns the equivalence of the groups in the experiment. Although randomization is likely to produce equivalent groups, it is possible that, with small sample sizes, the groups will not be equal. Thus, a pretest enables the researcher to assess whether the groups are in fact equivalent to begin with.
Sometimes, a pretest is necessary to select the participants in the experiment. A researcher might need to give a pretest to find the lowest or highest scorers on a smoking measure, a math anxiety test, or a prejudice measure. Once identified, the participants would be randomly assigned to the experimental and control groups.
The pretest-posttest design immediately makes us focus on the change from pretest to posttest. This emphasis on change is incorporated into the analysis of the group differences. Also, the extent of change in each individual can be examined. If a smoking reduction program appears to be effective for some individuals but not others, attempts can be made to find out why.
A pretest is also necessary whenever there is a possibility that participants will drop out of the experiment; this is most likely to occur in a study that lasts over a long time period. The dropout factor in experiments is called attrition or mortality. People may drop out for reasons unrelated to the experimental manipulation, such as illness; sometimes, however, attrition is related to the experimental manipulation. Even if the groups are equivalent to begin with, different attrition rates can make them nonequivalent. How might mortality affect a treatment program designed to reduce smoking? One possibility is that the heaviest smokers in the experimental group might leave the program. Therefore, when the posttest is given, only the light smokers would remain, so that a comparison of the experimental and control groups would show less smoking in the experimental group even if the program had no effect. In this way, attrition (mortality) becomes an alternative explanation for the results. Use of a pretest enables you to assess the effects of attrition; you can look at the pretest scores of the dropouts and know whether their scores differed from the scores of the individuals completing the study. Thus, with the pretest, it is possible to examine whether attrition is a plausible alternative explanation—an advantage in the experimental design.
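The attrition check described above can be sketched in a few lines. The participant IDs, pretest scores (cigarettes per day), and the function name `attrition_check` are all hypothetical and exist only to illustrate the comparison of dropouts with completers.

```python
import statistics

# Hypothetical pretest scores (cigarettes per day), keyed by participant ID.
pretest = {"p1": 30, "p2": 12, "p3": 28, "p4": 10, "p5": 15, "p6": 32}
# Hypothetical set of participants who stayed through the posttest.
completed = {"p2", "p4", "p5"}


def attrition_check(pretest_scores, completers):
    """Return (mean pretest of completers, mean pretest of dropouts).

    Assumes at least one completer and at least one dropout.
    """
    stayed = [s for pid, s in pretest_scores.items() if pid in completers]
    dropped = [s for pid, s in pretest_scores.items() if pid not in completers]
    return statistics.mean(stayed), statistics.mean(dropped)


stayed_mean, dropped_mean = attrition_check(pretest, completed)
print(f"completers: {stayed_mean:.1f}, dropouts: {dropped_mean:.1f}")
```

In this made-up data set the dropouts were the heaviest smokers at pretest, so attrition would be a plausible alternative explanation for lower smoking at posttest.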
One disadvantage of a pretest, however, is that it may be time-consuming and awkward to administer in the context of the particular experimental procedures being used. Perhaps most important, a pretest can sensitize participants to the topic of the study, enabling them to figure out what is being studied and (potentially) why. They may then react differently to the manipulation than they would have without the pretest. When a pretest affects the way participants react to the manipulation, it is very difficult to generalize the results to people who have not received a pretest. That is, the independent variable may not have an effect in the real world, where pretests are rarely given. We will examine this issue more fully in Chapter 14.
If awareness of the pretest is a problem, the pretest can be disguised. One way to do this is by administering it in a completely different situation with a different experimenter. Another approach is to embed the pretest in a set of irrelevant measures so it is not obvious that the researcher is interested in a particular topic.
It is also possible to assess the impact of the pretest directly with a combination of both the posttest-only and the pretest-posttest design. In this design, half the participants receive only the posttest, and the other half receive both the pretest and the posttest (see Figure 8.1). This is formally called a Solomon four-group design. If there is no impact of the pretest, the posttest scores will be the same in the two control groups (with and without the pretest) and in the two experimental groups. Garvin and Damson (2008) employed a Solomon four-group design to study the effect of viewing female fitness magazine models on a measure of depressed mood. Female college students spent 30 minutes viewing either the fitness magazines or magazines such as National Geographic. Two possible outcomes of this study are shown in Figure 8.2. The top graph illustrates an outcome in which the pretest has no impact: The fitness magazine viewing results in higher depression in both the posttest-only and the pretest-posttest condition. This is what was found in the study. The lower graph shows an outcome in which there is a difference between the treatment and control groups when there is a pretest, but there is no group difference when the pretest is absent.
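The logic of reading a Solomon four-group design can be sketched as a comparison of the treatment effect with and without a pretest. The posttest means below are hypothetical numbers chosen to mirror the two kinds of outcome described above; they are not data from any study, and `solomon_summary` is a name invented for this illustration.

```python
def solomon_summary(means):
    """means maps (pretested, treated) -> posttest mean for that group."""
    effect_with_pretest = means[(True, True)] - means[(True, False)]
    effect_without_pretest = means[(False, True)] - means[(False, False)]
    return {
        "effect_with_pretest": effect_with_pretest,
        "effect_without_pretest": effect_without_pretest,
        # A nonzero gap suggests the pretest changed how participants
        # responded to the manipulation.
        "pretest_gap": effect_with_pretest - effect_without_pretest,
    }


# Hypothetical outcome 1: the pretest has no impact.
no_pretest_impact = {(True, True): 14, (True, False): 10,
                     (False, True): 14, (False, False): 10}
# Hypothetical outcome 2: a group difference appears only with a pretest.
pretest_impact = {(True, True): 14, (True, False): 10,
                  (False, True): 10, (False, False): 10}

print(solomon_summary(no_pretest_impact))
print(solomon_summary(pretest_impact))
```

In practice the comparison would be made with a statistical test rather than by inspecting raw means; the sketch only shows which groups are contrasted.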
Figure 8.1. Solomon four-group design
Figure 8.2. Examples of outcomes of Solomon four-group design
ASSIGNING PARTICIPANTS TO EXPERIMENTAL CONDITIONS
Recall that there are two basic ways of assigning participants to experimental conditions. In one procedure, participants are randomly assigned to the various conditions so that each participates in only one group. This is called an independent groups design. It is also known as a between-subjects design because comparisons are made between different groups of participants. In the other procedure, participants are in all conditions. In an experiment with two conditions, for example, each participant is assigned to both levels of the independent variable. This is called a repeated measures design, because each participant is measured after receiving each level of the independent variable. You will also see this called a within-subjects design; in this design, comparisons are made within the same group of participants (subjects). In the next two sections, we will examine each of these designs in detail.
INDEPENDENT GROUPS DESIGN
In an independent groups design, different participants are assigned to each of the conditions using random assignment. This means that the decision to assign an individual to a particular condition is completely random and beyond the control of the researcher. For example, you could ask for the participant’s month of birth; individuals born in odd-numbered months would be assigned to one group and those born in even-numbered months would be assigned to the other group. In practice, researchers use a sequence of random numbers to determine assignment. Such numbers come from a random number generator such as Research Randomizer, available online at http://www.randomizer.org or QuickCalcs at http://www.graphpad.com/quickcalcs/randomize1.cfm; Excel can also generate random numbers. These programs allow you to randomly determine the assignment of each participant to the various groups in your study. Random assignment will prevent any systematic biases, and the groups can be considered equivalent in terms of participant characteristics such as income, intelligence, age, personality, and political attitudes. In this way, participant differences cannot be an explanation for results of the experiment. As we noted in Chapter 4, in an experiment on the effects of exercise on anxiety, lower levels of anxiety in the exercise group than in the no-exercise group cannot be explained by saying that people in the groups are somehow different on characteristics such as income, education, or personality.
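Random assignment of this kind can also be done with a few lines of code. The following is a minimal sketch using Python's standard `random` module rather than the tools named above; the participant labels, the condition names, and the function `randomly_assign` are hypothetical.

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle the participants, then deal them out to the conditions in
    turn so that group sizes differ by at most one."""
    rng = random.Random(seed)  # pass a seed only to make the result repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical participant IDs P01 through P40.
participants = [f"P{i:02d}" for i in range(1, 41)]
groups = randomly_assign(participants, ["exercise", "no_exercise"], seed=42)
print({name: len(members) for name, members in groups.items()})
```

Because the order is shuffled before anyone is placed in a condition, no personal characteristic of a participant can influence which group that person ends up in.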
An alternative procedure is to have the same individuals participate in all of the groups. This is called a repeated measures experimental design.
REPEATED MEASURES DESIGN
Consider an experiment investigating the relationship between the meaningfulness of material and the learning of that material. In an independent groups design, one group of participants is given highly meaningful material to learn and another group receives less meaningful material. For example, the meaningful material might include a story relating the material to a real-life event. In a repeated measures design, the same individuals participate in both conditions. Thus, participants might first read low-meaningful material and take a recall test to measure learning; the same participants would then read high-meaningful material and take the recall test. You can see why this is called a repeated measures design; participants are repeatedly measured on the dependent variable after being in each condition of the experiment.
Advantages and Disadvantages of Repeated Measures Design
The repeated measures design has several advantages. An obvious one is that fewer research participants are needed, because each individual participates in all conditions. When participants are scarce or when it is costly to run each individual in the experiment, a repeated measures design may be preferred. In much research on perception, for instance, extensive training of participants is necessary before the actual experiment can begin. Such research often involves only a few individuals who participate in all conditions of the experiment.
An additional advantage of repeated measures designs is that they are extremely sensitive to finding statistically significant differences between groups. This is because we have data from the same people in both conditions. To illustrate why this is important, consider possible data from the recall experiment. Using an independent groups design, the first three participants in the high-meaningful condition had scores of 68, 81, and 92. The first three participants in the low-meaningful condition had scores of 64, 78, and 85. If you calculated an average score for each condition, you would find that the average recall was a bit higher when the material was more meaningful. However, there is a lot of variability in the scores in both groups. You certainly are not finding that everyone in the high-meaningful condition has high recall and everyone in the other condition has low recall. The reason for this variability is that people differ—there are individual differences in recall abilities, so there is a range of scores in both conditions. This is part of “random error” in the scores that we cannot explain.
However, if the same scores were obtained from the first three participants in a repeated measures design, the conclusions would be much different. Let’s line up the recall scores for the two conditions:

Participant    High meaningful    Low meaningful
1              68                 64
2              81                 78
3              92                 85
With a repeated measures design, the individual differences can be seen and explained. It is true that some people score higher than others because of individual differences in recall abilities, but now you can much more clearly see the effect of the independent variable on recall scores. It is much easier to separate the systematic individual differences from the effect of the independent variable: Scores are higher for every participant in the high-meaningful condition. As a result, we are much more likely to detect an effect of the independent variable on the dependent variable.
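The arithmetic behind this point can be checked with the six recall scores given in the text. This sketch uses only Python's standard `statistics` module; the variable names are chosen for illustration. It compares the spread of the raw scores with the spread of the within-participant differences.

```python
import statistics

# Recall scores from the text, listed in the same participant order.
high = [68, 81, 92]
low = [64, 78, 85]

mean_high = statistics.mean(high)
mean_low = statistics.mean(low)
# Spread of all raw scores: dominated by individual differences in recall.
spread_raw = statistics.stdev(high + low)

# Repeated measures view: subtract each person's low score from the high one.
differences = [h - l for h, l in zip(high, low)]
mean_diff = statistics.mean(differences)
spread_diff = statistics.stdev(differences)

print(f"condition means: {mean_high:.2f} vs {mean_low:.2f}")
print(f"differences: {differences}, mean {mean_diff:.2f}")
print(f"spread of raw scores {spread_raw:.2f}, of differences {spread_diff:.2f}")
```

The mean difference between conditions is the same either way, but the differences vary far less than the raw scores do, which is why the same effect is easier to detect when each participant serves in both conditions.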
The major problem with a repeated measures design stems from the fact that the different conditions must be presented in a particular sequence. Suppose that there is greater recall in the high-meaningful condition. Although this result could be caused by the manipulation of the meaningfulness variable, the result could also simply be an order effect—the order of presenting the treatments affects the dependent variable. Thus, greater recall in the high-meaningful condition could be attributed to the fact that the high-meaningful task came second in the order of presentation of the conditions. Performance on the second task might improve merely because of the practice gained on the first task. This improvement is in fact called a practice effect, or learning effect. It is also possible that a fatigue effect could result in a deterioration in performance from the first to the second condition as the research participant becomes tired, bored, or distracted.
It is also possible for the effect of the first treatment to carry over to influence the response to the second treatment—this is known as a carryover effect. Suppose the independent variable is severity of a crime. After reading about the less severe crime, the more severe one might seem much worse to participants than it normally would. In addition, reading about the severe crime might subsequently cause participants to view the less severe crime as much milder than they normally would. In both cases, the experience with one condition carried over to affect the response to the second condition. In this example, the carryover effect was a psychological effect of the way that the two situations contrasted with one another.
A carryover effect may also occur when the first condition produces a change that is still influencing the person when the second condition is introduced. Suppose the first condition involves experiencing failure at an important task. This may result in a temporary increase in stress responses. How long does it take before the person returns to a normal state? If the second condition is introduced too soon, the stress may still be affecting the participant.
There are two approaches to dealing with order effects. The first is to employ counterbalancing techniques. The second is to devise a procedure in which the interval between conditions is long enough to minimize the influence of the first condition on the second.
Complete counterbalancing In a repeated measures design, it is very important to counterbalance the order of the conditions. With complete counterbalancing, all possible orders of presentation are included in the experiment. In the example of a study on learning high- and low-meaningful material, half of the participants would be randomly assigned to the low-high order, and the other half would be assigned to the high-low order. This design is illustrated as follows:
By counterbalancing the order of conditions, it is possible to determine the extent to which order is influencing the results. In the hypothetical memory study, you would know whether the greater recall in the high-meaningful condition is consistent for both orders; you would also know the extent to which a practice effect is responsible for the results.
Counterbalancing principles can be extended to experiments with three or more groups. With three groups, there are 6 possible orders (3! = 3 × 2 × 1 = 6). With four groups, the number of possible orders increases to 24 (4! = 4 × 3 × 2 × 1 = 24); you would need a minimum of 24 participants to represent each order, and you would need 48 participants to have only two participants per order. Imagine the number of orders possible in an experiment by Shepard and Metzler (1971). In their basic experimental paradigm, each participant is shown a three-dimensional object along with the same figure rotated at one of 10 different angles ranging from 0 degrees to 180 degrees (see the sample objects illustrated in Figure 8.3). Each time, the participant presses a button when it is determined that the two figures are the same or different. The dependent variable is reaction time—the amount of time it takes to decide whether the figures are the same or different. The results show that reaction time becomes longer as the angle of rotation increases away from the original. In this experiment with 10 conditions, there are 3,628,800 possible orders! Fortunately, there are alternatives to complete counterbalancing that still allow researchers to draw valid conclusions about the effect of the independent variable without running some 3.6 million tests.
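The counts above follow from the factorial of the number of conditions, and the orders themselves can be enumerated directly. This is a sketch using Python's standard library; the function names `all_orders` and `assign_orders` and the participant labels are hypothetical.

```python
import itertools
import math
import random


def all_orders(conditions):
    """Every possible order of presentation (complete counterbalancing)."""
    return list(itertools.permutations(conditions))


def assign_orders(participants, conditions, seed=None):
    """Randomly give each participant one order, using every order equally often."""
    orders = all_orders(conditions)
    if len(participants) % len(orders) != 0:
        raise ValueError("participant count must be a multiple of the number of orders")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}


print(all_orders(["low", "high"]))      # the two orders for two conditions
print(len(all_orders(["A", "B", "C"])))  # number of orders for three conditions
print(math.factorial(4), math.factorial(10))

# Hypothetical participants S1 through S8, split evenly across the two orders.
schedule = assign_orders([f"S{i}" for i in range(1, 9)], ["low", "high"], seed=1)
```

Enumerating every order is practical only for a handful of conditions, which is the motivation for the alternatives discussed next.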
Figure 8.3. Three-dimensional figures used by Shepard and Metzler (1971)
Adapted from “Mental Rotation of Three-Dimensional Objects,” by R. N. Shepard and J. Metzler, 1971, Science, 171, pp. 701–703.
Latin squares A technique to control for order effects without having all possible orders is to construct a Latin square: a limited set of orders constructed to ensure that (1) each condition appears at each ordinal position and (2) each condition precedes and follows each condition one time. Using a Latin square to determine order controls for most order effects without having to include all possible orders. Suppose you replicated the Shepard and Metzler (1971) study using only 4 of the 10 rotations: 0, 60, 120, and 180 degrees. A Latin square for these four conditions is shown in Figure 8.4. Each row in the square is one of the orders of the conditions (the conditions are labeled A, B, C, and D). The number of orders in a Latin square is equal to the number of conditions; thus, if there are four conditions, there are four orders. When you conduct your study using the Latin square to determine order, you need at least one participant per row. Usually, you will have two or more participants per row; the number of participants tested in each order must be equal.
Figure 8.4. A Latin square with four conditions
Note: The four conditions were randomly given letter designations. A = 60 degrees, B = 0 degrees, C = 180 degrees, and D = 120 degrees. Each row represents a different order of running the conditions.
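A square with the two properties listed above can be generated programmatically. The sketch below uses one common construction for an even number of conditions (a first row of the form 1, 2, n, 3, n-1, ... followed by cyclic shifts). It is an illustration only: the square it produces is not necessarily the one printed in the figure, and the function names are hypothetical.

```python
def balanced_latin_square(conditions):
    """Build a Latin square in which each condition appears once in each
    ordinal position and immediately precedes every other condition once.
    This construction works only for an even number of conditions."""
    n = len(conditions)
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row as indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every index in the first row by one more step.
    return [[conditions[(i + shift) % n] for i in first] for shift in range(n)]


def is_balanced(square):
    """Check both Latin square properties described in the text."""
    n = len(square)
    for col in range(n):
        if len({row[col] for row in square}) != n:
            return False
    pairs = [(row[i], row[i + 1]) for row in square for i in range(n - 1)]
    return len(pairs) == len(set(pairs)) == n * (n - 1)


square = balanced_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
```

Each printed row is one order of running the conditions; participants would be divided equally among the rows.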
Time Interval Between Treatments
In addition to counterbalancing the order of treatments, researchers need to carefully determine the time interval between presentation of treatments and possible activities between them. A rest period may counteract a fatigue effect; attending to an unrelated task between treatments may reduce the possibility that participants will contrast the first treatment with the second. If the treatment is the administration of a drug that takes time to wear off, the interval between treatments may have to be a day or more. Lane, Cherek, Tcheremissine, Lieving, and Pietras (2005) used a repeated measures design to study the effect of marijuana on risk taking. The subjects came to the lab in the morning and passed a drug test. They were then given one of three marijuana doses. The dependent variable was a measure of risk taking. Subjects were tested in this way for each dosage. Because of the time necessary for the effects of the drug to wear off, the three conditions were run on separate days at least five days apart. A similarly long time interval would be needed with procedures that produce emotional changes, such as heightened anxiety or anger. You may have noted that introduction of an extended time interval may create a separate problem: Participants will have to commit to the experiment for a longer period of time. This can make it more difficult to recruit volunteers, and if the study extends over two or more days, some participants may drop out of the experiment altogether. And for the record, increased marijuana doses did result in making riskier decisions.
Choosing Between Independent Groups and Repeated Measures Designs
Repeated measures designs have two major advantages over independent groups designs: (1) a reduction in the number of participants required to complete the experiment and (2) greater control over participant differences and thus greater ability to detect an effect of the independent variable. As noted previously, in certain areas of research, these advantages are very important. However, the disadvantages of repeated measures designs and the precautions required to deal with them are usually sufficient reasons for researchers to use independent groups designs.
A very different consideration in whether to use a repeated measures design concerns generalization to conditions in the “real world.” Greenwald (1976) has pointed out that in actual everyday situations, we sometimes encounter independent variables in an independent groups fashion: We encounter only one condition without a contrasting comparison. However, some independent variables are most frequently encountered in a repeated measures fashion: Both conditions appear, and our responses occur in the context of exposure to both levels of the independent variable. Thus, for example, if you are interested in how a defendant’s characteristics affect jurors, an independent groups design may be most appropriate because actual jurors focus on a single defendant in a trial. However, if you are interested in the effects of a job applicant’s characteristics on employers, a repeated measures design would be reasonable because employers typically consider several applicants at once. Whether to use an independent groups or repeated measures design may be partially determined by these generalization issues.
Finally, any experimental procedure that produces a relatively permanent change in an individual cannot be used in a repeated measures design. Examples include a psychotherapy treatment or a surgical procedure such as the removal of brain tissue.
MATCHED PAIRS DESIGN
A somewhat more complicated method of assigning participants to conditions in an experiment is called a matched pairs design. Instead of simply randomly assigning participants to groups, the goal is to first match people on a participant variable such as age or a personality trait (see Chapter 4). The matching variable will be either the dependent measure or a variable that is strongly related to the dependent variable. For example, in a learning experiment, participants might be matched on the basis of scores on a cognitive ability measure or even grade point average. If cognitive ability is not related to the dependent measure, however, matching would be a waste of time. The goal is to achieve the same equivalency of groups that is achieved with a repeated measures design without the necessity of having the same participants in both conditions. The design looks like this:
When using a matched pairs design, the first step is to obtain a measure of the matching variable from each individual. The participants are then rank ordered from highest to lowest based on their scores on the matching variable. Now the researcher can form matched pairs that are approximately equal on the characteristic (the highest two participants form the first pair, the next two form the second pair, and so on). Finally, the members of each pair are randomly assigned to the conditions in the experiment. (Note that there are methods of matching pairs of individuals on the basis of scores derived from multiple variables; these methods are described briefly in Chapter 11.)
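The steps just described (measure, rank, pair, randomly assign) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the participant IDs, the scores, and the condition labels "A" and "B" are all hypothetical, and an odd participant left over after pairing is simply not assigned.

```python
import random

def matched_pairs_assignment(scores, seed=None):
    """Assign participants to two conditions using matched pairs.

    scores: dict mapping participant id -> score on the matching variable.
    Returns a dict mapping participant id -> "A" or "B".
    """
    rng = random.Random(seed)
    # Rank participants from highest to lowest on the matching variable.
    ranked = sorted(scores, key=scores.get, reverse=True)
    assignment = {}
    # Adjacent participants in the ranking form a pair
    # (an unpaired last participant, if any, is left out).
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        # Randomly decide which member of the pair gets which condition.
        rng.shuffle(pair)
        assignment[pair[0]] = "A"
        assignment[pair[1]] = "B"
    return assignment

# Hypothetical cognitive ability scores for eight participants.
scores = {"p1": 98, "p2": 120, "p3": 105, "p4": 131,
          "p5": 87, "p6": 112, "p7": 101, "p8": 125}
assignment = matched_pairs_assignment(scores, seed=1)
print(assignment)
```

Whatever the random draws turn out to be, the two highest scorers (p4 and p8) always end up in different conditions, as do the members of every other pair, so the groups start out roughly equal on the matching variable.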
A matched pairs design ensures that the groups are equivalent (on the matching variable) prior to introduction of the independent variable manipulation. This assurance could be particularly important with small sample sizes, because random assignment procedures are more likely to produce equivalent groups as the sample size increases. Matching, then, is most likely to be used when only a few participants are available or when it is very costly to run large numbers of individuals in the experiment—as long as there is a strong relationship between a dependent measure and the matching variable. The result is a greater ability to detect a statistically significant effect of the independent variable because it is possible to account for individual differences in responses to the independent variable, just as we saw with a repeated measures design. (The issues of variability and statistical significance are discussed further in Chapter 13 and Appendix C.)
However useful they are, matching procedures can be costly and time-consuming, because they require measuring participants on the matching variable prior to the experiment. Such efforts are worthwhile only when the matching variable is strongly related to the dependent measure and you know that the relationship exists prior to conducting your study. For these reasons, matched pairs is not a commonly used experimental design. However, we will discuss matching again in Chapter 11 when describing quasi-experimental designs that do not have random assignment to conditions. You now have a fundamental understanding of the design of experiments. In the next chapter, we will consider issues that arise when you decide how to actually conduct an experiment.
ILLUSTRATIVE ARTICLE: EXPERIMENTAL DESIGN
We are constantly connected. We can be reached by cell phone almost anywhere, at any time. Text messages compete for our attention. Email and instant messaging (IM) can interrupt our attention whenever we are using a cell phone or computer. Is this a problem? Most people like to think of themselves as experts at multitasking. Is that true?
A study conducted by Bowman, Levine, Waite, and Gendron (2010) attempted to determine whether IMing during a reading session affected test performance. In this study, participants were randomly assigned to one of three conditions: one where they were asked to IM prior to reading, one in which they were asked to IM during reading, and one in which IMing was not allowed at all. Afterward, all participants completed a brief test on the material presented in the reading.
First, acquire and read the article:
Bowman, L. L., Levine, L. E., Waite, B. M., & Gendron, M. (2010). Can students really multitask? An experimental study of instant messaging while reading. Computers & Education, 54, 927–931. doi:10.1016/j.compedu.2009.09.024
After reading the article, answer the following questions:
1. This experiment used a posttest-only design. How could the researchers have used a pretest-posttest design? What would the advantages and disadvantages be of using a pretest-posttest design?
2. This experiment used an independent groups design.
a. How could they have used a repeated measures design? What would have been the advantages and disadvantages of using a repeated measures design?
b. How could they have used a matched pairs design? What variables do you think would have been worthwhile to match participants on? What would have been the advantages and disadvantages of using a matched pairs design?
3. What potential confounding variables can you think of?
4. In what way does this study reflect—or not reflect—the reality of studying and test taking in college? That is, how would you evaluate the external validity of this study?
5. How good was the internal validity of this experiment?
6. What were the researchers’ key conclusions of this experiment?
7. Would you have predicted the results obtained in this experiment? Why or why not?
Attrition (also mortality) (p. 166)
Between-subjects design (also independent groups design) (p. 168)
Carryover effect (p. 170)
Confounding variable (p. 162)
Counterbalancing (p. 171)
Fatigue effect (p. 170)
Independent groups design (also between-subjects design) (p. 168)
Internal validity (p. 163)
Latin square (p. 172)
Matched pairs design (p. 174)
Mortality (also attrition) (p. 166)
Order effect (p. 170)
Posttest-only design (p. 163)
Practice effect (also learning effect) (p. 170)
Pretest-posttest design (p. 164)
Random assignment (p. 168)
Repeated measures design (also within-subjects design) (p. 168)
Selection differences (p. 164)
Solomon four-group design (p. 166)
Within-subjects design (also repeated measures design) (p. 168)
- Distinguish between straightforward and staged manipulations of an independent variable.
- Describe the three types of dependent variables: self-report, behavioral, and physiological.
- Discuss sensitivity of a dependent variable, contrasting floor effects and ceiling effects.
- Describe ways to control participant expectations and experimenter expectations.
- List the reasons for conducting pilot studies.
- Describe the advantages of including a manipulation check in an experiment.
The previous chapters have laid the foundation for planning a research investigation. In this chapter, we will focus on some very practical aspects of conducting research. How do you select the research participants? What should you consider when deciding how to manipulate an independent variable? What should you worry about when you measure a variable? What do you do when the study is completed?
SELECTING RESEARCH PARTICIPANTS
The focus of your study may be children, college students, elderly adults, employees, rats, pigeons, or even cockroaches or flatworms; in all cases, the participants or subjects must somehow be selected. The method used to select participants can have a profound impact on external validity. Remember that external validity is defined as the extent to which results from a study can be generalized to other populations and settings.
Most research projects involve sampling research participants from a population of interest. The population is composed of all of the individuals of interest to the researcher. Samples may be drawn from the population using probability sampling or nonprobability sampling techniques. When it is important to accurately describe the population, you must use probability sampling. This is why probability sampling is so crucial when conducting scientific polls. Much research, on the other hand, is more interested in testing hypotheses about behavior: attempting to detect whether X causes Y rather than describing a population. Here, the two focuses of the study are the relationships between the variables being studied and tests of predictions derived from theories of behavior. In such cases, the participants may be found in the easiest way possible using nonprobability sampling methods, also known as haphazard or “convenience” methods. You may ask students in introductory psychology classes to participate, knock on doors in your dorm to find people to be tested, or choose a class in which to test children simply because you know the teacher. Nothing is wrong with such methods as long as you recognize that they affect the ability to generalize your results to some larger population. In Chapter 14, we examine the issues of generalizing from the rather atypical samples of college students and other conveniently obtained research participants.
You will also need to determine your sample size. How many participants will you need in your study? In general, increasing your sample size increases the likelihood that your results will be statistically significant, because larger samples provide more accurate estimates of population values (see Table 7.2, p. 149). Most researchers take note of the sample sizes in the research area being studied and select a sample size that is typical for studies in the area. A more formal approach to selecting a sample size, called power analysis, is discussed in Chapter 13.
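One common way to see why larger samples give more accurate estimates is the standard error of a sample mean, which equals the standard deviation divided by the square root of the sample size. The short sketch below computes it for a few sample sizes; the standard deviation of 15 is an arbitrary illustrative value, not a figure from the text.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Assumed standard deviation of 15 (illustrative only).
for n in (25, 100, 400):
    print(n, standard_error(15, n))
# prints:
# 25 3.0
# 100 1.5
# 400 0.75
```

Note that quadrupling the sample size only halves the standard error, which is one reason researchers weigh accuracy against the cost of recruiting more participants.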
MANIPULATING THE INDEPENDENT VARIABLE
To manipulate an independent variable, you have to construct an operational definition of the variable (see Chapter 4). That is, you must turn a conceptual variable into a set of operations—specific instructions, events, and stimuli to be presented to the research participants. The manipulation of the independent variable, then, is when a researcher changes the conditions to which participants are exposed. In addition, the independent and dependent variables must be introduced within the context of the total experimental setting. This has been called setting the stage (Aronson, Brewer, & Carlsmith, 1985).
Setting the Stage
In setting the stage, you usually have to supply the participants with the information necessary for them to provide their informed consent to participate (informed consent is covered in Chapter 3). This generally includes information about the underlying rationale of the study. Sometimes, the rationale given is completely truthful, although only rarely will you want to tell participants the actual hypothesis. For example, you might say that you are conducting an experiment on memory when, in fact, you are studying a specific aspect of memory (your independent variable). If participants know what you are studying, they may try to confirm (or even deny) the hypothesis, or they may try to look good by behaving in the most socially acceptable way. If you find that deception is necessary, you have a special obligation to address the deception when you debrief the participants at the conclusion of the experiment.
There are no clear-cut rules for setting the stage, except that the experimental setting must seem plausible to the participants, nor are there any clear-cut rules for translating conceptual variables into specific operations. Exactly how the variable is manipulated depends on the variable and the cost, practicality, and ethics of the procedures being considered.
Types of Manipulations
Straightforward manipulations Researchers are usually able to manipulate an independent variable with relative simplicity by presenting written, verbal, or visual material to the participants. Such straightforward manipulations manipulate variables with instructions and stimulus presentations. Stimuli may be presented verbally, in written form, via videotape, or with a computer. Let’s look at a few examples.
Goldstein, Cialdini, and Griskevicius (2008) were interested in the influence of signs that hotels leave in their bathrooms encouraging guests to reuse their towels. In their research, they simply printed signs that were hooked on towel shelves in the rooms of single guests staying at least two nights. In a standard message, the sign read “HELP SAVE THE ENVIRONMENT. You can show your respect of nature and help save the environment by reusing towels during your stay.” In this case, 35% of the guests reused their towels on the second day. Another condition invoked a social norm that other people are reusing towels: “JOIN YOUR FELLOW GUESTS IN HELPING TO SAVE THE ENVIRONMENT. Almost 75% of guests who are asked to participate in our new resource savings program do help by using their towels more than once. You can join your fellow guests in this program to save the environment by reusing your towels during your stay.” This sign resulted in 44% reusing their towels. As you might expect, the researchers have extended this research to study ways that the sign can be even more effective in increasing conservation.
Most memory research relies on straightforward manipulations. For example, Coltheart and Langdon (1998) displayed lists of words to participants and later measured recall. The word lists differed on phonological similarity: Some lists had words that sounded similar, such as cat, map, and pat, and other lists had dissimilar words such as mop, pen, and cow. They found that lists with dissimilar words are recalled more accurately.
Educational programs are most often straightforward. Pawlenko, Safer, Wise, and Holfeld (2013) examined the effectiveness of three training programs designed to improve jurors’ ability to evaluate eyewitness testimony. Subjects viewed one of three 15-minute slide presentations on a computer screen. The Interview-Identification-Eyewitness training focused on three steps to analyze eyewitness evidence: Ask if the eyewitness interviews were done properly, ask if identification methods were proper, and evaluate if the conditions of the crime scene allowed for an accurate identification. A second presentation termed “Biggers training” was a presentation of five eyewitness factors that the Supreme Court determined should be used (developed in a case called Neil v. Biggers). The Jury Duty presentation was a summary of standard information provided to jurors such as the need to be fair and impartial and the importance of hearing all evidence before reaching a verdict. After viewing the presentations, subjects read a trial transcript that included problems with the eyewitness identification procedures. The subjects in the Interview-Identification-Eyewitness condition were most likely to use these problems in reaching a verdict.
As a final example of a straightforward manipulation, consider a study by Mazer, Murphy, and Simonds (2009) on the effect of college teacher self-disclosure (via Facebook) on perceptions of teacher effectiveness. For this study, students read one of three Facebook profiles that were created for a volunteer teacher, one for each of the high-, medium-, and low-disclosure conditions. Level of disclosure was manipulated by changing the number and nature of photographs, biographical information, favorite movies/books/quotes, campus groups, and posts on “the wall.” After viewing the profile to which they were assigned, participants rated the teacher on several dimensions. Higher disclosure resulted in perceptions of greater caring and trustworthiness; however, disclosure was not related to perceptions of teacher competence.
You will find that most manipulations of independent variables in all areas of research are straightforward. Researchers vary the difficulty of material to be learned, motivation levels, the way questions are asked, characteristics of people to be judged, and a variety of other factors in a straightforward manner.
Staged manipulations Other manipulations are less straightforward. Sometimes, it is necessary to stage events during the experiment in order to manipulate the independent variable successfully. When this occurs, the manipulation is called a staged manipulation or event manipulation.
Staged manipulations are most frequently used for two reasons. First, the researcher may be trying to create some psychological state in the participants, such as frustration, anger, or a temporary lowering of self-esteem. For example, Zitek and her colleagues studied what is termed a sense of entitlement (Zitek, Jordan, Monin, & Leach, 2010). Their hypothesis is that the feeling of being unfairly wronged leads to a sense of entitlement and, as a result, the tendency to be more selfish with others. In their study, all participants played a computer game. The researchers programmed the game so that some participants would lose when the game crashed. This is an unfair outcome, because the participants lost for no good reason. Participants in the other condition also lost, but they thought it was because the game itself was very difficult. The participants experiencing the broken game did in fact behave more selfishly after the game; they later allocated themselves more money than deserved when competing with another participant.
Second, a staged manipulation may be necessary to simulate some situation that occurs in the real world. Recall the Milgram obedience experiment that was described in Chapter 3. In that study, an elaborate procedure—ostensibly to study learning—was constructed to actually study obedience to an authority. Or consider a study on computer multitasking conducted by Bowman, Levine, Waite, and Gendron (2010), wherein students read academic material presented on a computer screen. In one condition, the participants received and responded to instant messages while they were reading. Other participants did not receive any messages. Student performance on a test was equal in the two conditions. However, students in the instant message condition took longer to read the material (after the time spent on the message was subtracted from the total time working on the computer).
Staged manipulations frequently employ a confederate (sometimes termed an “accomplice”). Usually, the confederate appears to be another participant in an experiment but is actually part of the manipulation (we discussed the use of confederates in Chapter 3). A confederate may be useful to create a particular social situation. For example, Hermans, Herman, Larsen, and Engels (2010) studied whether food intake by males is affected by the amount of food consumed by a companion. Participants were recruited for a study on evaluation of movie trailers. The participant and a confederate sat in a comfortable setting in which they viewed and evaluated three trailers. They were then told that they needed a break before viewing the next trailers; snacks were available if they were interested. In one condition, the confederate took a large serving of snacks. A small serving was taken in another condition, and the confederate did not eat in the third condition. The researchers then measured the amount consumed by the actual participants; participants did model the amount consumed by the confederate, but only when they were hungry.
[Figure 9.1: Example of the Asch line judgment task]
The classic Asch (1956) conformity experiment provides another example of how confederates may be used. Asch gathered people into groups and asked them to respond to a line judgment task such as the one in Figure 9.1. Which of the three test lines matches the standard? Although this appears to be a simple task, Asch made it more interesting by having several confederates announce the same incorrect judgment prior to asking the actual participant; this procedure was repeated over a number of trials with different line judgments. Asch was able to demonstrate how easy it is to produce conformity—participants conformed to the unanimous majority on many of the trials even though the correct answer was clear. Finally, confederates may be used in field experiments as well as laboratory research. As described in Chapter 4, Lee, Schwarz, Taubman, and Hou (2010) studied the impact of public sneezing on the perception of unrelated risks by having an accomplice either sneeze or not sneeze (control condition) while walking by someone in a public area of a university. A researcher then approached those people with a request to complete a questionnaire, which they described as a “class project.” The questionnaire measured participants’ perceptions of average Americans’ risk of contracting a serious disease. The researchers found that, indeed, being around a person who sneezes increases self-reported perception of risk.
As you can see, staged manipulations demand a great deal of ingenuity and even some acting ability. They are used to involve the participants in an ongoing social situation that the individuals perceive not as an experiment but as a real experience. Researchers assume that the result will be natural behavior that truly reflects the feelings and intentions of the participants. However, such procedures allow for a great deal of subtle interpersonal communication that is hard to put into words; this may make it difficult for other researchers to replicate the experiment. Also, a complex manipulation is difficult to interpret. If many things happened during the experiment, what one thing was responsible for the results? In general, it is easier to interpret results when the manipulation is relatively straightforward. However, the nature of the variable you are studying sometimes demands complicated procedures.
Strength of the Manipulation
The simplest experimental design has two levels of the independent variable. In planning the experiment, the researcher has to choose these levels. A general principle to follow is to make the manipulation as strong as possible. A strong manipulation maximizes the differences between the two groups and increases the chances that the independent variable will have a statistically significant effect on the dependent variable.
To illustrate, suppose you think that there is a positive linear relationship between attitude similarity and liking (“birds of a feather flock together”). In conducting the experiment, you could arrange for participants to encounter another person, a confederate. In one group, the confederate and the participant would share similar attitudes; in the other group, the confederate and the participant would be dissimilar. Similarity, then, is the independent variable, and liking is the dependent variable. Now you have to decide on the amount of similarity. Figure 9.2 shows the hypothesized relationship between attitude similarity and liking at 10 different levels of similarity. Level 1 represents the least amount of similarity with no common attitudes, and level 10 the greatest (all attitudes are similar). To achieve the strongest manipulation, the participants in one group would encounter a confederate of level 1 similarity; those in the other group would encounter a confederate of level 10 similarity. This would result in the greatest difference in the liking means—a 9-point difference. A weaker manipulation—using levels 4 and 7, for example—would result in a smaller mean difference.
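The arithmetic behind the strong and weak manipulations can be written out as a small sketch. It assumes a linear relationship in which each step in similarity raises mean liking by one point; that slope is an assumption chosen to be consistent with the 9-point difference mentioned above, and the function names are hypothetical.

```python
def expected_liking(similarity_level, slope=1.0, intercept=0.0):
    """Hypothesized mean liking at a given similarity level (1 to 10).

    The slope of 1.0 and intercept of 0.0 are assumptions for illustration.
    """
    return intercept + slope * similarity_level

def mean_difference(low_level, high_level):
    """Expected difference in mean liking between two similarity levels."""
    return expected_liking(high_level) - expected_liking(low_level)

strong = mean_difference(1, 10)  # levels 1 and 10
weak = mean_difference(4, 7)     # levels 4 and 7
print(strong, weak)  # prints: 9.0 3.0
```

Under these assumptions the strong manipulation produces a difference three times as large as the weak one, which makes a real effect easier to detect against the variability in participants' responses.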
A strong manipulation is particularly important in the early stages of research, when the researcher is most interested in demonstrating that a relationship does, in fact, exist. If the early experiments reveal a relationship between the variables, subsequent research can systematically manipulate the other levels of the independent variable to provide a more detailed picture of the relationship.
[Figure 9.2: Relationship between attitude similarity and liking]
The principle of using the strongest manipulation possible should be tempered by at least two considerations. The first concerns the external validity of a study: The strongest possible manipulation may entail a situation that rarely, if ever, occurs in the real world. For example, an extremely strong crowding manipulation might involve placing so many people in a room that no one could move, a manipulation that might significantly affect a variety of behaviors. However, we would not know if the results were similar to those occurring in more common, less crowded situations, such as many classrooms or offices.
A second consideration is ethics: A manipulation should be as strong as possible within the bounds of ethics. A strong manipulation of fear or anxiety, for example, might not be possible because of the potential physical and psychological harm to participants.
Cost of the Manipulation
Cost is another factor in the decision about how to manipulate the independent variable. Researchers who have limited monetary resources may not be able to afford expensive equipment, salaries for confederates, or payments to participants in long-term experiments. Also, a manipulation in which participants must be run individually requires more of the researcher’s time than a manipulation that allows running many individuals in a single setting. In this respect, a manipulation that uses straightforward presentation of written or verbal material is less costly than a complex, staged experimental manipulation. Some government and private agencies offer grants for research; because much research is costly, continued public support of these agencies is very important.
MEASURING THE DEPENDENT VARIABLE
In previous chapters, we have discussed various aspects of measuring variables, including reliability, validity, and reactivity of measures; observational methods; and the development of self-report measures for questionnaires and interviews. In this section, we will focus on measurement considerations that are particularly relevant to experimental research.
Types of Measures
The dependent variable in most experiments is one of three general types: self-report, behavioral, or physiological.
Self-report measures Self-reports can be used to measure attitudes, liking for someone, judgments about someone’s personality characteristics, intended behaviors, emotional states, attributions about why someone performed well or poorly on a task, confidence in one’s judgments, and many other aspects of human thought and behavior. Rating scales with descriptive anchors (endpoints) are most commonly used. For example, Funk and Todorov (2013) studied the impact of a facial tattoo on impressions of a man accused of assault. The man, Jack, had punched another man in a bar following a dispute over a spilled drink. A description of the incident included a photo of Jack with or without a facial tattoo. After viewing the photo and reading the description, subjects responded to several questions on a 7-point scale that included the following:
How likely is it that Jack is guilty?
Behavioral measures Behavioral measures are direct observations of behaviors. As with self-reports, measurements of an almost endless number of behaviors are possible. Sometimes, the researcher may record whether a given behavior occurs—for example, whether an individual responds to a request for help, makes an error on a test, or chooses to engage in one activity rather than another. Often, the researcher must decide whether to record the number of times a behavior occurs in a given time period—the rate of a behavior; how quickly a response occurs after a stimulus—a reaction time; or how long a behavior lasts—a measure of duration. The decision about which aspect of behavior to measure depends on which is most theoretically relevant for the study of a particular problem or which measure logically follows from the independent variable manipulation.
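To make the distinction between rate, reaction time, and duration concrete, the sketch below derives all three from one hypothetical observation log. The function name, the event times, and the 5-minute (300-second) observation period are illustrative assumptions, not data from any study described here.

```python
def behavioral_measures(events, stimulus_time, period_seconds):
    """Summarize one observation session.

    events: list of (start, end) times in seconds, one per occurrence
            of the behavior.
    stimulus_time: time in seconds at which the stimulus was presented.
    period_seconds: length of the observation period in seconds.
    """
    count = len(events)
    # Rate: number of occurrences per minute of observation.
    rate_per_minute = count / (period_seconds / 60)
    # Reaction time: delay from the stimulus to the first response after it.
    starts_after = [start for start, _ in events if start >= stimulus_time]
    reaction_time = min(starts_after) - stimulus_time if starts_after else None
    # Duration: total time spent engaged in the behavior.
    total_duration = sum(end - start for start, end in events)
    return {
        "count": count,
        "rate_per_minute": rate_per_minute,
        "reaction_time": reaction_time,
        "total_duration": total_duration,
    }

# Hypothetical log: four utterances during a 5-minute observation period.
log = [(12, 15), (40, 48), (130, 133), (250, 260)]
summary = behavioral_measures(log, stimulus_time=10, period_seconds=300)
print(summary)
```

The same raw record yields three different dependent variables; which one to analyze depends on which is most theoretically relevant to the research question.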
As an example, consider a study on eating behavior while viewing a food-related or nature television program (Bodenlos & Wormuth, 2013). Participants had access to chocolate-covered candies, cheese curls, and carrots that were weighed before and after the session. More candy was consumed during the food-related program; there were no differences for the other two foods.
Sometimes the behavioral measure is not an actual behavior but a behavioral intention or choice. Recall the study described in Chapter 3 in which subjects decided how much hot sauce another subject would have to consume later in the study (Vasquez, Pederson, Bushman, Kelley, Demeestere, & Miller, 2013). They did not actually pour the hot sauce but they did commit to an action rather than simply indicate their feelings about the other subject.
Physiological measures Physiological measures are recordings of responses of the body. Many such measurements are available; examples include the galvanic skin response (GSR), electromyogram (EMG), and electroencephalogram (EEG). The GSR is a measure of general emotional arousal and anxiety; it measures the electrical conductance of the skin, which changes when sweating occurs. The EMG measures muscle tension and is frequently used as a measure of tension or stress. The EEG is a measure of electrical activity of brain cells. It can be used to record general brain arousal as a response to different situations, such as activity in certain parts of the brain as learning occurs or brain activity during different stages of sleep.
The GSR, EMG, and EEG have long been used as physiological indicators of important psychological variables. Many other physiological measures are available, including temperature, heart rate, and analysis of blood or urine (see Cacioppo & Tassinary, 1990). In recent years, magnetic resonance imaging (MRI) has become an increasingly important tool for researchers in behavioral neuroscience. An MRI provides an image of an individual’s brain structure. It allows scientists to compare the brain structure of individuals with a particular condition (e.g., a cognitive impairment, schizophrenia, or attention deficit hyperactivity disorder) with the brain structure of people without the condition. In addition, a functional MRI (fMRI) allows researchers to scan areas of the brain while a research participant performs a physical or cognitive task. The data provide evidence for what brain processes are involved in these tasks. For example, a researcher can see which areas of the brain are most active when performing different memory tasks. In one study using fMRI, elderly adults with higher levels of education not only performed better on memory tasks than their less educated peers, but they also used areas of their frontal cortex that were not used by other elderly and younger individuals (Springer, McIntosh, Winocur, & Grady, 2005).
Although it is convenient to describe single dependent variables, most studies include more than one dependent measure. One reason to use multiple measures stems from the fact that a variable can be measured in a variety of concrete ways (recall the discussion of operational definitions in Chapter 4). In a study on the effects of an employee wellness program on health, the researchers might measure self-reported fatigue, stress, physical activity, and eating habits along with physical measures of blood pressure, blood sugar, cholesterol, and weight (cf. Clark et al., 2011). If the independent variable has the same effect on several measures of the same dependent variable, our confidence in the results is increased. It is also useful to know whether the same independent variable affects some measures but not others. For example, an independent variable designed to affect liking might have an effect on some measures of liking (e.g., desirability as a person to work with) but not others (e.g., desirability as a dating partner). Researchers may also be interested in studying the effects of an independent variable on several different behaviors. For example, an experiment on the effects of a new classroom management technique might examine academic performance, interaction rates among classmates, and teacher satisfaction.
When you have more than one dependent measure, the question of order arises. Does it matter which measures are made first? Is it possible that the results for a particular measure will be different if the measure comes earlier rather than later? The issue is similar to the order effects that were discussed in Chapter 8 in the context of repeated measures designs. Perhaps responding to the first measures will somehow affect responses on the later measures, or perhaps the participants attend more closely to first measures than to later measures. There are two possible ways of responding to this issue. If it appears that the problem is serious, the order of presenting the measures can be counterbalanced using the techniques described in Chapter 8. Often there are no indications from previous research that order is a serious problem. In this case, the prudent response is to present the most important measures first and the less important ones later. With this approach, order will not be a problem in interpreting the results on the most important dependent variables. Even though order may be a potential problem for some of the measures, the overall impact on the study is minimized.
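As a minimal sketch of what counterbalancing the order of measures could look like in practice, the Python snippet below generates every possible presentation order (complete counterbalancing, which is only one of several possible schemes) and cycles participants through those orders. The measure names and participant numbers are hypothetical and are not taken from any study discussed here.

```python
from itertools import permutations


def counterbalanced_orders(measures):
    """Return every possible presentation order (complete counterbalancing)."""
    return [list(order) for order in permutations(measures)]


def assign_orders(participant_ids, measures):
    """Cycle through the orders so each one is used about equally often."""
    orders = counterbalanced_orders(measures)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


# Hypothetical dependent measures and 12 hypothetical participants.
measures = ["liking", "trust", "willingness_to_work_together"]
schedule = assign_orders(range(1, 13), measures)

for pid, order in schedule.items():
    print(pid, order)
```

With three measures there are six possible orders, so 12 participants allow each order to be used exactly twice. The number of orders grows quickly as measures are added, which is one reason a researcher might instead simply present the most important measures first.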
Making multiple measurements in a single experiment is valuable when it is feasible to do so. However, it may be necessary to conduct a separate series of experiments to explore the effects of an independent variable on various behaviors.
Sensitivity of the Dependent Variable
The dependent variable should be sensitive enough to detect differences between groups. A measure of liking that asks, “Do you like this person?” with only a simple “yes” or “no” response alternative is less sensitive than one that asks, “How much do you like this person?” on a 5- or 7-point scale. With the first measure, people may tend to be nice and say yes even if they have some negative feelings about the person. The second measure allows for a gradation of liking; such a scale would make it easier to detect differences in amount of liking.
The issue of sensitivity is particularly important when measuring human performance. Memory can be measured using recall, recognition, or reaction time; cognitive task performance might be measured by examining speed or number of errors during a proofreading task; physical performance can be measured through various motor tasks. Such tasks vary in their difficulty. Sometimes a task is so easy that everyone does well regardless of the conditions that are manipulated by the independent variable. This results in what is called a ceiling effect—the independent variable appears to have no effect on the dependent measure only because participants quickly reach the maximum performance level. The opposite problem occurs when a task is so difficult that hardly anyone can perform well; this is called a floor effect.
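The logic of ceiling and floor effects can be illustrated with a small, deliberately artificial Python sketch. It assumes a 10-point test, a made-up "true skill" value for each participant, and a simple rule in which the observed score is skill minus task difficulty, clipped to the range of the test. None of these numbers come from a real study; they only show how a real difference between groups can disappear when a task is too easy or too hard.

```python
def observed_score(true_skill, task_difficulty, max_score=10):
    """Score on a test that cannot go below 0 or above max_score."""
    return max(0, min(max_score, true_skill - task_difficulty))


def mean_difference(treatment, control, task_difficulty):
    """Observed treatment-minus-control difference in mean test scores."""
    t = [observed_score(s, task_difficulty) for s in treatment]
    c = [observed_score(s, task_difficulty) for s in control]
    return sum(t) / len(t) - sum(c) / len(c)


# Hypothetical skill levels: the treatment group really is 3 points better.
control_skill = [12, 13, 14]
treatment_skill = [15, 16, 17]

easy = mean_difference(treatment_skill, control_skill, task_difficulty=0)
moderate = mean_difference(treatment_skill, control_skill, task_difficulty=8)
hard = mean_difference(treatment_skill, control_skill, task_difficulty=20)

print(easy)      # 0.0 -> ceiling effect: everyone reaches the maximum score
print(moderate)  # 3.0 -> the measure is sensitive enough to show the difference
print(hard)      # 0.0 -> floor effect: everyone scores at the minimum
```

Only the moderately difficult task leaves room for the groups to differ, which is the sense in which a dependent measure must be sensitive.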
The need to consider sensitivity of measures is nicely illustrated in the Freedman et al. (1971) study of crowding mentioned in Chapter 4. The study examined the effect of crowding on various measures of cognitive task performance and found that crowding did not impair performance. You could conclude that crowding has no effect on performance; however, it is also possible that the measures were either too easy or too difficult to detect an effect of crowding. In fact, subsequent research showed that the tasks may have been too easy; when subjects perform complex cognitive tasks in laboratory or natural settings, crowding does result in lower performance (Bruins & Barber, 2000; Paulus, Annis, Seta, Schkade, & Matthews, 1976).
Cost of Measures
Another consideration is cost—some measures may be more costly than others. Paper-and-pencil self-report measures are generally inexpensive; measures that require trained observers or elaborate equipment can become quite costly. A researcher studying nonverbal behavior, for example, might have to use a video camera to record each participant’s behaviors in a situation. Two or more observers would then have to view the tapes to code behaviors such as eye contact, smiling, or self-touching (two observers are needed to ensure that the observations are reliable). Thus, there would be expenses for both equipment and personnel. Physiological recording devices are also expensive. Researchers need resources from the university or outside agencies to carry out such research.
The basic experimental design has two groups: in the simplest case, an experimental group that receives the treatment and a control group that does not. Use of a control group makes it possible to eliminate a variety of alternative explanations for the results, thus improving internal validity. Sometimes additional control procedures may be necessary to address other types of alternative explanations. Two general control issues concern expectancies on the part of both the participants in the experiment and the experimenters.
Controlling for Participant Expectations
Demand characteristics We noted previously that experimenters generally do not wish to inform participants about the specific hypotheses being studied or the exact purpose of the research. The reason for this lies in the problem of demand characteristics (Orne, 1962), which are any features of an experiment that might inform participants of the purpose of the study. The concern is that when participants form expectations about the hypothesis of the study, they will then do whatever is necessary to confirm the hypothesis. For example, if you were studying the relationship between political orientation and homophobia, participants might figure out the hypothesis and behave according to what they think you want, rather than according to their true selves.
One way to control for demand characteristics is to use deception, making participants think that the experiment is studying one thing when actually it is studying something else. The experimenter may devise elaborate cover stories to explain the purpose of the study and to disguise what is really being studied. The researcher may also attempt to disguise the dependent variable by using an unobtrusive measure or by placing the measure among a set of unrelated filler items on a questionnaire. Another approach is simply to assess whether demand characteristics are a problem by asking participants about their perceptions of the purpose of the research. It may be that participants do not have an accurate view of the purpose of the study; or if some individuals do guess the hypotheses of the study, their data may be analyzed separately.
Demand characteristics may be eliminated when people are not aware that an experiment is taking place or that their behavior is being observed. Thus, experiments conducted in field settings and observational research in which the observer is concealed or unobtrusive measures are used minimize the problem of demand characteristics.
Placebo groups A special kind of participant expectation arises in research on the effects of drugs. Consider an experiment that is investigating whether a drug such as Prozac reduces depression. One group of people diagnosed as depressive receives the drug and the other group receives nothing. Now suppose that the drug group shows an improvement. We do not know whether the improvement was caused by the properties of the drug or by the participants’ expectations about the effect of the drug—what is called a placebo effect. In other words, just administering a pill or an injection may be sufficient to cause an observed improvement in behavior. To control for this possibility, a placebo group can be added. Participants in the placebo group receive a pill or injection containing an inert, harmless substance; they do not receive the drug given to members of the experimental group. If the improvement results from the active properties of the drug, the participants in the experimental group should show greater improvement than those in the placebo group. If the placebo group improves as much as the experimental group, all improvement could be caused by a placebo effect.
Sometimes, participants’ expectations are the primary focus of an investigation. For example, Marlatt and Rohsenow (1980) conducted research to determine which behavioral effects of alcohol are due to alcohol itself as opposed to the psychological impact of believing one is drinking alcohol. The experimental design to examine these effects had four groups: (1) expect no alcohol–receive no alcohol, (2) expect no alcohol–receive alcohol, (3) expect alcohol–receive no alcohol, and (4) expect alcohol–receive alcohol. This design is called a balanced placebo design. Marlatt and Rohsenow’s research suggests that the belief that one has consumed alcohol is a more important determinant of behavior than the alcohol itself. That is, people who believed they had consumed alcohol (Groups 3 and 4) behaved very similarly, although those in Group 3 were not actually given any alcohol.
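The four groups of a balanced placebo design form a 2 x 2 layout (expectancy by actual beverage), so the results can be summarized as two main effects and an interaction. The Python sketch below shows that arithmetic. The cell means are invented purely for illustration and are NOT data from Marlatt and Rohsenow (1980); they were chosen only to mimic the general pattern described above, in which expectancy matters more than the alcohol itself.

```python
# Hypothetical cell means on some behavioral measure (made-up numbers).
cell_means = {
    ("expect_no_alcohol", "receive_no_alcohol"): 2.0,  # Group 1
    ("expect_no_alcohol", "receive_alcohol"): 2.5,     # Group 2
    ("expect_alcohol", "receive_no_alcohol"): 6.0,     # Group 3
    ("expect_alcohol", "receive_alcohol"): 6.5,        # Group 4
}


def marginal_mean(factor_index, level):
    """Average the cells that share one level of a factor (0 = expect, 1 = receive)."""
    values = [m for cell, m in cell_means.items() if cell[factor_index] == level]
    return sum(values) / len(values)


# Main effect of expectancy: Groups 3 and 4 versus Groups 1 and 2.
expectancy_effect = (
    marginal_mean(0, "expect_alcohol") - marginal_mean(0, "expect_no_alcohol")
)

# Main effect of the beverage: Groups 2 and 4 versus Groups 1 and 3.
alcohol_effect = (
    marginal_mean(1, "receive_alcohol") - marginal_mean(1, "receive_no_alcohol")
)

# Interaction: does the effect of alcohol depend on what people expect?
interaction = (
    cell_means[("expect_alcohol", "receive_alcohol")]
    - cell_means[("expect_alcohol", "receive_no_alcohol")]
) - (
    cell_means[("expect_no_alcohol", "receive_alcohol")]
    - cell_means[("expect_no_alcohol", "receive_no_alcohol")]
)

print(expectancy_effect)  # 4.0
print(alcohol_effect)     # 0.5
print(interaction)        # 0.0
```

In this invented example the expectancy main effect is large, the alcohol main effect is small, and there is no interaction because alcohol adds the same small amount regardless of what participants expect. Whether such differences are statistically reliable would still have to be decided with an appropriate statistical test.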
In some areas of research, the use of placebo control groups has ethical implications. Suppose you are studying a treatment that does have a positive effect on people (for example, by reducing migraine headaches or alleviating symptoms of depression). It is important to use careful experimental procedures to make sure that the treatment does have an impact and that alternative explanations for the effect, including a placebo effect, are eliminated. However, it is also important to help those people who are in the control conditions; this aligns with the concept of beneficence that was covered in Chapter 3. Thus, participants in the control conditions must be given the treatment as soon as they have completed their part in the study in order to maximize the benefits of participation.
Placebo effects are real and must receive serious study in many areas of research. A great deal of current research and debate focuses on the extent to which any beneficial effects of antidepressant medications such as Prozac are due to placebo effects (e.g., Kirsch, 2010; Wampold, Minami, Tierney, Baskin, & Bhati, 2005).
Controlling for Experimenter Expectations
Experimenters are usually aware of the purpose of the study and thus may develop expectations about how participants should respond. These expectations can in turn bias the results. This general problem is called experimenter bias or expectancy effects (Rosenthal, 1966, 1967, 1969).
Expectancy effects may occur whenever the experimenter knows which condition the participants are in. There are two potential sources of experimenter bias. First, the experimenter might unintentionally treat participants differently in the various conditions of the study. For example, certain words might be emphasized when reading instructions to one group but not the other, or the experimenter might smile more when interacting with people in one of the conditions. The second source of bias can occur when experimenters record the behaviors of the participants; there may be subtle differences in the way the experimenter interprets and records the behaviors.
Research on expectancy effects Expectancy effects have been studied in a variety of ways. Perhaps the earliest demonstration of the problem is the case of Clever Hans, a horse with alleged mathematical and other abilities that attracted the attention of Europeans in the early 20th century (Rosenthal, 1967). The owner of the horse posed questions to Hans, who in turn would provide answers by tapping his hoof (e.g., a question of “what is two times five” would be followed by ten taps). Pfungst (1911) later showed that Hans was actually responding to barely detectable cues provided by the person asking the question. The person would look at the hoof as Hans started to tap and then shift to look at Hans as the correct answer was about to be given. Hans was responding to these head and eye movements that went undetected by observers.
If a clever horse can respond to subtle cues, it is reasonable to suppose that clever humans can too. In fact, research has shown that experimenter expectancies can be communicated to humans by both verbal and nonverbal means (Duncan, Rosenberg, & Finklestein, 1969; Jones & Cooper, 1971). An example of more systematic research on expectancy effects is a study by Rosenthal (1966). In this experiment, graduate students trained rats that were described as coming from either “maze bright” or “maze dull” genetic strains. The animals actually came from the same strain and had been randomly assigned to the bright and dull categories; however, the “bright” rats did perform better than the “dull” rats. Subtle differences in the ways the students treated the rats or recorded their behavior must have caused this result. A generalization of this particular finding is called “teacher expectancy.” Research has shown that telling a teacher that a pupil will bloom intellectually over the next year results in an increase in the pupil’s IQ score (Rosenthal & Jacobson, 1968). In short, teachers’ expectations can influence students’ performance.
The problem of expectations influencing ratings of behavior is nicely illustrated in an experiment by Langer and Abelson (1974). Clinical psychologists were shown a videotape of an interview in which the person interviewed was described as either an applicant for a job or a patient; in reality, all saw the same tape. The psychologists later rated the person as more “disturbed” when they thought the person was a patient than when the person was described as a job applicant.
Solutions to the expectancy problem Clearly, experimenter expectations can influence the outcomes of research investigations. How can this problem be solved? Fortunately, there are a number of ways to minimize expectancy effects. First, experimenters should be well trained and should practice behaving consistently with all participants. The benefit of training was illustrated in the Langer and Abelson study with clinical psychologists. The bias of rating the “patient” as disturbed was much less among behavior-oriented therapists than among traditional ones. Presumably, the training of the behavior-oriented therapists led them to focus more on the actual behavior of the person, so they were less influenced by expectations stemming from the label of “patient.”
Another solution is to run all conditions simultaneously so that the experimenter’s behavior is the same for all participants. This solution is feasible only under certain circumstances, however, such as when the study can be carried out with the use of printed materials or the experimenter’s instructions to participants are the same for everyone.
Expectancy effects are also minimized when the procedures are automated. As noted previously, it may be possible to manipulate independent variables and record responses using computers; with automated procedures, the experimenter’s expectations are less likely to influence the results.
A final solution is to use experimenters who are unaware of the hypothesis being investigated. In these cases, the person conducting the study or making observations is blind regarding what is being studied or which condition the participant is in. This procedure originated in drug research using placebo groups. In a single-blind experiment, the participant is unaware of whether a placebo or the actual drug is being administered; in a double-blind experiment, neither the participant nor the experimenter knows whether the placebo or actual treatment is being given. To use a procedure in which the experimenter or observer is unaware of either the hypothesis or the group the participant is in, you must hire other people to conduct the experiment and make observations.
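One way to picture how a double-blind procedure might be organized is sketched below in Python. The idea, under the assumptions made here, is that a third party generates opaque kit codes, the experimenter's run sheet shows only those codes, and the key linking codes to conditions is kept separately until data collection ends. The function name, the "KIT-" code format, and the two condition labels are hypothetical choices for illustration, not a standard procedure from the text.

```python
import random


def blind_assignments(participant_ids, conditions=("drug", "placebo"), seed=None):
    """Return (run_sheet, key): the run sheet hides conditions behind kit codes."""
    rng = random.Random(seed)
    ids = list(participant_ids)

    # Balanced assignment: repeat the condition labels, then shuffle them.
    labels = [conditions[i % len(conditions)] for i in range(len(ids))]
    rng.shuffle(labels)

    # Unique opaque codes that carry no information about the condition.
    codes = rng.sample(range(1000, 10000), len(ids))

    run_sheet = {}  # what the experimenter sees: participant -> kit code
    key = {}        # held by a third party: kit code -> actual condition
    for pid, label, code in zip(ids, labels, codes):
        kit = f"KIT-{code}"
        run_sheet[pid] = kit
        key[kit] = label
    return run_sheet, key


run_sheet, key = blind_assignments(range(1, 9), seed=1)
for pid, kit in run_sheet.items():
    print(pid, kit)  # no condition names appear on the experimenter's sheet
```

Because neither the participant nor the person running the session can tell from the kit code which treatment is being given, the experimenter's expectations have less opportunity to influence how participants are treated or how their behavior is recorded.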
Because researchers are aware of the problem of expectancy effects, solutions such as the ones just described are usually incorporated into the procedures of the study. The procedures used in scientific research must be precisely defined so they can be replicated by others. This allows other researchers to build on previous research. If a study does have a potential problem of expectancy effects, researchers are bound to notice and will attempt to replicate the experiment with procedures that control for them. Replication is also a self-correcting mechanism that ensures that methodological flaws will be discovered. The importance of replication will be discussed further in Chapter 14.
So far, we have discussed several of the factors that a researcher considers when planning a study. Actually conducting the study and analyzing the results is a time-consuming process. Before beginning the research, the investigator wants to be as sure as possible that everything will be done right. And once the study has been designed, there are some additional procedures that will improve it.
After putting considerable thought into planning the study, the researcher writes a research proposal. The proposal will include a literature review that provides a background for the study. The intent is to clearly explain why the research is being done—what questions the research is designed to answer. The details of the procedures that will be used to test the idea are then given. The plans for analysis of the data are also provided. A research proposal is very similar to the introduction and method sections of a journal article. Such proposals must be included in applications for research grants; ethics review committees require some type of proposal as well (see Chapter 3 for more information on Institutional Review Boards).
Preparing a proposal is a good idea in planning any research project because simply putting your thoughts on paper helps organize and systematize ideas. In addition, you can show the proposal to friends, colleagues, professors, and other interested parties who can provide useful feedback about the adequacy of your procedures. They may see problems that you did not recognize, or they may offer ways of improving the study.
When the researcher has finally decided on all the specific aspects of the procedure, it is possible to conduct a pilot study in which the researcher does a trial run with a small number of participants. The pilot study will reveal whether participants understand the instructions, whether the total experimental setting seems plausible, whether any confusing questions are being asked, and so on.
Sometimes participants in the pilot study are questioned in detail about the experience following the experiment. Another method is to use the think aloud protocol (described in Chapter 7) in which the participants in the pilot study are instructed to verbalize their thoughts about everything that is happening during the study. Such procedures provide the researcher with an opportunity to make any necessary changes in the procedure before doing the entire study. Also, a pilot study allows the experimenters who are collecting the data to become comfortable with their roles and to standardize their procedures.
A manipulation check is an attempt to directly measure whether the independent variable manipulation has the intended effect on the participants. Manipulation checks provide evidence for the construct validity of the manipulation (construct validity was discussed in Chapter 4). If you are manipulating anxiety, for example, a manipulation check will tell you whether participants in the high-anxiety group really were more anxious than those in the low-anxiety condition. The manipulation check might involve a self-report of anxiety, a behavioral measure (such as number of arm and hand movements), or a physiological measure. All manipulation checks, then, ask whether the independent variable manipulation was in fact a successful operationalization of the conceptual variable being studied. Consider, for example, a manipulation of physical attractiveness as an independent variable. In an experiment, participants respond to someone who is supposed to be perceived as attractive or unattractive. The manipulation check in this case would determine whether participants do rate the highly attractive person as more physically attractive.
Manipulation checks are particularly useful in the pilot study to decide whether the independent variable manipulation is in fact having the intended effect. If the independent variable is not effective, the procedures can be changed. However, it is also important to conduct a manipulation check in the actual experiment. Because a manipulation check in the actual experiment might distract participants or inform them about the purpose of the experiment, it is usually wise to position the administration of the manipulation check measure near the end of the experiment; in most cases, this would be after measuring the dependent variables and prior to the debriefing session.
A manipulation check has two advantages. First, if the check shows that your manipulation was not effective, you have saved the expense of running the actual experiment. You can turn your attention to changing the manipulation to make it more effective. For instance, if the manipulation check shows that neither the low- nor the high-anxiety group was very anxious, you could change your procedures to increase the anxiety in the high-anxiety condition.
Second, a manipulation check is advantageous if you get nonsignificant results, that is, if the results indicate that no relationship exists between the independent and dependent variables. A manipulation check can identify whether the nonsignificant results are due to a problem in manipulating the independent variable. If your manipulation is not successful, it is only reasonable that you will obtain nonsignificant results. If both groups are equally anxious after you manipulate anxiety, anxiety cannot have any effect on the dependent measure. What if the check shows that the manipulation was successful, but you still get nonsignificant results? Then you know at least that the results were not due to a problem with the manipulation; the reason for not finding a relationship lies elsewhere. Perhaps you had a poor dependent measure, or perhaps there really is no relationship between the variables.
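To make the anxiety example concrete, the following Python sketch summarizes a manipulation check based on self-reported anxiety. The ratings are hypothetical (a made-up 1 to 7 scale with six participants per group), and the use of Cohen's d as a descriptive effect size is an assumption added here for illustration; the text itself does not prescribe a particular statistic.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical self-reported anxiety ratings (1 = not at all, 7 = extremely).
high_anxiety_group = [6, 7, 5, 6, 7, 6]
low_anxiety_group = [3, 2, 4, 3, 2, 3]

print("High-anxiety mean:", round(mean(high_anxiety_group), 2))
print("Low-anxiety mean:", round(mean(low_anxiety_group), 2))
print("Cohen's d:", round(cohens_d(high_anxiety_group, low_anxiety_group), 2))
```

If the two group means were nearly identical, the manipulation would not have produced the intended difference in anxiety, and a null result on the dependent variable would be hard to interpret. A clear difference, as in these invented ratings, suggests that the reason for any nonsignificant result lies elsewhere.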
The importance of debriefing was discussed in Chapter 3 in the context of ethical considerations. After all the data are collected, a debriefing session is usually held. This is an opportunity for the researcher to interact with the participants to discuss the ethical and educational implications of the study.
The debriefing session can also provide an opportunity to learn more about what participants were thinking during the experiment. Researchers can ask participants what they believed to be the purpose of the experiment, how they interpreted the independent variable manipulation, and what they were thinking when they responded to the dependent measures. Such information can prove useful in interpreting the results and planning future studies.
Finally, researchers may ask the participants to refrain from discussing the study with others. Such requests are typically made when more people will be participating and they may talk with one another in classes or residence halls. People who have already participated are aware of the general purposes and procedures; it is important that these individuals not provide expectancies about the study to potential future participants.
ANALYZING AND INTERPRETING RESULTS
After the data have been collected, the next step is to analyze them. Statistical analyses of the data are carried out to allow the researcher to examine and interpret the pattern of results obtained in the study. The statistical analysis helps the researcher decide whether there really is a relationship between the independent and dependent variables; the logic underlying the use of statistical tests is discussed in Chapter 13. It is not the purpose of this book to teach statistical methods; however, the calculations involved in several statistical tests are provided in Appendix C.
COMMUNICATING RESEARCH TO OTHERS
The final step is to write a report that details why you conducted the research, how you obtained the participants, what procedures you used, and what you found. A description of how to write such reports is included in Appendix A. After you have written the report, what do you do with it? How do you communicate the findings to others? Research findings are most often submitted as journal articles or as papers to be read at scientific meetings. In either case, the submitted paper is evaluated by two or more knowledgeable reviewers who decide whether the paper is acceptable for publication or presentation at the meeting.
Meetings sponsored by professional associations are important opportunities for researchers to present their findings to other researchers and the public. National and regional professional associations such as the American Psychological Association (APA) and the Association for Psychological Science (APS) hold annual meetings at which psychologists and students present their own research and learn about the latest research being done by their colleagues. Sometimes, verbal presentations are delivered to an audience. However, poster sessions are more common; here, researchers display posters that summarize the research and are available to discuss the research with others.
As we noted in Chapter 2, many journals publish research papers. Nevertheless, the number of journals is small compared to the number of reports written; thus, it is not easy to publish research. When a researcher submits a paper to a journal, two or more reviewers read the paper and recommend acceptance (often with the stipulation that revisions be made) or rejection. This process is called peer review and it is very important in making sure that research has careful external review before it is published. As many as 90% of papers submitted to the more prestigious journals are rejected. Many rejected papers are submitted to other journals and eventually accepted for publication, but much research is never published. This is not necessarily bad; it simply means that selection processes separate high-quality research from that of lesser quality.
Many of the decisions that must be made when planning an experiment were described in this chapter. The discussion focused on experiments that use the simplest experimental design with a single independent variable. In the next chapter, more complex experimental designs are described.
ILLUSTRATIVE ARTICLE: CONDUCTING EXPERIMENTS
Many people behave superstitiously. That is, they may believe that their lucky shirt helps them with an exam, or that washing a uniform after a game removes the “luck,” or that winning the lottery is dependent on playing one’s lucky numbers. Many of us believe that, indeed, these superstitions do not really affect outcomes. Superstition has been studied in psychology for some time. B. F. Skinner (1947) demonstrated that superstitious behavior could be seen in a pigeon! More recently, Damisch, Stoberock, and Mussweiler (2010) decided to see if they could observe any effect that superstitious behaviors had on several different performance measures, including putting in golf, motor dexterity, memory, and performance on a word jumble puzzle.
Over four different experiments, the researchers varied participants’ perceptions of “luck” and then measured performance. In the first experiment, university students were randomly assigned to conditions wherein they were asked to putt using either a “lucky ball” (condition 1) or a “neutral ball” (condition 2). Participants in the “lucky ball” condition were statistically better putters than those in the “neutral ball” condition.
First, acquire and read the article:
Damisch, L., Stoberock, B., & Mussweiler, T. (2010). Keep your fingers crossed! How superstition improves performance. Psychological Science, 21, 1014–1020. doi:10.1177/0956797610372631
Then, after reading the article, consider the following:
1. For each of the four experiments, describe how the manipulation of the independent variable was straightforward or staged.
2. In this chapter, we discuss three types of dependent measures: self-report, behavioral, and physiological. In the experiments presented in this paper, what types of dependent measures were used? Could other types of dependent measures have been used? How so?
3. Was the dependent measure used in Experiment 1 sensitive? How so?
4. Did these researchers use any manipulation checks in their experiments? Design a manipulation check for Experiment 2.
5. This paper includes four experiments. Given that these researchers were interested in superstition, why was using multiple studies a good thing for the internal validity of the study?
6. How good was the internal validity of this series of studies?
7. How would you extend the study?
Behavioral measure (p. 187)
Ceiling effect (p. 189)
Confederate (p. 183)
Demand characteristics (p. 190)
Double-blind experiment (p. 193)
Electroencephalogram (p. 187)
Electromyogram (p. 187)
Expectancy effects (experimenter bias) (p. 192)
Filler items (p. 190)
Floor effect (p. 189)
Functional MRI (p. 188)
Galvanic skin response (p. 187)
Manipulation check (p. 195)
Manipulation strength (p. 185)
MRI (p. 188)
Physiological measure (p. 187)
Pilot study (p. 194)
Placebo group (p. 191)
Self-report (p. 186)
Sensitivity (p. 189)
Single-blind experiment (p. 193)
Staged manipulation (p. 183)
Straightforward manipulation (p. 181)