Michael H. Birnbaum
California State University, Fullerton

Student evaluations of teaching were originally intended as devices to help measure the quality of instruction, for the purpose of improving the quality of education, but they may be causing more harm than good. Because procedures used in research on student evaluations are confounded, research has not been able to demonstrate that student evaluations are valid measures of teaching quality. However, the real effect of student evaluations depends not on their actual validity but rather on their perceived properties. Because retention, tenure, promotion, and PSSI salary raises are influenced by student evaluations, faculty members will make changes in their courses that they believe will improve their evaluations. This article explores causal theories held by members of the faculty concerning student evaluations; in particular, it explores instructors' theories of how changes in their grading standards and content of courses would affect student evaluations and student learning. An email request to complete a survey was sent to all members of the faculty of C.S.U.F. The survey was completed (via the Internet) by 208 faculty members. The majority judged that student learning can be improved by increasing course content and by raising standards for grading. However, they also stated that these improvements would hurt their evaluations. The majority judged that the current system of tenure and promotion discourages raising standards, encourages lowering of standards, and promotes "watering down" of course content. The data appear consistent with the hypothesis that most faculty believe that ratings will be lowered by changes that would improve learning, and that the effect of student evaluations of teaching is to decrease the quality of education. Faculty theories are compared to students' judgments of their own policies. The majority of students state that they would give highest ratings to courses with the least content and the lowest standards. Apparently, the majority of faculty members understand the majority of student opinion.

Debate on Validity of Student Evaluations

A recent issue of American Psychologist featured the controversy on the validity of student evaluations of teaching (d'Apollonia & Abrami, 1997; Greenwald, 1997; Greenwald & Gillmore, 1997; Marsh & Roche, 1997; McKeachie, 1997). These authors noted that there are many possible interpretations of student ratings because procedures used in research on this topic are confounded. Because different instructors teach different content, use different standards, wear different clothes, give different exams, and tell different jokes to different groups of students who are not randomly assigned to classes, it is not possible to confirm or refute opposing theories of student evaluations.

D'Apollonia and Abrami (1997) performed a meta-analysis of studies that used different sections of the same class in which evaluations were correlated with measures of student learning. They concluded that only 10% of the variance of student evaluations is related to valid measures of learning, and the data permit rejection of the statistical hypothesis that as much as one-sixth of the variance of student evaluations is validly related to educational performance.

Some consider student evaluations to be so complicated that anyone who might dare to use them for practical purposes must be familiar with nonlinear, nonadditive, multidimensional modeling of confounded judgment data (Marsh & Roche, 1997; McKeachie, 1997). My own research specialty is the study of nonadditive, nonlinear models of human judgment, so I suppose these authors would consider me qualified to interpret student evaluation data.
In judgment research that uses techniques similar to those used in student evaluations, one can show that the number 450 is judged to be a significantly "bigger" number than 550 (Birnbaum, 1974; Parducci, 1968; Parducci, 1995). Using procedures even more closely related to those of student evaluations than I used in 1974, I recently showed that the number 9 is judged to be a significantly "bigger" number than 221 (Birnbaum, 1999). Since most of us know that 9 is actually smaller than 221, we should be wary of conclusions when procedures that lead to bizarre conclusions are applied to important, practical questions. If 9 were really larger than 221, we might feel more confident in the conclusion that we should promote teacher A and fire teacher B because A gets higher ratings than B. If we doubt that 9 is really "bigger" than 221, we should be wary of the conclusion that A is a better teacher than B simply because A gets higher ratings.