Assessment in science education
Knowing what is assessed reveals the true educational focus!
Goals for this handout:
· To review recent developments in assessment and evaluation.
· To introduce the terms needed to understand the assessment literature.
· To review and cite references for landmark research studies and instruments used to answer questions about curriculum and assessment.
· To model the use of an outline to “chunk” information into a logical organization that can then be used for a written or oral presentation.
· To demonstrate a “Learning Cycle” approach to instruction (the BSCS version).
· To provide the background needed to recognize key perspectives on assessment in state and national standards.
1) Engage- How does the blood circulate in the human body? Try out the Arnaudin and Mintzes’ instrument.
2) Explore- BIOL102 assessment results using three distinct measures (Calibrated Peer Review, exam essay, and the Arnaudin and Mintzes’ instrument).
3) Explain- Assessment then and now
a. Feedback and progress monitoring
b. Then: TESTING, criterion-referenced (mastery learning) versus norm-referenced (graded on “a curve”), based on multiple-choice items
c. Now: OUTCOMES and PERFORMANCE (behavioral objectives), alternative assessment, authentic assessment, and performance-based assessment.
i. Assessing processes
1. interviews
2. observations
3. logs and journals
4. self-evaluation
5. debriefing and demonstrations (reflection on what, why, how, and suggested changes)
6. behavioral checklists
ii. Assessing products
1. essays with prompts and scoring criteria
2. projects with rating criteria
3. portfolios with rating criteria
4. demonstrations/performances
5. attitude inventories, surveys
6. standardized tests with “explanations”
d. Changes
i. From emphasis on products and outcomes to concern for the process (discrepant events, graphic organizers, metacognitive skills)
ii. From discrete, isolated topics to integrated, cross-disciplinary ones
iii. Rubrics for developmental levels, definition of areas of understanding (Wiggins & McTighe, 1998).
iv. From knowledge and skills to an emphasis on application and use of knowledge.
v. Contextualization
vi. No single, correct answer – instead, convincing arguments, supporting evidence
vii. Public standards known in advance
viii. Multidimensional – recognition of diverse abilities and talents
ix. Assessment of collaborative process skills
x. Assessment of collaborative products
e. Terms
i. Assessment versus evaluation
ii. Linking assessment and instruction (formative, embedded versus summative)
iii. Fairness, freedom from bias, equity
iv. Meaningful criteria (rubrics: descriptors for competent, satisfactory, and inadequate)
v. Reports (checklists, numerical scales, qualitative descriptors linked with scales versus holistic scores)
vi. Feasibility
vii. Generalizability
viii. Reliability (inter- versus intra-rater consistency, and agreement over time)
ix. Validity: accuracy of test-based conclusions; notice that validity depends upon how the results are used
f. Issues
i. In a study by Rakow (1985), ability accounted for only 6.4% of the variance in science inquiry skills for non-white students, compared with 20.7% for white students. Underlying assumptions of standardized test makers:
1. commonality of experience
2. equal educational opportunities
3. equal facility with the language
4. syntax and word usage are familiar regardless of sociocultural, economic, and linguistic differences
ii. Assessment of important outcomes
1. Scientific inquiry
a. Test of Enquiry Skills (TOES), Fraser, 1980
b. Low correlation between multiple-choice tests and hands-on assessments, Baxter, Shavelson, and Pine, 1992
c. Procedural tests
i. Physics, Kruglak, 1958
ii. Chemistry, Ben-Zvi, Hofstein, Samuel, and Kempa, 1977
iii. Biology, Goldman, 1974
2. Problem-solving debate: does instruction increase general intellectual capacity (Piaget), or are improved skills context-dependent (Ausubel, 1968; Novak & Gowin, 1984)?
a. Tests of formal thinking can still be used to assess scientific reasoning (Shayer & Adey, 1992; Tisher & Dale, 1975; Tobin & Capie, 1981)
b. Use of lab activities designed to teach higher levels of Bloom’s (1956) cognitive domain shows substantial improvement in high level cognitive performance (Wheatley, 1975)
c. Infusion of critical thinking develops thinking in the subject, enhances transfer to other domains, and improves subject matter understanding (Zohar, Weinberger, & Tamir, 1994)
3. Conceptual understanding
a. Improved multiple-choice items using misconceptions as distracters (select the BEST, not the CORRECT, answer); some diagnostic tests are published as “Concept Inventories” (Marx, 1999).
b. Concept mapping (Novak, 1981)
c. Proposition Generating Tasks (PGT) (Novak & Gowin, 1984)
4. Nature of Science
a. Science Process Inventory (SPI) (Welch and Pella, 1968, and Walberg, 1969)
b. Test of Understanding Science (TOUS), Cooley and Klopfer, 1961, also Tamir, 1975, Jungwirth, 1970, and Troxel & Snider, 1970
c. Nature of Science (NOS), Alters, JRST, 1997
d. Views on Science-Technology-Society (VOSTS), Aikenhead and Ryan, 1992
e. FAS, NOSS, NSKS, TSAS, WISP (Alters, 1997)
5. Attitudes/Affective Domain
a. Curiosity: wonder and exploration
i. Krathwohl, Bloom, & Masia taxonomy of the affective domain (1964)
ii. Campbell Curiosity Inventory
1. Campbell, 1971, 1972, and Tamir, 1974, 1978, and Maccoby & Jacklin, 1974
2. Q-R, questioning versus recall score of “intellectual curiosity” (Kempa & Dube, 1973, and Tamir, 1981)
b. Interest: steady behaviors and expressions
i. Involvement in activities outside the classroom and beyond the prescribed (Klopfer, 1971, and Campbell, 1972, and Keeves, 1973)
ii. Science Activities Inventory (Cooley and Reed, 1961)
iii. A Test of Interests (Meyer, 1970 and Hofstein, Ben-Zvi, Samuel, & Kempa, 1977)
iv. Structure of Interest in Biology (Tamir & Gardner, 1989)
g. Trends
i. Trends in purpose
1. Raising standards for education
2. higher-level cognitive measures
3. diagnostic tests of pre-conceptions/misconceptions
4. more, and more comprehensive, feedback
5. problem-solving tasks
6. multiple measures – diversity of assessment approaches
7. affective measures
8. long-term and transfer tests to measure in-depth comprehension
ii. Trends in task type
1. innovative tools such as CPR (Calibrated Peer Review) and concept mapping
2. performance-based tasks
3. Computer-based items
4. content expertise required to identify formative assessment and learning opportunities
5. interesting (engaging) problems
6. critical thinking in all tests
7. multiple-choice items improved with confidence/justification options
8. more formative assessment
9. assessing small groups, student interactions
10. non-test tasks such as portfolios
iii. Changes in meaning
1. More criterion-referenced than norm-referenced assessment
2. More qualitatively expressed evaluation
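The reliability term listed under Terms (inter-rater agreement) can be made concrete with a short calculation. The sketch below uses Cohen's kappa, a standard chance-corrected statistic for agreement between two raters; it is not named in this handout and is offered only as one common illustration. The function name cohens_kappa and the sample ratings are hypothetical, and the sketch assumes two raters each assign one of the rubric levels competent, satisfactory, or inadequate to the same ten pieces of student work.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same items.

    Sketch only: assumes each rater assigns exactly one categorical
    rubric level per item, listed in the same item order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the identical level by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal proportions per level.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)

    if p_expected == 1:
        # Both raters used one identical level for every item; kappa is
        # undefined here, so this sketch reports perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical rubric scores for ten student essays.
rater_a = ["competent", "satisfactory", "competent", "inadequate", "satisfactory",
           "competent", "satisfactory", "inadequate", "competent", "competent"]
rater_b = ["competent", "satisfactory", "satisfactory", "inadequate", "satisfactory",
           "competent", "competent", "inadequate", "competent", "satisfactory"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.3f}")
```

In this made-up example the raters agree on 7 of 10 essays (observed agreement 0.70), but given how often each rater used each level, chance alone would produce 0.36 agreement, so kappa = (0.70 - 0.36) / (1 - 0.36) ≈ 0.53. Raw percent agreement by itself would overstate how reliable the rubric scoring is.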
4) Elaborate
a. Concept map of published views on assessment for assigned standards.
b. Compare/contrast views on assessment among the standards
5) Evaluate (Work on assessment)
a. CPR text input on assessment is due by March 12
b. 30 of the 100 points for your semester project are due by March 19
c. Review the assessments: report on the exemplary curriculum, April 9, 2001.