Assessment in science education

Knowing what is assessed reveals the true educational focus!

Goals for this handout:

·        To review recent developments in assessment and evaluation.

·        To introduce the terms needed to understand the assessment literature.

·        To review and cite references for landmark research studies and instruments used to answer questions about curriculum and assessment.

·        To model the use of an outline to “chunk” information into a logical organization that can then be used for a written or oral presentation.

·        To demonstrate a “Learning Cycle” approach to instruction (the BSCS version).

·        To provide the background needed to recognize key perspectives on assessment in the state and national standards.

1)      Engage: How does blood circulate in the human body? Try out the Arnaudin and Mintzes instrument.

2)      Explore: BIOL102 assessment results from three distinct measures (Calibrated Peer Review, an exam essay, and the Arnaudin and Mintzes instrument).

3)      Explain: Assessment then and now

a.       Feedback and progress monitoring

b.      Then: TESTING, criterion-referenced (mastery learning) versus norm-referenced (using “a curve”), based on multiple-choice items

c.       Now: OUTCOMES and PERFORMANCE (behavioral objectives), alternative assessment, authentic assessment, and performance-based assessment.

i.      Assessing processes

1.      interviews

2.      observations

3.      logs and journals

4.      self-evaluation

5.      debriefing and demonstrations (reflection on what, why, how, and suggested changes)

6.      behavioral checklists

ii.      Assessing products

1.      essays with prompts and scoring criteria

2.      projects with rating criteria

3.      portfolios with rating criteria

4.      demonstrations/performances

5.      attitude inventories, surveys

6.      standardized tests with “explanations”

d.      Changes

i.      From emphasis on products and outcomes to concern for the process (discrepant events, graphic organizers, metacognitive skills)

ii.      From discrete, isolated topics to integrated and cross-disciplinary topics

iii.      Rubrics for developmental levels, definition of areas of understanding (Wiggins & McTighe, 1998)

iv.      From knowledge and skills to an emphasis on applying and using knowledge

v.      Contextualization

vi.      No single, correct answer – instead, convincing arguments, supporting evidence

vii.      Public standards known in advance

viii.      Multidimensional – recognition of diverse abilities and talents

ix.      Assessment of collaborative process skills

x.      Assessment of collaborative products

e.       Terms

i.      Assessment versus evaluation

ii.      Linking assessment and instruction (formative, embedded versus summative)

iii.      Fairness, freedom from bias, equity

iv.      Meaningful criteria (rubrics: descriptors for competent, satisfactory, and inadequate)

v.      Reports (checklists, numerical scales, qualitative descriptors linked with scales versus holistic scores)

vi.      Feasibility

vii.      Generalizability

viii.      Reliability (inter- versus intra-rater consistency and agreement over time)

ix.      Validity: accuracy of test-based conclusions; notice that validity depends upon how the results are used
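Inter-rater reliability can be made concrete with a small calculation. The sketch below is an illustration only and is not taken from any study cited in this handout. It assumes two raters score the same six essays using the three rubric descriptors (competent, satisfactory, inadequate); the scores themselves are invented. It reports simple percent agreement and Cohen's kappa, one common chance-corrected agreement statistic.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which two raters assigned the same rubric level."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores for six essays (invented for illustration).
rater_1 = ["competent", "satisfactory", "inadequate",
           "competent", "satisfactory", "competent"]
rater_2 = ["competent", "satisfactory", "satisfactory",
           "competent", "inadequate", "competent"]

print(f"percent agreement: {percent_agreement(rater_1, rater_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_1, rater_2):.2f}")
```

With these invented scores the raters agree on four of six essays, yet kappa is noticeably lower than the raw agreement because some of that agreement would be expected by chance.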

f.        Issues

i.      In a study by Rakow (1985), ability accounted for only 6.4% of the variance in science inquiry skills for non-white students, compared with 20.7% for white students. Standardized test makers rely on underlying assumptions:

1.      commonality of experience

2.      equal educational opportunities

3.      equal facility with the language

4.      familiarity with syntax and word usage regardless of sociocultural, economic, and linguistic differences
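To read the "percent of variance" figures quoted from Rakow (1985), it may help to see how such a number relates to a correlation. The sketch below assumes the simplest case, a single predictor, where the proportion of variance accounted for equals the squared Pearson correlation. The handout does not say how the original study computed its figures, so this is an interpretive aid only, and the sample data are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def percent_variance_explained(xs, ys):
    """Percent of variance in ys accounted for by xs (single predictor)."""
    return 100 * pearson_r(xs, ys) ** 2


def implied_correlation(percent_variance):
    """Size of the correlation implied by a percent-of-variance figure."""
    return math.sqrt(percent_variance / 100)


# Invented ability and inquiry-skill scores for five students.
ability = [95, 100, 105, 110, 120]
inquiry = [12, 15, 11, 16, 18]
print(round(percent_variance_explained(ability, inquiry), 1))

# Under the single-predictor assumption, the two quoted percentages
# correspond to correlations of roughly 0.25 and 0.45.
print(round(implied_correlation(6.4), 2))
print(round(implied_correlation(20.7), 2))
```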

ii.      Assessment of important outcomes

1.      Scientific inquiry

a.       Test of Enquiry Skills (TOES), Fraser, 1980

b.      Low correlation between multiple-choice tests and hands-on assessment (Baxter, Shavelson, and Pine, 1992)

c.       Procedural tests

i.      Physics, Kruglak, 1958

ii.      Chemistry, Ben-Zvi, Hofstein, Samuel, and Kempa, 1977

iii.      Biology, Goldman, 1974

2.      Problem-solving debate: does instruction increase general intellectual capacity (Piaget), or are improved skills context dependent (Ausubel, 1968; Novak & Gowin, 1984)?

a.       Tests of formal thinking can still be used to assess scientific reasoning (Shayer & Adey, 1992; Tisher & Dale, 1975; Tobin & Capie, 1981)

b.      Use of lab activities designed to teach higher levels of Bloom’s (1956) cognitive domain shows substantial improvement in high level cognitive performance (Wheatley, 1975)

c.       Infusion of critical thinking develops thinking in the subject, enhances transfer to other domains, and improves subject matter understanding (Zohar, Weinberger, & Tamir, 1994)

3.      Conceptual understanding

a.       Improved multiple-choice items using misconceptions as distractors (select the BEST, not the CORRECT, answer); some diagnostic tests are published as “Concept Inventories” (Marx, 1999)

b.      Concept mapping (Novak, 1981)

c.       Proposition Generating Tasks (PGT) (Novak & Gowin, 1984)

4.      Nature of Science

a.       Science Process Inventory (SPI) (Welch and Pella, 1968, and Walberg, 1969)

b.      Test of Understanding Science (TOUS), Cooley and Klopfer, 1961, also Tamir, 1975, Jungwirth, 1970, and Troxel & Snider, 1970

c.       Nature of Science (NOS), Alters, JRST, 1997

d.      Views on Science-Technology-Society (VOSTS), Aikenhead and Ryan, 1992

e.       FAS, NOSS, NSKS, TSAS, WISP (Alters, 1997)

5.      Attitudes/ Affective Domain

a.       Curiosity: wonder and exploration

i.      Krathwohl, Bloom, & Masia taxonomy of the affective domain (1964)

ii.      Campbell Curiosity Inventory

1.      Campbell, 1971, 1972, and Tamir, 1974, 1978, and Maccoby & Jacklin, 1974

2.      Q-R, questioning versus recall score of “intellectual curiosity” (Kempa & Dube, 1973, and Tamir, 1981)

b.      Interest: steady behaviors and expressions

i.      Involvement in activities outside the classroom and beyond the prescribed (Klopfer, 1971; Campbell, 1972; Keeves, 1973)

ii.      Science Activities Inventory (Cooley and Reed, 1961)

iii.      A Test of Interests (Meyer, 1970; Hofstein, Ben-Zvi, Samuel, & Kempa, 1977)

iv.      Structure of Interest in Biology (Tamir & Gardner, 1989)

g.       Trends

i.      Trends in purpose

1.      Raising standards for education

2.      higher-level cognitive measures

3.      diagnostic tests of pre-conceptions/misconceptions

4.      more frequent and more comprehensive feedback

5.      problem-solving tasks

6.      multiple measures – diversity of assessment approaches

7.      affective measures

8.      long-term and transfer tests to measure in-depth comprehension

ii.      Trends in task type

1.      innovative tools, such as CPR and concept mapping

2.      performance-based tasks

3.      Computer-based items

4.      content expertise required to identify formative assessment and learning opportunities

5.      interesting (engaging) problems

6.      critical thinking in all tests

7.      multiple choice improved with confidence/justification options

8.      more formative assessment

9.      assessing small groups, student interactions

10.     non-test tasks such as portfolios

iii.      Changes in meaning

1.      More criterion-referenced rather than norm-referenced

2.      More qualitatively expressed evaluation
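The contrast between criterion-referenced and norm-referenced interpretation, raised under “Then” and again under “Changes in meaning,” can be sketched with the same set of scores read two ways. Everything specific here is an assumption made for illustration: the student names and scores are invented, the mastery cutoff of 80 is arbitrary, and the half-standard-deviation bands are just one simple way to grade “on a curve.”

```python
from statistics import mean, pstdev


def criterion_referenced(scores, cutoff=80):
    """Compare each student with a fixed standard known in advance."""
    return {name: ("mastery" if score >= cutoff else "not yet")
            for name, score in scores.items()}


def norm_referenced(scores):
    """Compare each student with the group, using z-scores (a 'curve')."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    labels = {}
    for name, score in scores.items():
        z = (score - mu) / sigma if sigma else 0.0
        if z >= 0.5:
            labels[name] = "above average"
        elif z <= -0.5:
            labels[name] = "below average"
        else:
            labels[name] = "average"
    return labels


# Hypothetical class in which every student meets the fixed standard.
scores = {"Ana": 92, "Ben": 88, "Cy": 85, "Dee": 81}

print(criterion_referenced(scores))
print(norm_referenced(scores))
```

In this invented class every student reaches mastery under the fixed criterion, while the curve still labels one student “below average.” The two approaches answer different questions about the same performance.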

4)      Elaborate

a.       Concept map of published views on assessment for assigned standards.

b.      Compare/contrast views on assessment among the standards

5)      Evaluate (Work on assessment) 

a.       CPR on assessment text input is due by March 12

b.      30 of the 100 points for your semester project are due by March 19

c.       Review the assessments: report on the exemplary curriculum, Apr 9, 2001.