The project uses a longitudinal research design and will generate evidence that is descriptive [case study, design research, observational] and associative/correlational [validity]. Original data are being collected on middle school Earth science teachers and their students using assessments of learning, observation [personal observation, Web logs], and survey research [self-completion questionnaire, structured interviewer-administered questionnaire, semi-structured or informal interview]. The intervention consists of simulation-based instructional modules with embedded assessment and simulation-based summative assessments.
The project is using a variety of instruments or measures, including:
- Pre/posttest of AAAS items. We will select approximately 30 multiple-choice items from the AAAS item bank to construct each of the pre/posttest instruments for middle school plate tectonics and climate. We will continue to use the pre/posttest constructed for ecosystems from the AAAS item bank.
- SimScientists assessment data. The LMS collects student responses and actions as they work through tasks in the formative assessments and reports the amount of help students needed to complete the tasks. These results are reported by content and inquiry target. The LMS also collects both the computer-scored and teacher-scored constructed responses on the end-of-unit benchmark assessments. Methods for computer-based and teacher scoring were developed in Calipers II, including web-based rater training software that calculates inter-rater reliability.
- Computer logs of student interactions with SimScientists. Computer logs will document students’ interaction with the simulation-based learning environment. Logs provide fine-grained data about student responses, actions, and use of coaching.
- Teacher instructional surveys and questionnaires. All pilot teachers will complete online teacher surveys after each use of the curriculum-embedded activities. These surveys ask teachers to summarize their observations of student use. End-of-unit questionnaires ask for evaluations of the quality, logistics, utility, and effectiveness of the SimScientists instructional modules and professional development. Teachers in the feasibility test will complete interviews only.
- Case studies. WestEd will conduct case studies in one classroom of each of the middle school teachers, using classroom observations, cognitive labs, and teacher interviews. Field notes from observations will document teacher roles, participation, and activity structures, as well as any problems related to the science content or technology use. Interviews conducted at the end of each unit will probe teacher perceptions of quality, utility, and feasibility in more depth. In addition to the observations and interviews, and beyond the cognitive labs conducted with the initial prototypes to examine preliminary construct validity and feasibility, cognitive labs will be conducted with two students in each of the case study classrooms for half of the curriculum modules in a unit. In cognitive labs, students think aloud while responding to the instructional components, feedback and coaching, and the embedded and benchmark assessments. Researchers will follow the think-aloud protocol developed in the Calipers II and SimScientists projects. A software program records screen actions and the student’s voice. Brief follow-up interviews ask students about the clarity of the tasks and usability of the technology.
- Implementation. In both Pilot Tests 1 and 2, teachers will participate in professional development. After enrolling their students in the LMS, teachers will administer the pretest, and then teach the relevant instructional unit, inserting the SimScientists supplemental curriculum modules when relevant. During implementation, teachers will monitor the use of the simulations and reflection activities and complete the online surveys. At the end of each unit, teachers will administer the simulation-based benchmark and posttest. In-depth descriptions of the implementations will be provided by the case studies.
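The computer logs described above can be summarized into per-student indicators such as how often coaching was used. The sketch below assumes a hypothetical JSON-lines event format with `type` and `student_id` fields; these names are illustrative, not the actual SimScientists log schema.

```python
import json

def coaching_summary(log_lines):
    """Tally coaching events per student from JSON-lines interaction logs.

    The "type"/"student_id" field names are a hypothetical schema for
    illustration only.
    """
    counts = {}
    for line in log_lines:
        event = json.loads(line)
        if event.get("type") == "coaching":
            sid = event["student_id"]
            counts[sid] = counts.get(sid, 0) + 1
    return counts
```

A summary of this kind could feed the "amount of help needed" reports by content and inquiry target.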
Analysis plans for the variety of data being collected include:
- Learning trajectories. This project will explore the use of partial-credit IRT models and Bayes net analyses to validate the hypothesized learning trajectories for three crosscutting concepts and their classification into levels. First, the alignment between the assessment items, the progression levels, and the hypothesized learning trajectories will be validated. Evidence will be gathered from multiple sources: external expert review, cognitive labs, and teacher input via the surveys and interviews. Once the correspondence between the progression levels, the hypothesized learning trajectories, and the items has been confirmed, the fit of the data to the trajectories will be evaluated by examining the ordering of item score thresholds and student proficiencies and comparing them to predictions based on the theoretical structures (Wilson, 2009). At the item level, estimated thresholds for item score levels aligned with different model levels will be compared on a Wright map to determine whether they conform to the expected order. At the person level, evidence that students master model levels on the learning trajectory in the expected order will be examined using both static estimates of individual item mastery on the benchmark assessments, in the form of kid-maps (Wilson, 2005), and dynamic mastery progressions across the embedded and benchmark assessments, in the form of progress variables. The analysis plan will evolve as we explore multiple methods and models appropriate for the complexity of our data and objectives.
- Module quality. The appropriateness and quality of the science content and instructional approaches will be assessed by expert review, teacher input, and student cognitive labs. Any issues with alignment or cohesion will be flagged for review. The technical quality of the embedded and benchmark assessments will be evaluated using both classical and IRT analyses to determine overall test reliability, individual item fit, and evidence of construct validity.
- Classroom use. Implementation quality and quantity will be documented through classroom observations and teacher surveys and interviews. These data will be operationalized to produce implementation variables of both quality and quantity for use in the student impact analysis.
- Student impact. Conceptual progress along each learning trajectory will be evaluated by estimating level mastery using a Bayes net classifier for each of the embedded and benchmark assessments. This analysis will provide evidence of whether student gains were consistent with mastery of the progression levels. We will explore two ways of using the Bayes net as part of a validity study. (1) We will compare classifications of student proficiency with other classifications (teacher ratings, scores from the benchmarks), using Cohen's kappa and Goodman and Kruskal's lambda as the relevant statistics. (2) We will convert the classifications from the Bayes net into EAP scores (Almond et al., 2009) and then examine correlations with other scores for convergent/divergent validity.
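The item-level ordering check described under learning trajectories can be illustrated as follows: given estimated item score thresholds (in logits) grouped by the progression level each item targets, verify that thresholds rise with level, as the Wright map comparison expects. The grouping structure and helper name below are illustrative, not the project's actual analysis code.

```python
def thresholds_ordered(thresholds_by_level):
    """Check whether estimated item score thresholds increase with level.

    thresholds_by_level: dict mapping a hypothesized progression level
    (1..k) to a list of estimated thresholds (logits) for items aligned
    with that level. Returns True if mean thresholds are monotonically
    non-decreasing across levels, i.e., consistent with the expected order.
    """
    means = [sum(v) / len(v) for _, v in sorted(thresholds_by_level.items())]
    return all(lo <= hi for lo, hi in zip(means, means[1:]))
```

In practice the thresholds would come from a fitted partial-credit model and the comparison would be inspected graphically on a Wright map rather than reduced to a single boolean.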
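For the classical analysis of technical quality, overall test reliability is commonly estimated with Cronbach's alpha. A minimal NumPy sketch, assuming a complete examinees-by-items score matrix (no missing-data handling):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix.

    Assumes a complete matrix with at least two items; real analyses
    would also handle missing responses.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)        # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

Item fit and construct-level evidence would come from the IRT analyses rather than from this classical statistic alone.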
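The two agreement statistics named in the student impact plan can be computed directly from paired classification vectors. A small sketch, with illustrative category labels; it assumes equal-length vectors of discrete labels:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters/classifiers."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n                   # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)   # chance agreement
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)

def gk_lambda(a, b):
    """Goodman and Kruskal's lambda (asymmetric form): proportional
    reduction in error when predicting b from a."""
    n = len(b)
    e1 = n - max(Counter(b).values())   # prediction errors ignoring a
    e2 = 0                              # errors within each category of a
    for level in set(a):
        sub = [y for x, y in zip(a, b) if x == level]
        e2 += len(sub) - max(Counter(sub).values())
    return (e1 - e2) / e1 if e1 else 0.0
```

Here `a` might hold Bayes net proficiency classifications and `b` teacher ratings or benchmark-derived categories; kappa treats the two codings symmetrically, while lambda is directional.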