Reading about validity and reliability issues this week has led me to reflect on the one high-stakes assessment in which I regularly participated for four years: an end-of-semester portfolio assessment for FYC students. I'll briefly describe the assessment so that my reflections can be situated within that specific context.
At my previous institution, students were required to complete one semester of FYC or its equivalent, which they could do by making a 3 on the AP exam, transferring credits from elsewhere (though credits did not often transfer), or completing a “challenge exam” to demonstrate that they already met the ENG 101 course outcomes. Each semester we had around 30 sections of ENG 101, mostly taught by graduate teaching assistants and part-time faculty. Two TT faculty members also regularly taught the course. Three programmatic documents helped to unify the sections of 101: the challenge exam prompt, the portfolio assessment rubric (PAR), and the appeal prompt. Most important for my discussion is the PAR, which was designed for use in fall of 2009 (the year I began teaching). The PAR outlined the traits of academic writing that students were expected to demonstrate, and it asked readers to make a holistic pass/fail decision based on these categories. We were encouraged not to use the traits as a checklist, but rather to weigh these aspects of academic writing together and make a holistic judgment about a student’s demonstration of them. To demonstrate their abilities, students submitted a portfolio containing two academic essays and a reflective introduction.* These portfolios were read by two teachers who were not the instructor of record for the student, and in the event of a disagreement between the two readers, a third reader who was not aware of the previous readers’ decisions assessed the work.
We participated in calibration sessions during the semester and on the morning of each day of the portfolio reading, which lasted three days. A point of contention that repeatedly arose between two of the officials responsible for designing the assessment concerned how to frame the decision to pass or fail a portfolio. Professor X asked us to consider whether the student would benefit from another semester of ENG 101. Professor Y insisted that asking readers to make that decision was unfair; readers couldn’t possibly know whether students would benefit from another semester, but they could determine whether the work they were assessing met the course goals. The disagreement between these two professors seems related to issues of test use. The portfolio was being used to sort students out of ENG 101 or into another semester of ENG 101, and, drawing from Shepherd’s argument, that seems to be an important point for readers to consider. Professor Y was voicing the position Shepherd describes: “They worry that addressing consequences will overburden the concept of validity or overwork test makers” (5). Professor Y’s concern was that asking readers to weigh their decision in terms of its enormous impact on students would prevent them from making sound decisions. I wonder, though, whether thinking in terms of the consequences for students would necessarily prevent us from making accurate and appropriate decisions.
For instance, one semester I had a student who had already failed the portfolio review and was taking ENG 101 for a second time. Failing the assessment had damaged her self-efficacy and her willingness to engage in reflective writing practices. After submitting a portfolio of her work in my course, she again failed the assessment. Both the student and I appealed the readers’ decisions, but the appeals committee would not overturn them. The student was faced with taking a third semester of ENG 101, despite the fact that her reflective introduction clearly demonstrated a growing reflective understanding of writing and of her own writing processes. Needless to say, the student’s negative feelings about writing only intensified after this experience.
In this particular instance, would making the student complete a third semester of ENG 101 really be beneficial? Should the readers have been made aware of this specific student’s situation, or would that have been detrimental to the decision-making process, as Professor Y suggested? Given that the assessment purported to measure reflective awareness, did reliability trump validity? Is this also evidence that the assessment suffered from “construct underrepresentation”?
Moments like this one are clearly crucial for opening up reflective spaces for participants in the assessment process. As Moss points out, critically reflecting on test use and design is crucial so that stakeholders’ “perspectives…can be reaffirmed with a new self-consciousness or enabled to evolve” (156). However, the particular situation of this one student was not visible to anyone other than the student and me. Because the assessment process was not designed to “capture” this information, it could not be brought to light except through the student’s willingness to speak to the officials responsible for constructing and conducting the assessment. It didn’t occur to me at the time, but now I’m wondering how that particular assessment process could be revised to recognize the types of experiences non-mainstream students had in FYC courses and with the assessment. Or, as Moss asks, “[t]o what extent is the writing program complicit in simply reproducing a narrow model of academic writing (and the understanding of knowledge it entails) without providing opportunity for the values implicit in the model to be illuminated and self-consciously considered?” (157). Considering the assessment through the lens of this student’s experience does suggest that the writing program reproduced a narrow model of academic writing, and one that differed from the construct of writing the department claimed to value. On the one hand, we taught students about the importance and value of reflection for writing processes; on the other, we based our assessments almost solely on a narrow conception of academic writing as separable from students’ metacognitive awareness of writing.
I’m not sure that I’ve arrived at any sort of conclusion here, but I am left with questions about assessment design. How can we ensure that reflective spaces are opened up within the assessment process? How could (or should) experiences like my student’s be made visible to those responsible for (re)designing the assessment?
*I completed a study in spring 2011 on readers’ expectations for this reflective component of the portfolio and revised the PAR to include criteria for the “critical reflection.” This revision enabled students to understand how readers were assessing their reflections and provided the basis for classroom discussions of reflective writing. Prior to my work, students did not have access to assessment criteria for the reflection, teachers did not have a unified understanding of what readers wanted, and readers had no concrete criteria by which to assess reflections. As a result, reflections were often undervalued in the assessment despite the emphasis placed on reflection in multiple ENG 101 classrooms.