Much of the work we read on AES pointed to the limitations of using that technology. For instance, Les Perelman makes four claims about why AES is not an accurate or appropriate method for assessing student writing:

1. “the ‘holistic score’ produced by AES is largely a function of the length of the essay.”
2. “the abnormal nature of the short timed impromptu writing test produces this strong correlation of length to score.”
3. “the metrics employed by programs like e-rater do not reflect the constructs they are supposed to measure.”
4. “the grammar checking and instructional function of e-rater and Criterion are much more limited than the much more developed functions in standard software such as MS Word, which itself has major limitations.” (128).

Like Perelman, Chris Anson argues that e-raters are not currently sophisticated enough to understand natural language, which means that computers are not yet able to interpret text as humans do, nor are they able to read words in their particular contexts. Anson also suggests that computers are not able to fact-check writing, though as Jeff pointed out in a post (that I cannot locate, for some reason), some readers are capable of fact-checking through database/internet searching. Taking the discussion in a slightly different direction, Bill Condon argues that AES is actually a red herring, and that we should instead be focusing on the construct of writing we’re actually assessing. Condon claims that human readers who are scoring timed essays read the same way that computers do, which demonstrates that AES is not the problem–our assessment practices are themselves the problem.

When I initially read these essays and participated in our discussion about them, I agreed that AES was problematic and that calibrated reading could be just as problematic. However, in reviewing these texts, I’ve been reminded of an issue that seems important to consider in relation to e-raters. I’ll focus on this for the remainder of this post.

In the spring semester, a student in my research-writing-in-the-disciplines class conducted research on and wrote an article about expert-layperson communication, focusing specifically on the Concept Revision Tool, which is a “Microsoft Access-based application designed to support written expert communication with laypersons. It is an adaptive tool that analyzes the text produced with regard to occurrences of specialist concepts stored in a database” (Jucks et al. 10). Another similar tool exists: the ANTWordProfiler, which uses the General Service List and the Academic Word List to identify uncommon words. Authors can use this tool to revise their text for particular audiences who might not be familiar with the words identified as “uncommon.” Finally, the Scientific Writing Assistant (SWAN) is a tool that assesses “fluidity and cohesion” by identifying disconnected sentences within and across paragraphs, long sentences, and passive voice sentences.
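To make the word-list idea concrete, here is a minimal sketch in Python of how tools of this kind might flag uncommon words and long sentences. This is not the actual implementation of ANTWordProfiler or SWAN: the tiny `KNOWN_WORDS` set is a stand-in for the General Service List and the Academic Word List, and the function names and the `max_words` threshold are hypothetical choices made for illustration.

```python
import re

# Stand-in for the General Service List / Academic Word List.
# Assumption: the real lists hold thousands of entries loaded from files.
KNOWN_WORDS = {"the", "a", "of", "and", "in", "is", "we", "study", "cell", "data"}


def flag_uncommon_words(text, known_words=KNOWN_WORDS):
    """Return words not found in known_words, lowercased, in order of
    first appearance and without duplicates."""
    flagged = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in known_words and word not in flagged:
            flagged.append(word)
    return flagged


def flag_long_sentences(text, max_words=30):
    """Return sentences containing more than max_words words.
    Sentences are split naively on ., !, or ? followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]
```

A writer revising for a lay audience could run a draft through `flag_uncommon_words` and decide, word by word, whether to define, replace, or keep each flagged term. The decision stays with the writer, which is why this seems better suited to formative than to summative use.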

While I wouldn’t advocate for the use of these tools for assessment purposes, I do think that these or similar types of tools could be used for the kind of formative assessment that Anson describes in his piece. Students and teachers could use these tools to think about revision, especially in terms of word choice and sentence/paragraph development. Based on the existence of these tools, I have a better understanding of why Anson, Condon, Broad, and others tentatively claim that AES might one day be feasible.

Our readings from the past few weeks have focused on evaluation and response, helping us to see the different types of response and/or assessments we might use in our classrooms–descriptive, evaluative, grading contract, rubric/scoring guide–and in our programs–directed self-placement and print or electronic portfolios with or without rubrics/scoring guides. Because we’ve already worked together (publicly) to consider descriptive and evaluative feedback, grading contracts, and scoring guides, I’ll focus this post on considering how Ketai’s critique of DSP might also be applicable to other forms of response and evaluation.

In “Race, Remediation, and Readiness: Reassessing the ‘Self’ in Directed Self-Placement,” Rachel Lewis Ketai helps us to understand the potential problems with DSP, one of which is that placement guides construct students’ experiences with literacy as being entirely a matter of “personal choice” rather than recognizing the social and educational contexts that contributed to these experiences (145). For instance, frequently reading newspapers and magazines is constructed as a personal choice (I could have read magazines and newspapers and just chose not to do so) rather than being recognized as the result of a complex of social relations and material conditions. Similarly, doing “much writing” in high school is constructed as personal choice rather than as the result of the kinds of assignments students got in the classroom. Ketai encourages us to consider how the student self is being constructed through the rhetorical choices we make in composing documents, such as the DSP placement guides. This is important, not just because of the effects this might have on students’ choices for placement and self-efficacy, but also because these kinds of programmatic/institutional documents act as “messenger[s] of institutional values to entering students” (141).

Ketai’s critique can also easily be applied to scoring guides and grading contracts. Following her approach, we might examine these documents to see how we are constructing student choice and agency in scoring guides. Dryer’s recent corpus analysis of scoring guides reveals that these documents frequently vacillate between attributing agency to students in higher performance categories and attributing textual agency in lower performance categories:

Agentive students disappear in lower performance categories. Where they are constructed as active agents in their writing, it is primarily as makers of decisions about style, use of evidence, and critical thinking. They are not routinely granted agentive control over their work on grammar, thesis, organization, or–curiously–their engagement with the assignment. (23; emphasis in original)

Ketai could help us to see the problems with attributing agency in some cases, while eliding it in others: not only does this have a–perhaps negative–effect on how the scorer approaches student work, but this document–assuming that students have access to it–could also affect how students perceive their own work. These kinds of rhetorical constructions seem to suggest that the student has the agency to produce a “good” text by making good decisions about style and organization, whereas a student has no agency when producing a “bad” text–the text is error-filled by no action on the student’s part.

This is where I think descriptive feedback has the potential to improve our assessments. Describing what you see in student work without taking an evaluative stance necessitates an understanding of student as agent–or at least, it should. (And of course, as Ketai points out, we must consider the student-as-agent as s/he is located in particular social contexts.) We talked in class about how descriptive feedback seems as though it would work only as formative assessment; summative assessments wouldn’t use descriptive feedback because they rely on readers’ making decisions about the work. I wonder, though, if descriptive response could have a place in summative assessments, like end-of-semester portfolios. Is there a way to describe what a text is doing and match those descriptions to a list of outcomes? Or, more importantly, is there a way of developing outcomes that don’t rely on value-judgments, on what a text is or isn’t doing well? Is it possible to create a descriptive summative assessment rather than an evaluative summative assessment? Or is this idea too utopian to actually work?

Each of our readings on access, accessibility, and diversity has demonstrated the trouble that tacit assumptions make for fair writing assessments. For instance, Ball demonstrates through her study that a teacher’s experience and race affect how she perceives and responds to student work, just as the student’s race affects how teachers take up her work. Similarly, Haswell and Haswell demonstrate the complex effects of gender on the writing assessment process: male raters respond differently than female raters to male and female students, and female students respond differently to their peers’ work–depending on gender–than do their male counterparts. Most interestingly, all raters seem troubled by writing that is not easily identifiable as male or female–what Haswell and Haswell call “gender-switched” (414). Anson takes a slightly different approach by identifying the lack of research on race and diversity in WAC research and scholarship, which he attributes, in part, to the fact that WAC scholarship focuses on students as a “generalized construct” rather than on students as “individuals who bring specific histories, experiences, and ‘vernacular literacies’ to their learning” (23).

This tension between student-as-generalized-construct and student-as-culturally-and-historically-situated-individual raises a couple of questions for me. First and foremost, I think that many (if not most) of us would agree that teachers must work toward understanding individual students and their various literacy and cultural practices. Considering these differences during formative classroom assessment is also crucial. However, it seems that once we move toward large-scale assessment, whether at the programmatic level or state/national level, individual students begin to be viewed as a generalized construct–otherwise large-scale assessment, as it is now, wouldn’t work. Not only are large-scale assessments typically operating under an invalid construct of writing; they’re also operating under an invalid construct of student.

Secondly, I’m reminded of Mary Sheridan-Rabideau’s bibliographic piece on writing, gender, and culture in Bazerman’s Handbook of Research on Writing. Sheridan-Rabideau raises a key question regarding the teaching of writing in relation to women: “Is it more effective/better to prepare women to engage these agonistic academic structures as they are, or is it possible/better to create alternatives that might be more suited to women’s language practices?” (258). While I recognize the importance of this question in relation to women’s experiences of joining a community of practice dominated by the “masculinist communication style of academic writing” (257), I think this question is equally applicable to students from diverse racial and cultural backgrounds, to students who do not identify as male (or female), and so on. Ball seems to touch on this a bit when she draws on the work of Delpit to show that not being explicit and not “correcting” grammar can be just as harmful as, or possibly more harmful than, over-correcting students. Perhaps Delpit and/or Ball would say that we do need to prepare all students to write and engage in a society which values SWE and, if we’re talking about the academy, typically masculine modes of communication. If we accept that answer, then it seems to me that the implications for assessment are clear: we can continue to assess students based on a generalized construct of student because that construct is actually what we’re teaching towards. Large-scale assessments can carry on.

On the other hand, if we do value alternative modes of communication, and if we do view our students as culturally and historically situated individuals, then the implications are both more and less clear. Under these conditions, large scale assessment seems out of the question; how can you create individualized assessments on a national, state, or even programmatic level? Alternatively, can assessments be too individualized? Is some measure of standardization, or at least comparability, necessary? I continue to be drawn to Wardle and Roozen’s ecological model, which I think provides one viable alternative to the more traditional assessments we’ve discussed this semester. Though the model likely has flaws that I’m not currently able to see, I do think that it makes possible types of formative and summative assessment more focused on student-as-individual than student-as-generalized construct. Writing assessment scholars have clearly understood the necessity of local assessments for at least a decade; perhaps re-imagining the local as more student-based rather than place-based is the next logical step. 

One theme that seems to unify the texts we’ve read over the past two weeks is that of agency. In spite of the pessimism that many educators and writing theorists may feel when confronted with large-scale/mechanized writing assessments, Michael Neal reminds us that “we can and do exert a limited amount of control and agency in the development and use of technologies for writing or writing assessments” (56). Indeed, understanding that we do have agency is crucial if we are to design and implement local writing assessments that are guided by faculty values and expertise and that respond to the specific needs of students within a program (Huot, Neal, Wardle and Roozen).

Though it may seem contradictory to the local to bring in Dylan Dryer’s recent research (summarized in the last frame of Workman and Marshall’s assessment timeline) that uses corpus analysis to investigate the construct of writing embedded in rubrics and grading guidelines from across the country, I think that this is actually a good starting place for thinking about how to implement Adler-Kassner and O’Neill’s advice for building alliances to instigate changes within assessment procedures. Dryer’s research shows that even though writing theorists seem to have made some impact on current writing assessments (welcome news after Behizadeh and Engelhard’s 2011 findings), the construct of writing guiding writing assessments still problematically diverges from the construct of writing that many of us teach in the classroom. Perhaps the most pressing problem is the local, or rather, scoring guides’ failure to situate writing within the local context. As Dryer points out, “scales tend to rhetorically construct local performance categories as universal descriptions” (5). For example, I’ve made the argument (in this poorly-designed Prezi) that the portfolio assessment rubric (PAR) (see below) used at the University of Maine fails to situate the traits of academic writing it describes within the context of ENG 101 classrooms at UMaine.

Portfolio Assessment Rubric AY2012

Many students’ portfolio reflections conclude by making grand claims about being able to write “in any situation” now that they have completed ENG 101. Now, part of this trend could be explained by the prevalence of the “narrative of progress” (Emmons, Scott) in students’ reflections (of course they want portfolio readers to think that they have grown exponentially as writers, students, and human beings), but I would also argue that the PAR’s presentation of academic writing abilities as general leads students to make these claims. My argument rests, in part, on the experience I had reading my students’ reflections after they spent a semester developing their own theories of writing that were then used to critique the construct of writing inherent in the PAR. Many students ended up arguing that what they learned in ENG 101 would necessarily need to be recontextualized for writing in other contexts, an argument that I do not think they would have made without viewing the PAR through a critical lens.

Returning to my original point: I think the discrepancy between writing constructs is an important starting place for carrying out the work of building alliances. Once we’re able to see how, specifically, our teaching and our assessments don’t align, we can begin reaching out to others using Adler-Kassner and O’Neill’s advice for issue-based alliance building. A great example of how this might work can be found in Wardle and Roozen’s formulation of an ecological model of writing assessment. Wardle and Roozen’s approach aims to bring the construct of writing present in writing assessment into alignment with the construct of writing taught in the classroom. Their model allows for a more robust understanding of writing as situated, as spanning disciplinary and non/academic boundaries, as integral to students’ understandings of self. Wardle and Roozen focus on building alliances with various institutional entities (writing centers, WAC programs, etc.) over a span of many years. Just as Adler-Kassner and O’Neill point out that administrators must have their eye on long-term changes, Wardle and Roozen encourage their readers to consider small changes that can be implemented over time. They also emphasize the local in cautioning readers not to assume that what has worked at UCF will necessarily work elsewhere: each institution requires a different approach.

Neal encourages educators to consider what writing, teaching, and writing assessment might look like 5, 10, or 20 years in the future (46-47), and I think that one way of responding to this question is the ecological model of assessment. Not only does this model fit some of the descriptions we developed in class of what writing might look like—less situated within one discipline, more focused on professional development—I think it also allows for, and perhaps even depends on, revisions made over the course of time. In this way I think that the ecological model is better suited to the field’s current construct of writing, while also being adaptable and responsive to changes in writing studies and education at large.

Additional Resources

Dryer, Dylan. “Scaling Writing Ability: A Corpus Driven Inquiry.” Written Communication 30.1 (2013): 3-35. Web. 24 September 2013.

Emmons, Kimberly. “Rethinking Genres of Reflection: Student Portfolio Cover Letters and the Narrative of Progress.” Composition Studies 31.1 (2003): 43-62. Print.

Reading about validity and reliability issues this week has led me to reflect on the one high-stakes assessment in which I regularly participated for four years: an end-of-semester portfolio assessment for FYC students. I’ll briefly provide a description of the assessment so that my reflections can be situated within that specific context.

At my previous institution, students were required to complete one semester of FYC or its equivalent, which they could do by making a 3 on the AP exam, transferring credits from elsewhere (though credits did not often transfer), or completing a “challenge exam” to demonstrate that they already met the ENG 101 course outcomes. Each semester we had around 30 sections of ENG 101, mostly taught by graduate teaching assistants and part-time faculty. Two TT faculty members also regularly taught the course. Three programmatic documents helped to unify the sections of 101: the challenge exam prompt, the portfolio assessment rubric (PAR), and the appeal prompt. Most important for my discussion is the PAR, which was designed for use in fall of 2009 (the year I began teaching). The PAR outlined the traits of academic writing that students were expected to demonstrate, and it asked readers to make a holistic pass/fail decision based on these categories. We were encouraged not to use the traits as a checklist, but rather to consider these aspects of academic writing and to make a holistic decision about a student’s demonstration of these traits. To demonstrate their abilities, students submitted a portfolio containing two academic essays and a reflective introduction.* These portfolios were read by two teachers who were not the instructor of record for the student, and in the event of a disagreement between the two readers, a third reader who was not aware of the previous readers’ decisions assessed the work.
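The reading procedure itself can be sketched in a few lines of Python. This is only an illustration of the two-reader process described above, not a program the department actually used: the function name `portfolio_decision` is hypothetical, and I am assuming that the third reader's decision settled a split, which follows from there being only two possible outcomes.

```python
def portfolio_decision(first, second, third=None):
    """Combine readers' holistic decisions ('pass' or 'fail').

    The third reading is consulted only when the first two readers
    disagree. Assumption: the third reader's decision is final.
    """
    valid = {"pass", "fail"}
    for reading in (first, second):
        if reading not in valid:
            raise ValueError("each reading must be 'pass' or 'fail'")
    if first == second:
        return first  # the two readers agree, so no third reading is needed
    if third not in valid:
        raise ValueError("readers disagree; a third reading is required")
    return third
```

Notice what the function takes as input: three holistic judgments and nothing else. Anything about the student that is not visible in the portfolio never enters the decision.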

We participated in calibration sessions during the semester and on the morning of each day of the portfolio reading, which lasted three days. A point of contention that repeatedly arose between two of the officials responsible for designing the assessment had to do with framing the question of whether to pass/fail a portfolio. Professor X asked us to consider whether the student would benefit from another semester of ENG 101. Professor Y insisted that asking readers to make that decision was unfair; readers couldn’t possibly know whether students would benefit from another semester, but they could determine whether the work they were assessing met the course goals. The point of contention between these two professors seems to be related to issues of test use. The portfolio was being used to sort students out of ENG 101 or into another semester of ENG 101, and, drawing from Shepherd’s argument, that seems to be an important point for readers to consider. Professor Y was putting forward the position described by Shepherd: “They worry that addressing consequences will overburden the concept of validity or overwork test makers” (5). Professor Y’s concern was that asking the readers to consider their decision in terms of the enormous impact on students would prevent them from making sound decisions. I wonder, though, if thinking in terms of the consequences for students would necessarily prevent us from making accurate and appropriate decisions.

For instance, one semester I had a student who had already failed the portfolio review and was taking ENG 101 for a second time. Failing the assessment had a negative impact on her self-efficacy and her willingness to engage in reflective writing practices. After submitting a portfolio of her work in my course, she again failed the assessment. Both the student and I appealed the readers’ decisions, but the appeals committee would not overturn the decision. The student was faced with taking a third semester of ENG 101, despite the fact that her reflective introduction clearly demonstrated a growing reflective understanding of writing and her own writing processes. Needless to say, the student’s negative feelings about writing only intensified after this experience.

In this particular instance, would making the student complete a third semester of ENG 101 really be beneficial? Should the readers have been made aware of this specific student’s situation, or would that have been detrimental to the decision-making process, as Professor Y suggested? Given that the assessment purported to measure reflective awareness, did reliability trump validity? Is this also evidence that the assessment suffered from “construct underrepresentation”?

Moments like this one are clearly crucial for opening up reflective spaces for participants in the assessment process. As Moss points out, critically reflecting on test use and design is crucial so that stakeholders’ “perspectives…can be reaffirmed with a new self-consciousness or enabled to evolve” (156). However, the particular situation of this one student was not clearly visible to anyone other than the student and me. Because the assessment process was not designed to “capture” this information, it could not be brought to light except by the student’s willingness to speak to officials responsible for constructing and conducting the assessment. It didn’t occur to me at the time, but now I’m wondering how that particular assessment process could be revised to recognize the types of experiences non-mainstream students had in FYC courses and with the assessment. Or, as Moss asks, “[t]o what extent is the writing program complicit in simply reproducing a narrow model of academic writing (and the understanding of knowledge it entails) without providing opportunity for the values implicit in the model to be illuminated and self-consciously considered?” (157). Considering the assessment through the lens of this student’s experience does seem to indicate that the writing program reproduced a narrow model of academic writing, and one that differed from the construct of writing the department claimed to value. On the one hand, we taught students about the importance and value of reflection for writing processes, and on the other, we based our assessments almost solely on a narrow conception of academic writing as separable from students’ metacognitive awareness of writing.

I’m not sure that I’ve arrived at any sort of conclusion here, but I suppose that I’m left with a question about assessment design. How can we ensure that reflective spaces are opened up within the assessment process? How could (or should) experiences like the one my student had be made visible to those responsible for (re)designing the assessment? 

*I completed a study in spring 2011 on readers’ expectations for this reflective component of the portfolio and revised the PAR to include criteria for the “critical reflection.” This revision enabled students to understand how readers were assessing their reflections and provided the basis for classroom discussions of reflective writing. Prior to my work, students did not have access to assessment criteria for the reflection, teachers did not have a unified understanding of what readers wanted, and readers had no concrete criteria by which to assess reflections. As a result, reflections were often undervalued in the assessment despite the emphasis placed on reflection in multiple ENG 101 classrooms.

One common thread that I have traced throughout several histories of writing assessment is an emphasis on continual movement among a number of issues, including, but not limited to, the following: reliability and validity, teachers’ agency (and in Yancey’s account, students’ agency) and measurement theorists’ authority, direct and indirect measures, and “quality” education and cost-effectiveness. Yancey uses a wave metaphor to describe these shifts, explaining that “[o]ne way to historicize those changes [in writing assessment] is to think of them as occurring in overlapping waves, with one wave feeding into another but without completely displacing waves that came before” (131). Similarly, Behizadeh and Engelhard differentiate between dominant and emergent traditions and practices in measurement theory, writing theory, and writing assessment practice (and theory) (190), showing that traditions and practices have moved from dominant to emergent and back again over the last 100 years.

Both of these histories invoke (for me) Raymond Williams’ methodology of historical analysis, “in which a sense of movement within what is ordinarily abstracted as a system is crucially necessary, especially if it is to connect with the future as well as with the past” (121). Though Behizadeh and Engelhard are using Williams’ terminology—“dominant” and “emergent”—Yancey is not, yet her history seems to trace the existence of dominant, emergent and residual beliefs and practices in writing assessment. Particularly, the use of indirect writing assessment, with its focus on surface features of language divorced from meaningful context/s—and, relatedly, remedial instruction for students who do not fare well on these tests—appears to be a residual practice from a time period pre-dating any theoretical attention to the concept of validity (O’Neill, Moore, and Huot 23).

Given compositionists and writing assessment theorists’ (Behizadeh and Engelhard) emphasis on validity as a key component of sound assessment, one might wonder how invalid constructs continue to underpin high-stakes and standardized testing. This disconnect is where Williams’ notion of the residual may be most helpful: “[C]ertain experiences, meanings, and values which cannot be expressed or substantially verified in terms of the dominant culture, are nevertheless lived and practiced on the basis of the residue—cultural as well as social—of some previous social and cultural institution or formation” (122). I think that this definition helps us to account for residual practices of those working within the fields of composition and writing assessment, but I’m not certain that it accounts for the national trend. As all six histories—particularly Behizadeh and Engelhard’s—reveal, writing theorists and teachers have had little influence on dominant practices in writing assessment. Though—from the viewpoint of compositionists and writing theorists—some assessment practices may seem founded on residual beliefs, to dominant testing culture, these practices are, perniciously, “simply ‘the way things are done’” (Paré 112).

Behizadeh, Nadia, and George Engelhard Jr. “Historical View of the Influences of Measurement and Writing Theories on the Practice of Writing Assessment in the United States.” Assessing Writing 16 (2011): 189-211.

O’Neill, Peggy, Cindy Moore, and Brian Huot. “Historicizing Writing Assessment.” A Guide to College Writing Assessment. Logan: Utah State UP, 2009. 14-34.

Paré, Anthony. “Discourse Regulations and the Production of Knowledge.” Writing in the Workplace: New Research Perspectives. Ed. Rachel Spilka. Carbondale: Southern Illinois UP, 1998.

Williams, Raymond. Marxism and Literature. New York: Oxford UP, 1977.

Yancey, Kathleen Blake. “Looking Back as We Look Forward: Historicizing Writing Assessment.” Assessing Writing: A Critical Sourcebook. Eds. Brian Huot and Peggy O’Neill. Boston: Bedford / St. Martin’s, 2009. 131-149.