Assessment in an AI-Ubiquitous World
Emerging Principles, Recent Research, and Examples
A Northeastern Assessment and AI Task Force Briefing Document
Five Emerging Principles of AI Assessment
1. Transparency and clear communication are essential: Articulate clear, consistently labeled AI expectations for each assignment. Explain the purpose of assignments and why key skills are important to master independently. Integrate discussion of expectations throughout the course.
2. Focus grading on aspects of student work that can’t be outsourced to AI: Detection tools are fallible and can be biased, and a detection-centric approach erodes student-instructor trust. Focus on designing assessments that are aligned with learning outcomes and provide valid evidence of student work. What needs to be witnessed and what does not?
3. Evidence of student process increases validity of assessment: Integrate iterative deliverables, with rounds of peer and instructor feedback as students create their work (e.g., drafts, chat logs, revision histories, and reflections). Students submit this collection of evidence, not just the final document, providing a holistic record of how they developed their work and responded to personalized feedback.
4. Include evaluative judgment as a core capability: Have students share critiques to assess their ability to judge the quality of work produced without, with, and by AI.
5. Systematic, curriculum-level assessment revision is most effective: The magnitude of AI’s impact calls for systematic, program-level assessment support. This includes revising course assignments to ensure valid assessment, consistent messaging, and scaffolded development of AI proficiency and integrity over time.
Introduction: Assessment at an Inflection Point
Assessment in higher education is at an inflection point in the wake of Generative AI (GenAI). Prior to the widespread availability of GenAI, educators could reasonably assume that text and media authorship required complex thought, discernment, and often creativity on the part of students. Aside from exams, this work was often done outside of class. However, recent research demonstrates that even experienced graders cannot reliably distinguish AI-generated work from work that students completed independently of AI in authentic assessment tasks (Kofinas et al., 2025), which suggests that AI capability now matches or exceeds typical student performance across most types of assessment. In addition, detection tools are not reliable, especially on work produced by multilingual learners (Popkov & Barrett, 2025; Shamsi et al., 2026).
In 2023, as the GenAI meteor hit higher education, the initial focus was on cheating. However, if GenAI agentic browsers such as Comet can complete an assignment in learning management systems such as Canvas without any student involvement, one could argue that it is our system for assessment that is broken (Mills, 2025). GenAI is forcing us to address fundamental questions about what we assess and how we validate learning.
This brief organizes current thinking about assessment and AI around five core principles emerging from recent research and notable institution-wide efforts. These principles provide a foundation for approaching assessment redesign not as crisis management but as intentional pedagogical evolution.
Principle 1: Transparency and Clear Communication Are Essential
Foundational Insight
Research by Corbin et al. (2025) reveals widespread confusion about acceptable AI use. Because institutional guidance is absent or ambiguous, students report having to construct their own individual ethical frameworks, which puts them in untenable and uncomfortable positions. Faculty experience a significant emotional burden when attempting to communicate unclear boundaries. This ambiguity creates anxiety for students trying to comply and provides cover for those attempting to circumvent learning.
Implications for Practice
Transparency in Learning and Teaching (TILT) provides a framework for describing an assignment’s purpose, task, and criteria for assessment. Empirical studies of TILT have demonstrated significant gains in student metacognition, confidence, persistence, and employer-valued skills (Winkelmes, 2013, 2025).
Explicit Graduated Frameworks such as the AI Assessment Scale describe levels of AI integration:
1. No AI: Assessment without AI assistance (controlled environment)
2. AI for Planning: AI for brainstorming/outlining; human-authored final work
3. AI for Editing: AI improves clarity/grammar of human-drafted text
4. AI Collaboration: AI generates content; student critiques, verifies, modifies
5. AI Exploration: Extensive AI use; assessment focuses on process (Perkins et al., 2024)
Faculty select the appropriate level for each assessment based on learning outcomes and communicate this explicitly to students via syllabus statements, assignment instructions, and classroom discussion. While helpful in clarifying expectations in a way that supports student success, this system does not address the question of what will be graded and how that work will be assessed.
Example: University of Michigan
Michigan developed standardized syllabus templates incorporating graduated AI policies. Critically, templates prompt faculty to articulate the pedagogical rationale for the chosen policy (e.g., “AI is prohibited on this assignment because you need to develop independent analytical skills before learning to collaborate with AI tools”). This transparency helps students understand how policies serve learning goals, not arbitrary control.
Key Vocabulary
- AI Assessment Scale: Leveled framework specifying permitted AI integration
- Pedagogical Rationale: Explanation of how AI policy serves learning objectives
- AI Acknowledgment: Student documentation of AI tools used and nature of use
Principle 2: Focus Grading on Aspects of the Work that Can’t Be Outsourced to AI
Foundational Insight
Dawson et al. (2024) argue that framing AI as primarily an academic integrity issue misses the deeper challenge: assessment validity. When students can submit AI-generated work that appears to demonstrate competence, the assessment fails to validly measure what students know and can do. They argue that validity should take precedence over detection. This is a measurement problem before it is a moral problem.
The shift from “Are students cheating?” to “Does this assessment measure what it claims to measure?” transforms institutional responses. Rather than investing in AI detection tools, institutions should invest in redesigning assessments to ensure validity (Corbin et al., 2025).
Implications for Practice
The Two-Lane Approach to validity-centered design recognizes that different learning outcomes require different assessment approaches:
1. Assured, Foundational Assessment: For foundational knowledge and skills that must be verified as human capability, use supervised environments (proctored exams, oral presentations, in-class demonstrations)
2. Integrated, Relevance-Focused Assessment: For professional capabilities involving tool use, design assessments where AI engagement is transparent and the human contribution is measurable
This distinction, formalized in the Two-Lane model developed by the University of Sydney and adopted by Australia’s Tertiary Education Quality and Standards Agency (TEQSA), ensures that degrees certify both independent capability and skilled tool use (Lodge et al., 2023; Bridgeman & Liu, 2025). However, critics of this approach note that even in-person assessment is increasingly difficult to secure and emphasize the importance of building trust with students (Curtis, 2025).
Example: University of Sydney
Sydney conducts program-level audits categorizing each assessment as Lane 1 (assured/supervised) or Lane 2 (integrated/open). Core content assessments—the “non-negotiables” of the discipline—move to Lane 1 to ensure graduates possess foundational capabilities independently. Professional practice assessments move to Lane 2 with explicit instruction on responsible AI use, preparing students for AI-integrated workplaces.
Key Vocabulary
- Construct Validity: The degree to which an assessment measures the intended learning outcome
- Assured Assessment: Supervised tasks verifying independent human capability
- Integrated Assessment: Open tasks that teach responsible AI collaboration on complex work over time
Principle 3: Evidence of Student Process Increases Validity of Assessment
Foundational Insight
When AI can generate polished final products instantly, assessing only completed work fails to capture student learning. Effective assessment must capture the process of work: the decisions, iterations, and thinking that produced the outcome. This shift from product to process makes the cognitive engagement of learning visible and assessable (Kickbusch et al., 2025). In The Next Era of Assessment, the Digital Education Council (2025) outlines “key touchpoints” in AI-free, AI-assisted, and AI-integrated assessment designs. While process-oriented approaches create opportunities for reflection on learning and decrease the likelihood of counterproductive use of AI, they are difficult to scale in large-format courses. However, fields such as health science have developed approaches, such as simulations, that are designed for scaled assessment of the process-oriented proficiencies that students need to demonstrate.
Implications for Practice
Document and Reflect on Progress: Break large assignments down into interim steps with deliverables and feedback. Students submit an audit trail of their work process alongside final products:
- Draft versions showing evolution of thinking
- Revision histories (e.g., Google Docs edit history)
- Student-authored memos explaining how peer and instructor feedback was acted upon in revisions
- Decision logs explaining choices made
- Chat transcripts if AI was used
Example: Deakin University
The Centre for Research in Assessment and Digital Learning at Deakin University developed “reverse scaffolding”: students may only use AI for tasks they have first demonstrated independent capability to perform (Deakin University, n.d.). For example, students must complete an initial essay without AI assistance to demonstrate baseline writing capability, then may use AI for subsequent essays while documenting their process. This preserves validity while allowing progressive skill development.
Key Vocabulary
- Process Evidence: Documentation of work evolution and decision-making
- Audit Trail: Collection of artifacts showing how the final product was created
- Reflective Integration: Explicit articulation of how AI was used and why
Principle 4: Include Evaluative Judgment as a Core Capability
Foundational Insight
Bearman et al. (2024) argue that as content generation becomes automated, the essential human capability is evaluative judgment: the ability to recognize quality of work produced without, with, and by AI. Graduates need to assess AI outputs for accuracy, appropriateness, bias, and quality rather than accepting algorithmic suggestions uncritically.
This represents a fundamental shift in what universities certify. Previously, we certified students’ ability to produce knowledge artifacts. Increasingly, we must certify their ability to judge the quality of knowledge artifacts regardless of production source. Students also need to internalize a sense of accountability for the quality and accuracy of a final product. Of note: AI is increasingly capable of evaluating the quality of its own work, so strategies such as in-person presentations with question and answer may be useful in validating student evaluation skills.
Implications for Practice
Design Critique Assignments: Students assess AI-generated work before producing and critiquing their own work:
- Provide students with an AI-generated essay/code/analysis
- Ask students to evaluate it using course rubrics
- Require identification of strengths, weaknesses, errors, hallucinations
- Have students improve the work to professional standards
- Assign a similar project in which they apply this judgment to work that they produce
This requires deeper engagement and exposes AI limitations (generic thinking, lack of nuance, hallucinations) in ways that build critical AI literacy. As with process-oriented practices, this approach can be difficult to scale.
Example: Oregon State University
The MAGE Framework maps “Distinctive Human Skills” against “GenAI Supplementation” at each level of Bloom’s taxonomy (Zaphir et al., 2024). The framework helps faculty elevate assessments from lower levels (Remember, Understand), where AI excels, to higher levels (Evaluate, Create) that require human capabilities such as ethical reasoning, contextual judgment, and integration of lived experience, which AI at present cannot replicate.
Key Vocabulary
- Evaluative Judgment: Capacity to recognize quality in one’s own and others’ work
- AI Literacy: Understanding of AI capabilities, limitations, responsible use, and appropriate use contexts
Principle 5: Systematic, Curriculum-Level Assessment Revision is Most Effective
Foundational Insight
Lodge et al. (2023) argue that isolated assignment redesigns are insufficient; programmatic approaches are more robust than one-off course revisions. When AI can complete many traditional tasks, validity threats require coordinated responses across entire programs. Some refer to this as the “Swiss cheese” approach to authentic assessment, because potential vulnerabilities in discrete assessments are staggered over the span of student work in a program (King’s College London, n.d.). Australia’s TEQSA framework requires higher education providers to submit action plans demonstrating program-level assessment assurance in the AI era (TEQSA, n.d.). Georgetown University’s AI Lab, sponsored by the Center for New Designs in Learning and Scholarship (CNDLS), provides voluntary, non-prescriptive program-level support to help administrators and educators explore how AI affects their field, their students, and their curriculum (CNDLS, n.d.).
Effective practice involves program-level assessment mapping that ensures:
- Foundational competencies are verified across courses through assured assessment
- AI skills are scaffolded progressively from first year to graduation
- Learning outcomes requiring independent demonstration are distinguished from those involving tool collaboration
- Assessment methods are varied (triangulation) to build comprehensive validity arguments and improve insight into how students do or don’t use AI over time
Implications for Practice
Map assessments across programs to identify:
- Which learning outcomes require independent human capability verification (Lane 1)
- Which outcomes involve contemporary professional tool use (Lane 2)
- Where AI skills are explicitly taught and practiced
- Whether assessment diversity provides sufficient validity evidence
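For programs that keep their assessment inventory in a spreadsheet or similar structured form, the mapping questions above can be checked mechanically. The following is a minimal sketch only, not a tool used by any institution cited in this brief. The `Assessment` record, the `audit` function, the `min_methods` threshold, and the sample course codes and outcomes are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """One assessed task in a program (hypothetical record layout)."""
    course: str
    outcome: str      # program learning outcome the task provides evidence for
    lane: int         # 1 = assured/supervised, 2 = integrated/open
    method: str       # e.g. "proctored exam", "lab report"
    teaches_ai: bool  # True if AI skills are explicitly taught or practiced


def audit(assessments, assured_outcomes, min_methods=2):
    """Flag gaps in a program map.

    assured_outcomes: outcomes the program has decided must be verified
    as independent human capability (Lane 1).
    min_methods: assumed threshold for "sufficient" assessment diversity.
    """
    lanes = defaultdict(set)
    methods = defaultdict(set)
    for a in assessments:
        lanes[a.outcome].add(a.lane)
        methods[a.outcome].add(a.method)

    outcomes = set(lanes) | set(assured_outcomes)
    return {
        # Outcomes that need Lane 1 verification but never receive it
        "missing_lane_1": sorted(o for o in assured_outcomes if 1 not in lanes[o]),
        # Outcomes evidenced by too few distinct methods to triangulate
        "low_diversity": sorted(o for o in outcomes if len(methods[o]) < min_methods),
        # Courses where AI skills are explicitly taught or practiced
        "ai_taught_in": sorted({a.course for a in assessments if a.teaches_ai}),
    }


# Illustrative data only; course codes and outcomes are invented.
program = [
    Assessment("BIO101", "quantitative reasoning", 1, "proctored exam", False),
    Assessment("BIO210", "quantitative reasoning", 2, "lab report", True),
    Assessment("BIO210", "scientific writing", 2, "lab report", True),
]
report = audit(program, assured_outcomes={"quantitative reasoning", "scientific writing"})
for key, values in report.items():
    print(key, values)
```

In this invented example, "scientific writing" would be flagged twice: it has no Lane 1 assessment and is evidenced by only one method. Deciding which outcomes belong in `assured_outcomes` remains a faculty judgment that the sketch does not attempt to automate.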
Example: Monash University
Monash University’s Programmatic Assessment and AI Review (PAAIR) project is systematically reviewing every course’s assessment across the institution (Monash University, n.d.). The project coordinates program-level redesign ensuring:
- AI skills are scaffolded across degree progression
- High-stakes assessments in capstone assignments verify core capabilities
- Students receive consistent messaging about AI expectations across courses
- Faculty collaborate on coherent assessment ecosystems rather than working in isolation
Key Vocabulary
- Program Assessment Mapping: Systematic examination of assessment across entire degree
- Triangulation: Using multiple assessment methods to build comprehensive validity evidence
Emerging Questions
Student Anxiety and Wellbeing
Contrary to assumptions that students universally want to minimize effort through AI use, research reveals complex sentiments. Many students actively avoid AI when permitted because they fear it will impair their learning and employability. Others experience anxiety about unclear boundaries and potential accidental violations (Corbin et al., 2025). Ethical concerns about AI’s impact on the environment and the impact of data centers in rural and low-income settings are also on the rise.
Disciplinary Variation
AI’s impact varies significantly by discipline, requiring tailored approaches to assessment:
- Computer Science: Moving from “write code” in foundational courses to “debug and review code” as students gain expertise
- Humanities and Creative Arts: Rethinking traditional assignments such as essays and prototypes
- Health Professions: Renewed emphasis on objective structured clinical examinations (OSCEs)
Cultural and Algorithmic Bias
AI models trained predominantly on Western, English-language data may disadvantage students working on Indigenous knowledge, non-Western contexts, or in languages other than English. Assessment design must explicitly consider whether AI integration could create unfair advantages or disadvantages for particular student populations.
Conclusion: From Disruption to Intentional Design
AI has disrupted assessment practice, but disruption creates opportunity for intentional redesign. The five principles outlined here provide a foundation:
- Transparency through frameworks: Communicate clear, graduated expectations
- Validity over detection: Focus resources on measurement quality, not surveillance
- Process over product: Make learning visible through documentation and reflection
- Evaluative judgment as capability: Assess students’ ability to judge AI output quality
- Programmatic coordination: Design coherent assessment ecosystems, not isolated fixes
References
Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 1–13. https://doi.org/10.1080/02602938.2024.2335321
Bridgeman, A., & Liu, D. (2025, October 2). Two parallel lanes: the roadmap for a future-ready transformative education. Teaching@Sydney. https://educational-innovation.sydney.edu.au/teaching@sydney/two-parallel-lanes-the-roadmap-for-a-future-ready-transformative-education/
Center for New Designs in Learning and Scholarship. (n.d.). CNDLS AI Lab. Georgetown University. https://cndls.georgetown.edu/programs/ai-lab/
Corbin, T., Dawson, P., Nicola-Richmond, K., & Partridge, H. (2025). ‘Where’s the line? It’s an absurd line’: Towards a framework for acceptable uses of AI in assessment. Assessment & Evaluation in Higher Education, 50. https://www.tandfonline.com/doi/full/10.1080/02602938.2025.2456207
Curtis, G. J. (2025). The two-lane road to hell is paved with good intentions: why an all-or-none approach to generative AI, integrity, and assessment is insupportable. Higher Education Research & Development, 44(8), 2151–2158. https://doi.org/10.1080/07294360.2025.2476516
Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2024.2388285
Deakin University. (n.d.). Centre for Research in Assessment and Digital Learning. https://www.deakin.edu.au/about-deakin/why-deakin/education-excellence/cradle
Digital Education Council. (2025, July 7). The next era of assessment: A global review of AI in assessment design. Digital Education Council. https://www.digitaleducationcouncil.com/post/the-next-era-of-assessment-a-global-review-of-ai-in-assessment-design
Kickbusch, S., Ashford-Rowe, K., Kemp, A., Specht, M., Bartolic, S., & Lodge, J. M. (2025). Beyond detection: Redesigning authentic assessment in an AI-mediated world. Education Sciences, 15(11), 1537. https://doi.org/10.3390/educsci15111537
King’s College London. (n.d.). Authentic assessment at the module and programme level. https://www.kcl.ac.uk/about/strategy/learning-and-teaching/ai-guidance/approaches-to-assessment/authentic-assessment
Kofinas, A. K., Tsay, C. H., & Pike, D. (2025). The impact of generative AI on academic integrity of authentic assessments in higher education. British Journal of Educational Technology. https://doi.org/10.1111/bjet.13551
Lodge, J. M., Howard, S., Bearman, M., Dawson, P., & Associates. (2023). Assessment reform for the age of artificial intelligence. Tertiary Education Quality and Standards Agency. https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/assessment-reform-age-artificial-intelligence
Mills, A. (2025, October 19). The time to reckon with AI agents in digital learning spaces is now. Anna Mills’ Substack. https://annamills.substack.com/p/the-time-to-reckon-with-ai-agents
Monash University. (n.d.). Programmatic Assessment and AI Review (PAAIR). TeachHQ. https://www.monash.edu/learning-teaching/TeachHQ/Assessment/PAAIR
Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Computers and Education: Artificial Intelligence. https://doi.org/10.1016/j.caeai.2024.100259
Popkov, A. A., & Barrett, T. S. (2025). AI vs academia: Experimental study on AI text detectors’ accuracy in behavioral health academic writing. Accountability in Research, 32(7), 1072–1088. https://doi.org/10.1080/08989621.2024.2331757
Shamsi, A., Wang, T., Amraei, M., & Raju, N. V. (2026). Evaluating AI text detection tools for distinguishing human-written from AI-generated abstracts in Persian-language journals of library and information science. Acta Informatica Pragensia, 15(1), 126–134. https://doi.org/10.18267/j.aip.293
Tertiary Education Quality and Standards Agency. (n.d.). Gen AI – TEQSA resources. https://www.teqsa.gov.au/guides-resources/higher-education-good-practice-hub/gen-ai-knowledge-hub/gen-ai-teqsa-resources
Winkelmes, M. A. (2025, October 27). TILTing the use of AI to reduce its risks. The Teaching Professor. https://www.teachingprofessor.com/topics/teaching-strategies/tilting-the-use-of-ai-to-reduce-its-risks
Winkelmes, M. A. (2013). Transparency in teaching: Faculty share data and improve students’ learning. Liberal Education, 99(2), 48-55. https://dgmg81phhvh63.cloudfront.net/content/magazines/Archive/LE_SP13_Vol99No2.pdf
Zaphir, L., Lodge, J. M., Lisec, J., McGrath, D., & Khosravi, H. (2024). How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence. arXiv. https://arxiv.org/abs/2406.14769