Assessment in an AI-Ubiquitous World | Center for Advancing Teaching and Learning Through Research

Emerging Principles, Recent Research, and Examples

A Northeastern Assessment and AI Task Force Briefing Document

NOTE: The content on this page was developed as background to inform the work of the Assessment and AI Task Force. The Task Force’s final product, Assessment in an AI-Enabled World: A Faculty Guide, is hosted on the Office of the Provost website.

Five Emerging Principles of AI Assessment

1. Transparency and clear communication are essential: Articulate clear, consistently labeled AI expectations for each assignment. Explain the purpose of assignments and why key skills are important to master independently. Integrate discussion of expectations throughout the course.

2. Focus grading on aspects of student work that can’t be outsourced to AI: Detection tools are fallible and can be biased, and a detection-centric approach erodes student-instructor trust. Focus on the design of assessments that are aligned with learning outcomes and provide valid evidence of student work. What needs to be witnessed and what does not?

3. Evidence of student process increases validity of assessment: Integrate iterative deliverables, with rounds of peer and instructor feedback as students create their work (e.g., drafts, chat logs, revision histories, and reflections). Students submit this collection of evidence, not just the final document, providing a holistic record of how they developed their work and responded to personalized feedback.

4. Include evaluative judgment as a core capability: Have students share critiques to assess their ability to judge the quality of work produced without, with, and by AI.

5. Systematic, curriculum-level assessment revision is most effective: The magnitude of AI’s impact calls for systematic, program-level assessment support. This includes revising course assignments to ensure valid assessment, consistent messaging, and scaffolded development of AI proficiency and integrity over time.

Introduction: Assessment at an Inflection Point

Assessment in higher education is at an inflection point in the wake of Generative AI. Prior to widespread availability of GenAI, educators could reasonably assume that text and media authorship required complex thought, discernment, and often creativity on the part of students. Aside from exams, this work was often done outside of class. However, recent research demonstrates that even experienced graders cannot reliably distinguish AI-generated work from work that students performed independent of AI in authentic assessment tasks (Kofinas et al., 2025), suggesting that AI capability now matches or exceeds typical student performance across many types of assessment. In addition, detection tools are not reliable, especially on work produced by multilingual learners (Popkov & Barrett, 2025; Shamsi et al., 2026).

In 2023, as the GenAI meteor hit higher education, the initial focus was on cheating. However, if GenAI agentic browsers such as Comet can complete an assignment in learning management systems such as Canvas without any student involvement, one could argue that it is our system for assessment that is broken (Mills, 2025). GenAI is forcing us to address fundamental questions about what we assess and how we validate learning.

This brief organizes current thinking about Assessment and AI around five core principles emerging from recent research and notable institution-wide efforts. These principles provide a foundation for approaching assessment redesign not as crisis management but as intentional pedagogical evolution.

Principle 1: Transparency and Clear Communication Are Essential

Foundational Insight

Research by Corbin et al. (2025) reveals widespread confusion about acceptable AI use. Students report constructing individually unique ethical frameworks, putting them in untenable and uncomfortable positions, because institutional guidance is absent or ambiguous. Faculty experience significant emotional burden attempting to communicate unclear boundaries. This ambiguity creates anxiety for students trying to comply and provides cover for those attempting to circumvent learning.

Implications for Practice

Transparency in Learning and Teaching (TILT) provides a framework for describing the assignment’s purpose, task, and criteria for assessment. Empirical studies have demonstrated significant gains in student metacognition, confidence, persistence, and employer-valued skills (Winkelmes, 2013, 2025).

Explicit Graduated Frameworks such as the AI Assessment Scale describe levels of AI integration:

1. No AI: Assessment without AI assistance (controlled environment)

2. AI for Planning: AI for brainstorming/outlining; human-authored final work

3. AI for Editing: AI improves clarity/grammar of human-drafted text

4. AI Collaboration: AI generates content; student critiques, verifies, modifies

5. AI Exploration: Extensive AI use; assessment focuses on process (Perkins et al., 2024)

Faculty select the appropriate level for each assessment based on learning outcomes and communicate this explicitly to students via syllabus statements, assignment instructions, and classroom discussion. While helpful in clarifying expectations in a way that supports student success, this system does not address the question of what will be graded and how that work will be assessed.
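
For programs that keep assignment policies in a shared spreadsheet or course template, the scale above can be recorded as structured data so that every assignment carries the same label and a stated rationale. The following is a minimal sketch, not part of the AI Assessment Scale itself: the names AIASLevel, AssignmentPolicy, and label are hypothetical, and the level descriptions are copied from the list above.

```python
from dataclasses import dataclass
from enum import IntEnum


class AIASLevel(IntEnum):
    """The five levels of the AI Assessment Scale as listed above."""
    NO_AI = 1
    PLANNING = 2
    EDITING = 3
    COLLABORATION = 4
    EXPLORATION = 5


# Short descriptions taken from the list above.
DESCRIPTIONS = {
    AIASLevel.NO_AI: "Assessment without AI assistance (controlled environment)",
    AIASLevel.PLANNING: "AI for brainstorming/outlining; human-authored final work",
    AIASLevel.EDITING: "AI improves clarity/grammar of human-drafted text",
    AIASLevel.COLLABORATION: "AI generates content; student critiques, verifies, modifies",
    AIASLevel.EXPLORATION: "Extensive AI use; assessment focuses on process",
}


@dataclass
class AssignmentPolicy:
    """Hypothetical record pairing an assignment with its level and rationale."""
    assignment: str
    level: AIASLevel
    rationale: str

    def label(self) -> str:
        # One consistent line that can be pasted into a syllabus or assignment sheet.
        return (
            f"{self.assignment} | AI level {int(self.level)}: "
            f"{DESCRIPTIONS[self.level]} | Why: {self.rationale}"
        )


policy = AssignmentPolicy(
    assignment="Essay 1",  # hypothetical assignment name
    level=AIASLevel.NO_AI,
    rationale="Develop independent analytical skills first.",
)
print(policy.label())
```

Requiring a rationale field alongside the level mirrors the point that labels alone are not enough; students also need to know why a given level applies.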

Example: University of Michigan

Michigan developed standardized syllabus templates incorporating graduated AI policies. Critically, templates prompt faculty to articulate the pedagogical rationale for the chosen policy (e.g., “AI is prohibited on this assignment because you need to develop independent analytical skills before learning to collaborate with AI tools”). This transparency helps students understand that policies serve learning goals rather than arbitrary control.

Key Vocabulary

AI Assessment Scale: Leveled framework specifying permitted AI integration
Pedagogical Rationale: Explanation of how AI policy serves learning objectives
AI Acknowledgment: Student documentation of AI tools used and nature of use

Principle 2: Focus Grading on Aspects of the Work that Can’t Be Outsourced to AI

Foundational Insight

Dawson et al. (2024) argue that framing AI as primarily an academic integrity issue misses the deeper challenge: assessment validity. When students can submit AI-generated work that appears to demonstrate competence, the assessment fails to validly measure what students know and can do. They argue that validity should take precedence over detection. This is a measurement problem before it is a moral problem.

The shift from “Are students cheating?” to “Does this assessment measure what it claims to measure?” transforms institutional responses. Rather than investing in AI detection tools, institutions should invest in redesigning assessments to ensure validity (Corbin et al., 2025).

Implications for Practice

The Two-Lane Approach to validity-centered design recognizes that different learning outcomes require different assessment approaches:

1. Assured, Foundational Assessment: For foundational knowledge and skills that must be verified as human capability, use supervised environments (proctored exams, oral presentations, in-class demonstrations)

2. Integrated, Relevance-Focused Assessment: For professional capabilities involving tool use, design assessments where AI engagement is transparent and the human contribution is measurable

This distinction, formalized in the Two-Lane model developed by the University of Sydney and adopted by Australia’s Tertiary Education Quality and Standards Agency (TEQSA), ensures that degrees certify both independent capability and skilled tool use (Lodge et al., 2023; Bridgeman & Liu, 2025). However, critics of this approach argue that it is increasingly challenging to secure even in-person assessment, and they emphasize the importance of trust-building with students (Curtis, 2025).

Example: University of Sydney

Sydney conducts program-level audits categorizing each assessment as Lane 1 (assured/supervised) or Lane 2 (integrated/open). Core content assessments—the “non-negotiables” of the discipline—move to Lane 1 to ensure graduates possess foundational capabilities independently. Professional practice assessments move to Lane 2 with explicit instruction on responsible AI use, preparing students for AI-integrated workplaces.

Key Vocabulary

Construct Validity: The degree to which an assessment measures the intended learning outcome
Assured Assessment: Supervised tasks verifying independent human capability
Integrated Assessment: Open tasks teaching responsible AI collaboration and complex tasks over time

Principle 3: Evidence of Student Process Increases Validity of Assessment

Foundational Insight

When AI can generate polished final products instantly, assessing only completed work fails to capture student learning. Effective assessment must capture the process of work: the decisions, iterations, and thinking that produced the outcome. This shift from product to process makes the cognitive engagement of learning visible and assessable (Kickbusch et al., 2025). In The Next Era of Assessment (2025), the Digital Education Council outlines “key touchpoints” in AI-free, AI-assisted, and AI-integrated assessment designs. While process-oriented approaches create opportunities for reflection on learning and decrease the likelihood of counter-productive use of AI, they are difficult to scale in large-format courses. However, fields such as health science have developed approaches, such as simulations, that are designed for scaled assessment of the process-oriented proficiencies that students need to demonstrate.

Implications for Practice

Document and Reflect on Progress: Break large assignments down into interim steps with deliverables and feedback. Students submit an audit trail of their work process alongside final products:

Draft versions showing evolution of thinking
Revision histories (e.g., Google Docs edit history)
Student-authored memos explaining how peer and instructor feedback was acted upon in revisions
Decision logs explaining choices made
Chat transcripts if AI was used

Example: Deakin University

The Centre for Research in Assessment and Digital Learning at Deakin University developed “reverse scaffolding”: students may only use AI for tasks they have first demonstrated independent capability to perform (Deakin University, n.d.). For example, students must complete an initial essay without AI assistance to demonstrate baseline writing capability, then may use AI for subsequent essays while documenting their process. This preserves validity while allowing progressive skill development.

Key Vocabulary

Process Evidence: Documentation of work evolution and decision-making
Audit Trail: Collection of artifacts showing how the final product was created
Reflective Integration: Explicit articulation of how AI was used and why

Principle 4: Include Evaluative Judgment as a Core Capability

Foundational Insight

Bearman et al. (2024) argue that as content generation becomes automated, the essential human capability is evaluative judgment: the ability to recognize quality of work produced without, with, and by AI. Graduates need to assess AI outputs for accuracy, appropriateness, bias, and quality rather than accepting algorithmic suggestions uncritically.

This represents a fundamental shift in what universities certify. Previously, we certified students’ ability to produce knowledge artifacts. Increasingly, we must certify their ability to judge the quality of knowledge artifacts regardless of production source. Students also need to internalize a sense of accountability for the quality and accuracy of a final product. Of note: AI is increasingly capable of evaluating the quality of its own work, so strategies such as in-person presentations with question and answer may be useful in validating student evaluation skills.

Implications for Practice

Design Critique Assignments: Students assess AI-generated work before producing and critiquing their own work:

Provide students with an AI-generated essay/code/analysis
Ask students to evaluate it using course rubrics
Require identification of strengths, weaknesses, errors, hallucinations
Have students improve the work to professional standards
Assign a similar project in which they apply this judgment to work that they produce

This requires deeper engagement and exposes AI limitations (generic thinking, lack of nuance, hallucinations) in ways that build critical AI literacy. As with process-oriented practices, this approach can be difficult to scale.

Example: Oregon State University

The MAGE Framework maps “Distinctive Human Skills” against “GenAI Supplementation” at each level of Bloom’s taxonomy (Zaphir et al., 2024). It helps faculty elevate assessments from lower levels (Remember, Understand), where AI excels, to higher levels (Evaluate, Create) requiring human capabilities like ethical reasoning, contextual judgment, and integration of lived experience that AI at present cannot replicate.

Key Vocabulary

Evaluative Judgment: Capacity to recognize quality in one’s own and others’ work
AI Literacy: Understanding of AI capabilities, limitations, responsible use, and appropriate use contexts

Principle 5: Systematic, Curriculum-Level Assessment Revision is Most Effective

Foundational Insight

Lodge et al. (2023) argue that isolated assignment redesigns are insufficient; programmatic approaches are more robust than one-off course revisions. When AI can complete many traditional tasks, validity threats require coordinated responses across entire programs. Some refer to this as the “Swiss cheese” approach to authentic assessment, because potential vulnerabilities in discrete assessments are staggered over the span of student work in a program (King’s College London, n.d.). Australia’s TEQSA framework requires higher education providers to submit action plans demonstrating program-level assessment assurance in the AI era (TEQSA, n.d.). Georgetown University’s AI Lab, sponsored by the Center for New Designs in Learning and Scholarship, provides voluntary, non-prescriptive program-level support to help administrators and educators explore how AI affects their field, their students, and their curriculum (CNDLS, n.d.).

Effective practice involves program-level assessment mapping that ensures:

Foundational competencies are verified across courses through assured assessment
AI skills are scaffolded progressively from first year to graduation
Learning outcomes requiring independent demonstration are distinguished from those involving tool collaboration
Assessment methods are varied (triangulation) to build comprehensive validity arguments and to improve insight into how students do or don’t use AI over time

Implications for Practice

Map assessments across programs to identify:

Which learning outcomes require independent human capability verification (Lane 1)
Which outcomes involve contemporary professional tool use (Lane 2)
Where AI skills are explicitly taught and practiced
Whether assessment diversity provides sufficient validity evidence
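
One way to run such a mapping over a program’s assessment inventory is sketched below. This is an illustration under stated assumptions rather than a method drawn from the sources cited here: the Assessment record, the find_gaps function, the sample courses, and the threshold of two distinct methods per outcome are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Hypothetical inventory row: one assessment mapped to one outcome."""
    course: str
    outcome: str
    lane: int    # 1 = assured/supervised, 2 = integrated/open
    method: str  # e.g. "proctored exam", "lab report"


def find_gaps(assessments, lane1_outcomes, min_methods=2):
    """Return (outcome, issue) pairs for outcomes that lack Lane 1
    verification or are assessed by too few distinct methods.

    min_methods is an assumed threshold for triangulation.
    """
    lanes = defaultdict(set)
    methods = defaultdict(set)
    for a in assessments:
        lanes[a.outcome].add(a.lane)
        methods[a.outcome].add(a.method)

    gaps = []
    for outcome in sorted(set(lanes) | set(lane1_outcomes)):
        if outcome in lane1_outcomes and 1 not in lanes[outcome]:
            gaps.append((outcome, "no Lane 1 (assured) assessment"))
        if len(methods[outcome]) < min_methods:
            gaps.append((outcome, f"fewer than {min_methods} assessment methods"))
    return gaps


# Hypothetical program inventory.
inventory = [
    Assessment("BIOL 101", "core concepts", 1, "proctored exam"),
    Assessment("BIOL 101", "data analysis", 2, "lab report"),
    Assessment("BIOL 201", "data analysis", 2, "project"),
]
for outcome, issue in find_gaps(inventory, {"core concepts", "data analysis"}):
    print(f"{outcome}: {issue}")
```

A report like this only flags where to look; deciding which outcomes belong in Lane 1 remains a disciplinary judgment for program faculty.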

Example: Monash University

Monash University’s Programmatic Assessment and AI Review (PAAIR) project is systematically reviewing every course’s assessment across the institution (Monash University, n.d.). The project coordinates program-level redesign ensuring:

AI skills are scaffolded across degree progression
High-stakes assessments in capstone assignments verify core capabilities
Students receive consistent messaging about AI expectations across courses
Faculty collaborate on coherent assessment ecosystems rather than working in isolation

Key Vocabulary

Program Assessment Mapping: Systematic examination of assessment across entire degree
Triangulation: Using multiple assessment methods to build comprehensive validity evidence

Emerging Questions

Student Anxiety and Wellbeing

Contrary to assumptions that students universally want to minimize effort through AI use, research reveals complex sentiments. Many students actively avoid AI when permitted because they fear it will impair their learning and employability. Others experience anxiety about unclear boundaries and potential accidental violations (Corbin et al., 2025). Ethical concerns about AI’s impact on the environment and the impact of data centers in rural and low-income settings are also on the rise.

Disciplinary Variation

AI’s impact varies significantly by discipline, requiring tailored approaches to assessment:

Computer Science: Moving from “write code” in foundational courses to “debug and review code” as students gain expertise
Humanities and Creative Arts: Rethinking traditional assignments such as essays and prototypes
Health Professions: Renewed emphasis on objective structured clinical examinations (OSCEs)

Cultural and Algorithmic Bias

AI models trained predominantly on Western, English-language data may disadvantage students working on Indigenous knowledge, non-Western contexts, or in languages other than English. Assessment design must explicitly consider whether AI integration could create unfair advantages or disadvantages for particular student populations.

Conclusion: From Disruption to Intentional Design

AI has disrupted assessment practice, but disruption creates opportunity for intentional redesign. The five principles outlined here provide a foundation:

Transparency through frameworks: Communicate clear, graduated expectations
Validity over detection: Focus resources on measurement quality, not surveillance
Process over product: Make learning visible through documentation and reflection
Evaluative judgment as capability: Assess students’ ability to judge AI output quality
Programmatic coordination: Design coherent assessment ecosystems, not isolated fixes

References

Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 1–13. https://doi.org/10.1080/02602938.2024.2335321

Bridgeman, A., & Liu, D. (2025, October 2). Two parallel lanes: the roadmap for a future-ready transformative education. Teaching@Sydney. https://educational-innovation.sydney.edu.au/teaching@sydney/two-parallel-lanes-the-roadmap-for-a-future-ready-transformative-education/

Center for New Designs in Learning and Scholarship. (n.d.). CNDLS AI Lab. Georgetown University. https://cndls.georgetown.edu/programs/ai-lab/

Corbin, T., Dawson, P., Nicola-Richmond, K., & Partridge, H. (2025). ‘Where’s the line? It’s an absurd line’: Towards a framework for acceptable uses of AI in assessment. Assessment & Evaluation in Higher Education, 50. https://www.tandfonline.com/doi/full/10.1080/02602938.2025.2456207

Curtis, G. J. (2025). The two-lane road to hell is paved with good intentions: why an all-or-none approach to generative AI, integrity, and assessment is insupportable. Higher Education Research & Development, 44(8), 2151–2158. https://doi.org/10.1080/07294360.2025.2476516

Dawson, P., Bearman, M., Dollinger, M., & Boud, D. (2024). Validity matters more than cheating. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2024.2388285

Deakin University. (n.d.). Centre for Research in Assessment and Digital Learning. https://www.deakin.edu.au/about-deakin/why-deakin/education-excellence/cradle

Digital Education Council. (2025, July 7). The next era of assessment: A global review of AI in assessment design. Digital Education Council. https://www.digitaleducationcouncil.com/post/the-next-era-of-assessment-a-global-review-of-ai-in-assessment-design

Kickbusch, S., Ashford-Rowe, K., Kemp, A., Specht, M., Bartolic, S., & Lodge, J. M. (2025). Beyond detection: Redesigning authentic assessment in an AI-mediated world. Education Sciences, 15(11), 1537. https://doi.org/10.3390/educsci15111537

King’s College London. (n.d.). Authentic assessment at the module and programme level. https://www.kcl.ac.uk/about/strategy/learning-and-teaching/ai-guidance/approaches-to-assessment/authentic-assessment

Kofinas, A. K., Tsay, C. H., & Pike, D. (2025). The impact of generative AI on academic integrity of authentic assessments in higher education. British Journal of Educational Technology. https://doi.org/10.1111/bjet.13551

Lodge, J. M., Howard, S., Bearman, M., Dawson, P., & Associates. (2023). Assessment reform for the age of artificial intelligence. Tertiary Education Quality and Standards Agency. https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/assessment-reform-age-artificial-intelligence

Mills, A. (2025, October 19). The time to reckon with AI agents in digital learning spaces is now. Anna Mills’ Substack. https://annamills.substack.com/p/the-time-to-reckon-with-ai-agents

Monash University. (n.d.). Programmatic Assessment and AI Review (PAAIR). TeachHQ. https://www.monash.edu/learning-teaching/TeachHQ/Assessment/PAAIR

Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Computers and Education: Artificial Intelligence. https://doi.org/10.1016/j.caeai.2024.100259

Popkov, A. A., & Barrett, T. S. (2025). AI vs academia: Experimental study on AI text detectors’ accuracy in behavioral health academic writing. Accountability in Research, 32(7), 1072–1088. https://doi.org/10.1080/08989621.2024.2331757

Shamsi, A., Wang, T., Amraei, M., & Raju, N. V. (2026). Evaluating AI text detection tools for distinguishing human-written from AI-generated abstracts in Persian-language journals of library and information science. Acta Informatica Pragensia, 15(1), 126–134. https://doi.org/10.18267/j.aip.293

Tertiary Education Quality and Standards Agency. (n.d.). Gen AI – TEQSA resources. https://www.teqsa.gov.au/guides-resources/higher-education-good-practice-hub/gen-ai-knowledge-hub/gen-ai-teqsa-resources

Winkelmes, M. (2025, October 27). TILTing the use of AI to reduce its risks. The Teaching Professor. https://www.teachingprofessor.com/topics/teaching-strategies/tilting-the-use-of-ai-to-reduce-its-risks

Winkelmes, M. A. (2013). Transparency in teaching: Faculty share data and improve students’ learning. Liberal Education, 99(2), 48–55. https://dgmg81phhvh63.cloudfront.net/content/magazines/Archive/LE_SP13_Vol99No2.pdf

Zaphir, L., Lodge, J. M., Lisec, J., McGrath, D., & Khosravi, H. (2024). How critically can an AI think? A framework for evaluating the quality of thinking of generative artificial intelligence. arXiv. https://arxiv.org/abs/2406.14769
