AI in assessment design

Last updated: 12 May 2026 · Reviewed by Tim Burnett (Admin)

TLDR

AI in assessment design is about using AI to shape assessment tasks so they still measure the intended construct while better reflecting real-world practice. The central question is not whether AI is present, but whether the resulting task still produces valid, authentic, and defensible evidence of learner performance. Stronger sources point towards explicit design choices, clear boundaries around permitted AI use, and more visible process evidence rather than reliance on detection alone. The main unresolved issue is where AI support ends and the assessed performance begins. New sector signals, including educator-growth recognition and national AI literacy initiatives, reinforce that assessment design now sits inside a broader AI ecosystem rather than a narrow classroom question.

Definition

AI in assessment design refers to using AI to shape, adapt, or support assessment tasks so that the task better matches the intended construct, context, and real-world use of the skill. The key issue is whether the assessment still produces valid, authentic, and defensible evidence of learner performance. Prompt engineering, model choice, and governance now matter because they influence how reliably AI can be used in design workflows.

Why It Matters

Assessment teams are increasingly facing a design question as well as a control question: if AI is part of working life, should some assessments allow learners to use it in a structured way? That can improve realism and future relevance, but it can also weaken comparability if the role of AI is not defined clearly. The deeper assessment issue is deciding when AI should be ignored, prohibited, permitted, or deliberately built into the construct. Recognition and national AI programmes matter here because they suggest the wider education system is normalising AI capability faster than many assessment policies are changing. The source set also suggests that process evidence, such as drafts or revision history, may help make learner judgement visible in AI-rich tasks. That matters because the more AI is involved, the more important it becomes to show where the learner’s own contribution sits.

Key Concepts

- **Authentic assessment**: a task that reflects how a skill is used in practice.
- **Construct**: the capability the assessment is meant to evidence.
- **Permitted AI use**: AI support that is allowed because it does not undermine the assessment purpose.
- **AI-integrated task**: an assessment designed so that some AI use is part of the expected performance.
- **Process evidence**: drafts, revision history, checkpoints, or other traces that make learner judgement visible.
- **Prompt engineering**: shaping prompts so that AI output is more useful, reliable, or constrained.
- **Humanity in the loop**: keeping human judgement central in AI-supported educational design and decision-making.

What Experts Agree On

The strongest evidence points towards a common view: AI should be handled as a design issue, not only as a misconduct issue. Across academic, practitioner, and policy-facing material, the recurring theme is that assessment validity depends on being explicit about what AI is doing in the task and why that is acceptable. There is also broad agreement that task design matters more than post-hoc anxiety about AI use. Sources repeatedly point towards clearer boundaries, learner-facing expectations, and evidence of process as more useful than trying to infer authorship after the fact. A further convergence is that human judgement remains central. The more contemporary material does not argue for removing people from assessment decisions; it argues for defining which parts of the workflow can be AI-assisted and which parts must remain human-led.

What Is Contested

What remains unsettled is how far authenticity can be preserved when AI becomes part of the workflow rather than just an external tool. Some designs may become more realistic, while others may drift away from the intended learning outcome. The open question is which parts of the assessment should remain strictly independent and which may legitimately include AI support. Award and recognition schemes can pull the conversation towards innovation, but they do not themselves answer the design question. There is also a live tension between pedagogy, employability, and operational convenience. Those aims are not always aligned, and the evidence base does not yet show a single settled model across sectors. National-level AI rollout adds another layer of uncertainty because it can change learner expectations faster than assessment design adapts. [source: https://link.s...

Sources

- Sources are currently embedded inline throughout this page and need consolidation in the next editorial pass.

Risks

- Assessments may become more realistic but less comparable across learners or centres.
- AI may be treated as part of the task without clear rules on acceptable use.
- Stakeholders may disagree about whether the design still evidences the intended construct.
- Authenticity claims may be weakened if task design changes faster than validation.
- Suppliers or institutions may frame redesign as innovation without showing how validity is protected.
- Learners may be asked to use AI without enough curriculum preparation to do so responsibly.
- Human judgement may be rhetorically retained but practically diluted if AI becomes too central to design or review.
- National or award-linked AI enthusiasm may outpace local assessment governance.

Good Practice

A sensible approach is to begin with the construct and work backwards:

1. Define what the learner must do unaided.
2. Identify where AI support could be allowed without changing the meaning of the result.
3. Decide what evidence will show that the redesigned task still supports the intended decision.
4. Set out the rules for learners, staff, and external stakeholders in plain language.
5. Check whether the assessment is testing subject knowledge, professional judgement, prompt engineering, or responsible AI use.
6. Revisit the design if the wider education environment changes, for example through new national AI programmes or reward structures that normalise AI use.

Where AI is built into the task, the role of AI should be explicit rather than implicit. That makes it easier to defend the assessment, explain the rules, and review whether the evidence still supports the intended decision.
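For teams that keep task specifications in a structured form, the working-backwards steps above could be recorded and checked along the following lines. This is only an illustrative sketch: `TaskDesign`, `Stance`, `design_gaps`, and every field name are hypothetical and do not come from any source cited on this page. The checks only flag design decisions that have been left implicit; they do not judge whether a design is valid.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stance(Enum):
    """Hypothetical labels for the three common design stances."""
    PROHIBIT = "prohibit"
    PERMIT = "permit"
    INTEGRATE = "integrate"


@dataclass
class TaskDesign:
    """Hypothetical record of the design decisions for one assessment task."""
    construct: str                      # what the assessment is meant to evidence
    unaided_elements: list[str]         # what the learner must do without AI
    stance: Stance                      # prohibit, permit, or integrate AI
    permitted_ai_uses: list[str] = field(default_factory=list)
    evidence_required: list[str] = field(default_factory=list)
    learner_rules: str = ""             # plain-language rules for learners


def design_gaps(design: TaskDesign) -> list[str]:
    """Return the design decisions that are still implicit or inconsistent."""
    gaps = []
    if not design.construct.strip():
        gaps.append("construct is not stated")
    if not design.unaided_elements:
        gaps.append("no unaided elements defined")
    if design.stance is not Stance.PROHIBIT and not design.permitted_ai_uses:
        gaps.append("AI is allowed but permitted uses are not listed")
    if design.stance is Stance.PROHIBIT and design.permitted_ai_uses:
        gaps.append("permitted AI uses are listed but AI is prohibited")
    if not design.evidence_required:
        gaps.append("no supporting evidence defined")
    if not design.learner_rules.strip():
        gaps.append("no plain-language rules for learners")
    return gaps


# Example: an AI-integrated task where several decisions are still implicit.
draft = TaskDesign(
    construct="Professional judgement in client communication",
    unaided_elements=[],
    stance=Stance.INTEGRATE,
)
for gap in design_gaps(draft):
    print("Unresolved:", gap)
```

A check like this can prompt a design review, but the decision about whether the evidence still supports the construct stays with the assessment team.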

Options or Comparison

### Common design stances

| Option | What it means | Main benefit | Main risk |
|---|---|---|---|
| Prohibit AI | Learners complete the task without AI support | Stronger independence and simpler attribution | Can feel unrealistic where AI use is normal in practice |
| Permit AI | AI is allowed but not central to the construct | Better reflects real-world working conditions | Rules must be very clear or comparability may suffer |
| Integrate AI | AI use is deliberately part of what is being assessed | Aligns assessment with current practice or job tasks | Harder to protect validity if the construct is blurred |

The main decision is not which option sounds modern, but which option best matches the intended evidence claim.

Example in Practice

A professional writing assignment asks learners to draft a client briefing with AI allowed for initial structure and wording, but requires a version history and a short reflection explaining key decisions. In that setup, the assessment is not just judging the final text; it is also checking whether the learner can make sound decisions about AI use and improve the output responsibly. This kind of design can work if the organisation is clear about what counts as independent judgement.
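As a rough illustration of how the process-evidence requirement in this example might be checked before marking, the sketch below flags submissions that arrive without a version history or a reflection. The `Submission` type, the `missing_process_evidence` function, and both thresholds are assumptions made for this example rather than anything specified in the sources. It only confirms that the evidence is present; judging its quality remains a human task.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical submission for the client-briefing task."""
    final_text: str
    version_history: list[str]   # earlier drafts, oldest first
    reflection: str              # learner's explanation of key decisions


def missing_process_evidence(
    sub: Submission,
    min_versions: int = 2,           # assumed threshold, not from the sources
    min_reflection_words: int = 50,  # assumed threshold, not from the sources
) -> list[str]:
    """List the required process evidence that is absent or too thin."""
    missing = []
    if len(sub.version_history) < min_versions:
        missing.append("version history")
    if len(sub.reflection.split()) < min_reflection_words:
        missing.append("reflection")
    return missing


# Example: a submission with one saved draft and a very short reflection.
example = Submission(
    final_text="Final client briefing text.",
    version_history=["AI-assisted first draft"],
    reflection="I shortened the introduction.",
)
print(missing_process_evidence(example))
```

Submissions flagged this way would go back to the learner or to a marker for follow-up, not be penalised automatically.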

Key Sources

- Cambridge Assessment Network workshop on authentic assessment in the age of AI.
- LSE Public Policy Review article on generative AI in higher education.
- Anthropic Education Report on university student use of Claude.
- Stanford Digital Education keynote on humanity in the loop.
- Educational Duct Tape episode on AI detection tools and writing-process evidence.
- The Learning Awards entry page.
- Estonia AI Leap programme press release.

Vendor Landscape

The visible market story here is less about standalone products and more about suppliers framing AI as part of design, pedagogy, and workflow support. That is a useful market signal, but it does not by itself settle the validity question. Vendor material is best read as evidence of direction of travel, not as independent validation of assessment quality. Award and recognition structures can also encourage suppliers to present design innovation as proof of effectiveness, which readers should resist.

FAQs

### What is AI in assessment design?

It is the use of AI as part of how an assessment is shaped, adapted, or structured so that the task better matches the intended capability being assessed.

### Why does it matter for exams or coursework?

Because some tasks may need AI to reflect real practice, while others must stay independent to remain valid. The key issue is whether the assessment still supports the decision being made.

### Does using AI in a task automatically weaken authenticity?

No. The more important question is whether AI use is explicitly designed into the task and whether the resulting evidence still supports the construct.

### What should assessment teams ask before redesigning a task?

They should ask what the learner must do unaided, what AI use is acceptable, and how the organisation will justify the design to stakeholders.

Last Reviewed By

Tim Burnett (Admin)

Suggested Citation

Test Community Network. "AI in assessment design." TCN AI & Assessment Wiki. Last reviewed 2026-05-02. https://www.testcommunity.network/wiki/ai-in-assessment/ai-in-assessment-design.html
