Why human-in-the-loop AI makes sense for assessment
AI is transforming the way organisations design, deliver and score assessments. It’s helping teams do more with less, without sacrificing quality. For organisations that manage assessments at scale, that’s a genuine leap forward.
Working alongside assessment organisations across education, government and professional certifications has taught us something important: AI works best when human experts stay in the loop. Not because the technology falls short, but because assessment is high stakes, deeply nuanced and built on hard-earned trust.
Let’s look at the key stages where AI is making a real difference, and why the human element remains essential at every one of them.
Smarter question creation, guided by expert judgement
AI assessment tools like Jai can generate question stems at remarkable speed. They can produce plausible distractors, map items to curriculum standards and create multiple variants of an existing question to rapidly build out large item banks. They’re also proving useful for catching potential bias before content ever reaches a candidate.

This adds enormous value when scaling content development, but assessment items carry immense weight. A poorly worded distractor can undermine the validity of an entire test form. An unintended cultural reference can disadvantage a whole cohort. Subject matter experts and psychometricians bring the contextual judgement that AI can’t replicate – they understand how a question will actually land with the people taking the test, and whether it’s genuinely fair, not just statistically sound.
The same applies to psychometric quality assurance. Machine learning models can predict item difficulty before live testing, which reduces costly pre-test cycles. But someone still needs to interpret those predictions and decide whether to retain, revise or retire an item – and that person understands the assessment’s purpose and its candidates more deeply than any model can.
Proctoring that protects without overreaching
AI-driven remote proctoring uses computer vision to monitor exam sessions at scale. It can flag unusual eye movements, detect additional people in the room and identify suspicious activity, all without requiring a live human proctor for every session.
But proctoring sits at the intersection of security and candidate experience, which makes it particularly sensitive. Perfectly innocent behaviour can trigger an AI flag: a candidate might glance at a noisy pet or adjust their screen in a shared household. Without a trained human reviewer interpreting that context, organisations risk false accusations that erode candidate trust.
Human reviewers assess flagged incidents with empathy. They draw the crucial line between a system that catches cheating and one that respects the people taking the test.
Human review is essential: it protects the integrity of the online assessment while ensuring that legitimate test-takers are not unfairly penalised.
Human exam administrators can verify flagged behaviours by analysing recorded video feeds, screen activity logs and audio playback to make informed decisions.
The process is therefore a layered approach: an automated AI review, followed by human review and, finally, candidate communication if required. The full process ensures a smooth exam experience.
Jenny Erwin, Assessment Delivery Specialist, Janison
Turning data into decisions that matter
AI can synthesise assessment data into powerful dashboards that surface patterns across large datasets – work that would take a team weeks to complete manually. Natural language generation can even produce individualised narrative reports at scale, replacing generic score summaries with meaningful feedback.
But data without interpretation is just numbers. Educators and program managers contextualise those insights. They understand why a cohort underperformed. They know what intervention to recommend. They adjust programs based on the trends AI has revealed.
Data analysis is most powerful when it is coupled with an understanding of the humanity behind the numbers – the intersection of data and story.
Kim Elith, Head of School Assessments & Partnerships, Janison
The goal isn’t to replace that judgement. It’s to arm decision-makers with better information, faster.
Keeping compliance credible
AI gives regulators and credentialing bodies powerful tools for monitoring assessment fairness. It can detect differential item functioning across demographic groups. It can audit scoring consistency. And it can flag anomalies in results data that might point to systemic issues.
But compliance decisions carry significant consequences, and determining whether an assessment meets regulatory standards requires human expertise. Deciding whether a flagged disparity reflects genuine bias – or is simply a statistical anomaly – demands institutional knowledge. AI surfaces the evidence; people make the call.
The partnership that gets it right
The conversation around AI in assessment isn’t about choosing between technology and people. It’s about designing workflows where each does what it does best.
AI brings scale. Humans bring context. Together, they power an assessment process that’s not only more efficient, but more trustworthy.
For organisations delivering high-stakes assessments, that combination isn’t just a nice-to-have. It’s the foundation of credibility for your candidates, your stakeholders and the crucial decisions your assessments inform.
Janison
Janison is a leading edtech provider transforming the way assessments are delivered and experienced worldwide.