Annex III §4
EU AI Act compliance for HR tech
If your platform decides which candidates surface, which performance reviews escalate, or which interview questions are generated, you are inside Annex III §4. The 2 August 2026 deadline is real, but the more pressing timeline is set by your enterprise buyers, who are already adding AI Act questions to RFPs. We help HR-tech providers translate what already lives in their ML stack into a defensible Annex IV technical file.
HR is the original use case the Commission flagged when drafting the AI Act, and it is the use case that gets the most regulator attention. The recital language is unusually direct: AI used in employment, workers management, and access to self-employment shall be considered high-risk when it is used to recruit, select, screen, evaluate, promote, or terminate. That is most of the surface area of a modern HR-tech product. Profiling kicks in the moment a system ranks one candidate above another, and once profiling is in scope, Article 22 GDPR sits underneath the AI Act, not next to it. Two regulators, one technical file.
Annex III §4 covers the full employment lifecycle under two sub-points: (a) recruitment and selection, including targeted job advertising, application filtering, and CV ranking; and (b) decisions affecting work-related relationships, including promotion, termination, task allocation, and the monitoring and behaviour-based evaluation of workers. The section heading also sweeps in access to self-employment, so gig and freelance platforms are covered. The Article 6(3) exemption (narrow procedural tasks, preparatory tasks, pattern detection that does not replace human assessment) almost never applies to HR products, because the system's output is the substantive decision input, and the exemption is unavailable altogether where the system profiles natural persons, which a ranking engine does by definition. We have yet to see a candidate-matching engine that survives the Article 6(3) test on close reading.
Buyer-side questions follow a predictable pattern. Demographic slice metrics across protected categories. A datasheet for the training set, including consent and provenance. Override rate: how often human reviewers disagree with the model's ranking. Reasoned explanation at the candidate level (the Article 86 right to explanation, resting on the interpretability your Article 13 instructions for use provide). Procurement teams also ask for evidence that workers and applicants can challenge decisions: the Article 26(7) notice requirement, plus a route to human review. None of these are policy questions; they are evidence questions, and the evidence already exists in your evaluation pipeline if you know where to look.
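The override rate, for instance, usually falls straight out of decision logs you already keep. A minimal sketch, assuming a pandas DataFrame of logged decisions; the column names `model_recommendation`, `final_decision`, and `protected_group` are hypothetical stand-ins for whatever your application database records:

```python
# Minimal sketch: override rate from existing decision logs.
# Column names are hypothetical; substitute what your own schema records
# for the model's recommendation and the reviewer's final outcome.
import pandas as pd

def override_rates(decisions: pd.DataFrame) -> pd.Series:
    """Share of cases where the reviewer departed from the model, per group and overall."""
    overridden = decisions["final_decision"] != decisions["model_recommendation"]
    per_group = overridden.groupby(decisions["protected_group"]).mean()
    per_group.loc["overall"] = overridden.mean()
    return per_group
```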
The most common gap we find is fairness metrics that exist at training time but not in production. Subgroup AUC on a held-out set proves the team thought about it; it does not prove the deployed model behaves the same way two months later. The fix is a slice-level monitor on actual inference traffic, with fairlearn or evalml feeding into your existing observability stack. The second-most-common gap is the continuous-learning record: HR products tend to retrain on candidate-funnel feedback, and Annex IV §2(f) requires you to declare the predetermined boundaries of that learning. A change-control log on the training pipeline closes that gap in a sprint.
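For the production-side slice monitor, a minimal sketch using fairlearn's MetricFrame over a window of inference traffic; the `shortlisted` and `gender` columns are assumptions standing in for your model output and protected attribute, and the resulting table is what you would push as gauges into your observability stack:

```python
# Minimal sketch: per-slice selection rates on recent inference traffic.
# Assumes a pandas DataFrame of logged predictions with hypothetical columns
# "shortlisted" (binary model output) and "gender" (protected attribute).
import pandas as pd
from fairlearn.metrics import MetricFrame, selection_rate

def fairness_slice_report(traffic: pd.DataFrame) -> pd.DataFrame:
    mf = MetricFrame(
        metrics={"selection_rate": selection_rate},
        y_true=traffic["shortlisted"],   # selection_rate only uses y_pred; y_true is required by the API
        y_pred=traffic["shortlisted"],
        sensitive_features=traffic["gender"],
    )
    report = mf.by_group.copy()
    report["gap_vs_overall"] = report["selection_rate"] - mf.overall["selection_rate"]
    return report  # export on a schedule, or push as metrics to your monitoring stack
```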
For HR-tech, the artefacts Annex IV wants live across three systems. Model performance and per-slice metrics live in your experiment tracker — MLflow runs, W&B reports, or whatever you use to compare candidate-matching models. Drift, override rates, and live fairness metrics live in your monitoring stack — typically Langfuse or LangSmith if you are LLM-based, or a custom Prometheus exporter if you are not. Decision logs and reason codes live in your application database. The Sprint produces a mapping document that lists, for each Annex IV requirement, which existing system holds the evidence and what export format the technical file needs.
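To give a flavour of the export side of that mapping, here is a minimal sketch of pulling training-time slice metrics out of an MLflow tracking server for the technical file; the experiment name "candidate-matching" and the `auc_` metric-key prefix are assumptions, not your actual run conventions:

```python
# Minimal sketch: export per-slice metrics from the latest candidate-matching
# run on an MLflow tracking server. Experiment name and metric-key prefix are
# hypothetical; adapt to your own experiment-tracking conventions.
from mlflow.tracking import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("candidate-matching")
latest_run = client.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["attributes.start_time DESC"],
    max_results=1,
)[0]

slice_metrics = {
    key: value
    for key, value in latest_run.data.metrics.items()
    if key.startswith("auc_")  # e.g. auc_female, auc_over_50
}
print(slice_metrics)  # lands in the technical file as training-time slice evidence
```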
Diagnostic for HR-tech providers
We work with two clients at a time. The Diagnostic confirms Annex III §4 classification across up to three systems in a week, including a one-page gap snapshot you can share with your enterprise buyers. Engineering-led, fixed price, no questionnaire-only deliverables.