Anonymised reference engagement
Annex III Classification & Annex IV Gap Memo
This is an anonymised sample of a real engagement: names, model architectures, scale figures, and dates have been altered to remove identifying information. Structure, methodology, and the analysis itself are representative of the work product delivered to clients.
Section 1
Executive summary
Bottom line
- All three of TalentFit AI's production systems — candidate matching, screening summarisation, structured interview scoring — are classified as high-risk AI under Annex III §4(a) of the EU AI Act.
- The Article 6(3) low-risk exemption is not available for any of the three. The screening LLM has a superficial case for §6(3)(a) (narrow procedural task) but is defeated by the downstream coupling to the matcher.
- Of the eighteen Annex IV requirements analysed, eleven are satisfied or near-satisfied by artefacts that already live in the engineering stack. Seven require new documentation work.
- Three material gaps drive the remediation effort: no Article 72 post-market monitoring plan; no documented Article 9 risk-management system; subgroup metrics computed in the training pipeline but not exported per shipped system.
- Procurement exposure is real today. Two of TalentFit AI's named target enterprise accounts have already added an AI Act due-diligence section to their vendor security review.
Recommendation: a two-week Readiness Sprint produces version one of the technical file from existing artefacts, plus the Article 9 RMS template and a 26-week roadmap to operational alignment before 2 August 2026. Estimated engagement value: €9,500.
Section 2
Annex III classification per system
Each system in scope is assessed against Annex III §4 (employment, workers management, and access to self-employment) and tested against the four exemptions in Article 6(3).
| System | Classification |
|---|---|
| Candidate Matching (System 1) | HIGH-RISK |
| Screening Summarisation (System 2) | HIGH-RISK |
| Structured Interview Scoring (System 3) | HIGH-RISK |
Section 3
Annex IV gap analysis
Eight representative requirements from Annex IV §1, §2 and the operating Articles, mapped against TalentFit AI's existing engineering artefacts. The full mapping covers thirty rows and is delivered as part of the Readiness Sprint.
| Reference | Requirement | Status | Severity | Evidence / gap note |
|---|---|---|---|---|
| Annex IV §1(a) | System identifier and version | ✓ Satisfied | — | MLflow experiment_id plus git SHA per shipped model. No work required. |
| Annex IV §1(c) | Description of how the system interacts with hardware and software | ◐ Partial | MEDIUM | Service architecture diagrams exist for each system in the engineering wiki; no Annex-IV-shaped interaction document. One day of writing closes the gap. |
| Annex IV §1(g) | Validation and testing logs, signed and dated | ◐ Partial | MEDIUM | MLflow run history and W&B reports are comprehensive; signing-and-dating discipline is missing. Resolved by a release-gate template that signs and dates the run on tag (sketched after this table). |
| Annex IV §1(h) | Validation procedures, metrics by demographic subgroup | ◐ Partial | HIGH | Subgroup metrics are computed in the training pipeline (fairlearn) but are not exported per shipped system. Material remediation: build an export job that writes per-system subgroup tables on every release (sketched after the legend below). |
| Annex IV §2(b) | Design choices, assumptions, rationale | ✕ Missing | HIGH | No model card or design-decision log exists for any of the three systems. New-build work: one model card per system, ~1 day of writing each from existing engineering memory. |
| Annex IV §2(g) | Test logs validating performance on representative inputs | ◐ Partial | MEDIUM | Test logs exist in W&B; not signed-and-dated as Annex IV requires. Same release-gate template fix as §1(g). |
| Annex IV §9 / Art. 72 | Post-market monitoring plan | ✕ Missing | CRITICAL | No PMM plan exists. Drift monitoring is ad-hoc and not connected to subgroup metrics. Largest single piece of new work in the engagement. |
| Article 73 | Serious incident reporting workflow (15 / 10 / 2-day clocks) | ◐ Partial | HIGH | On-call rota and incident process exist; no detection-to-report SLA defined for AI-Act-classified incidents. Eight hours of process design closes the gap. |
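Rows §1(g) and §2(g) share one fix: a release-gate step that signs and dates the validation run when the release is tagged. A minimal sketch of what that step could look like, assuming an HMAC signing key held in the release pipeline; the MLflow lookup is standard, but the record layout and key handling are illustrative rather than TalentFit AI's actual stack.

```python
# Release-gate sketch: sign and date an MLflow validation run on tag
# (Annex IV §1(g) / §2(g)). Record layout and key handling are
# illustrative assumptions, not the client's pipeline.
import hashlib
import hmac
import json
from datetime import datetime, timezone

from mlflow.tracking import MlflowClient


def sign_validation_run(run_id: str, signing_key: bytes) -> dict:
    """Produce a signed, dated record of the run's metrics and params."""
    run = MlflowClient().get_run(run_id)
    payload = json.dumps(
        {"run_id": run_id, "metrics": run.data.metrics, "params": run.data.params},
        sort_keys=True,
        default=str,
    ).encode()
    return {
        "run_id": run_id,
        "signed_at": datetime.now(timezone.utc).isoformat(),  # the "dated" half
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "signature": hmac.new(signing_key, payload, hashlib.sha256).hexdigest(),
    }
```

The returned record is written next to the release tag; verification at audit time recomputes the HMAC over the stored payload.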
Status
- ✓ Satisfied — existing artefact meets the requirement.
- ◐ Partial — artefact exists; shape, signing, or scope misses the requirement.
- ✕ Missing — no artefact today; new build required.
Severity
- CRITICAL — blocks the technical file; remediate first.
- HIGH — material gap; remediate within the Sprint window.
- MEDIUM — closeable within a day of work.
- LOW — cosmetic; close at the next release.
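Row §1(h) is the highest-severity partial in the table above. A minimal sketch of the export job, assuming fairlearn's MetricFrame is already available in the training pipeline (as the gap note states); the column names, metric choice, and output path are illustrative assumptions.

```python
# Sketch: per-release subgroup-metrics export (Annex IV §1(h)).
# Column names, metrics, and output location are illustrative assumptions.
from pathlib import Path

import pandas as pd
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, recall_score


def export_subgroup_table(
    eval_df: pd.DataFrame, system_id: str, release_tag: str, out_dir: Path
) -> Path:
    """Write the per-system subgroup table that ships with each release."""
    mf = MetricFrame(
        metrics={"accuracy": accuracy_score, "recall": recall_score},
        y_true=eval_df["label"],
        y_pred=eval_df["prediction"],
        sensitive_features=eval_df["protected_attribute"],
    )
    table = mf.by_group.assign(system=system_id, release=release_tag)
    out_path = out_dir / f"{system_id}-{release_tag}-subgroups.csv"
    table.to_csv(out_path)
    return out_path
```

Wiring this into the same release gate as the signing step gives every shipped version a dated subgroup table for the technical file.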
Section 4
Article 9 risk register — initial seed
The Article 9 risk-management system is a documented, lifecycle-long process. The five rows below are the seed risks identified during the Diagnostic; the full register is built out during the Sprint and maintained as part of the engagement.
| Risk | Likelihood | Severity | Mitigation |
|---|---|---|---|
| Subgroup performance drift in the matcher post-deployment | MEDIUM | HIGH | Quarterly subgroup metric review; alerting thresholds wired to the on-call rota. |
| Hallucinated CV signals from the screening LLM (fabricated employers, mis-extracted dates) | LOW | MEDIUM | Output schema validation (sketched after this table); LLM-judge consistency check on a 1% rolling sample; recruiter spot-check UI affordance. |
| Bias amplification across the matcher → screener → interview-scoring pipeline | MEDIUM | HIGH | Per-stage slice tests; coupling-aware monitoring; quarterly end-to-end fairness evaluation on a held-out cohort. |
| Re-purposing of system outputs for hiring decisions outside the documented scope | LOW | HIGH | Contractual scope clause in customer terms; UI affordance constraints; quarterly customer-side compliance attestation. |
| Data drift after a customer integrates a new ATS | MEDIUM | MEDIUM | Rolling evaluation set per customer; retraining cadence defined; per-customer onboarding gate. |
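The schema-validation mitigation against hallucinated CV signals is the cheapest in the register to operationalise. A minimal sketch using pydantic v2 (an assumption; any schema validator works), with illustrative field names rather than the screening system's real extraction schema.

```python
# Sketch: reject malformed screening-LLM extractions before they reach
# the matcher. Field names are illustrative; pydantic v2 is assumed.
from datetime import date
from typing import Optional

from pydantic import BaseModel, ValidationError


class EmploymentEntry(BaseModel):
    employer: str
    start: date
    end: Optional[date] = None  # None marks a current role


class ScreeningExtraction(BaseModel):
    candidate_id: str
    employment: list[EmploymentEntry]


def validate_llm_output(raw_json: str) -> Optional[ScreeningExtraction]:
    """Parse or reject; rejects route to the recruiter spot-check queue."""
    try:
        return ScreeningExtraction.model_validate_json(raw_json)
    except ValidationError:
        return None
```

A fabricated employer can still pass a schema check; that residual risk is what the LLM-judge sample and the recruiter spot-check affordance cover.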
Section 5
Article 72 post-market monitoring plan — outline
The PMM plan is the largest single piece of new documentation work. Outline below; full plan delivered as a Sprint artefact.
- Telemetry collected per inference
- Input distribution snapshot (categorical and numerical features); prediction confidence; downstream action observed (clicked, contacted, interview-requested, hired); customer identifier and system version.
- Subgroup tracking
- Per system, per 30-day window, per protected attribute (where data is available and lawful to track). Results are written to a versioned subgroup-metrics dashboard and exported to the technical file appendix at the quarterly review.
- Drift thresholds
- Population shift threshold: a KL divergence above 0.10 on the input feature distribution triggers a review. Subgroup performance gap threshold: a five-percentage-point absolute gap on any tracked subgroup triggers escalation. Both checks are sketched after this list.
- Reporting cadence
- Monthly internal review by the engineering lead. Quarterly written report co-signed by the engineering lead and the Partner. Annual external review (optional, recommended at Programme tier).
- Linkage to Article 73
- Any PMM-detected event meeting the Article 3(49) serious-incident definition triggers the incident playbook (Section 6) and starts the 15 / 10 / 2-day clocks.
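A minimal sketch of the two threshold checks, reading the population-shift figure as a KL divergence of 0.10. The binning of input distributions and the source of the subgroup metric are assumptions; the real job reads the exported subgroup tables from Section 3.

```python
# Sketch: PMM drift checks. Thresholds mirror the outline above;
# distribution binning and metric source are illustrative assumptions.
import numpy as np
from scipy.stats import entropy

KL_THRESHOLD = 0.10             # population shift -> review
SUBGROUP_GAP_THRESHOLD = 0.05   # absolute subgroup gap -> escalation


def population_shift(reference: np.ndarray, current: np.ndarray) -> bool:
    """KL divergence between binned reference and live feature histograms."""
    eps = 1e-9  # keep zero-count bins finite
    return entropy(current + eps, reference + eps) > KL_THRESHOLD


def subgroup_breach(metric_by_group: dict[str, float]) -> bool:
    """True when any tracked subgroup trails the best group by > 5 points."""
    values = list(metric_by_group.values())
    return max(values) - min(values) > SUBGROUP_GAP_THRESHOLD
```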
Section 6
Article 73 incident reporting playbook
Article 73 imposes the 15 / 10 / 2-day reporting clocks. The playbook converts on-call detection into a regulator-ready report.
- Trigger conditions
- Article 3(49) serious-incident definition operationalised against the three systems. Examples: a hiring outcome materially altered by a model failure; a subgroup performance breach beyond mitigation; a data-leak incident exposing CV content.
- Time-to-classify SLA
- T+24 hours after on-call detection: an engineering-and-Partner classification call decides whether the event is an Article 73 serious incident.
- Reporting clocks
- 15 days for general serious incidents; 10 days where serious harm occurred or is likely; 2 days for widespread infringement of Union law. Templates for each clock are part of the deliverable; the deadline computation is sketched after this list.
- Escalation chain
- On-call engineer → engineering lead → Partner → external counsel (if required) → competent national authority report.
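A minimal sketch of the deadline computation behind the reporting templates. Which event legally starts each clock (detection versus the T+24h classification call) is a question for counsel, so the anchor timestamp here is an explicit assumption.

```python
# Sketch: Article 73 reporting-clock deadlines. The anchor timestamp
# (detection vs. classification) is an assumption pending counsel input.
from dataclasses import dataclass
from datetime import datetime, timedelta

REPORTING_CLOCK_DAYS = {
    "serious_incident": 15,        # general serious incident
    "serious_harm": 10,            # serious harm occurred or is likely
    "widespread_infringement": 2,  # widespread infringement of Union law
}


@dataclass
class IncidentClock:
    incident_class: str    # one of the keys above
    anchored_at: datetime  # detection time, pending counsel confirmation

    @property
    def report_due(self) -> datetime:
        return self.anchored_at + timedelta(
            days=REPORTING_CLOCK_DAYS[self.incident_class]
        )
```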
Section 7
26-week remediation roadmap
The calendar runs to 2 August 2026, the enforcement date for Annex III high-risk obligations. Owners shown are illustrative role labels; named owners are agreed at Sprint kickoff.
| Weeks | Workstream | Owner | Deliverable |
|---|---|---|---|
| 1–4 | Documentation gap closure | Engineering Lead + Partner | Three model cards · signed validation logs (release-gate template) · design-rationale log · §1(c) interaction document. |
| 5–10 | Post-market monitoring plan operational | Engineering Lead | PMM plan v1.0 · subgroup metric exports per system · drift alarms wired to on-call · monthly review cadence in production. |
| 11–16 | Article 9 risk-management system documented | Partner + Compliance | Documented RMS · risk register synchronised with PMM alarms · quarterly review process. |
| 17–22 | Article 73 incident reporting workflow | On-call Lead + Partner | Trigger detection rules · classification SLA wired into on-call · reporting templates for each of the 15 / 10 / 2-day clocks. |
| 23–26 | Annex IV technical file v1.0 | Partner | Unified Annex IV technical file with signature pages · hand-off pack · readiness assessment letter for procurement use. |
Section 8
Engagement next step
TalentFit AI is in scope for high-risk obligations on all three production systems. The artefacts already in the engineering stack cover roughly two-thirds of Annex IV; the remaining third is concentrated in three workstreams (post-market monitoring, model cards, subgroup-metric exports) that a two-week Sprint addresses end-to-end. The recommended path is a Readiness Sprint kicking off within ten working days of acceptance, followed by the 26-week remediation roadmap above.
Chiekh Alloul, Partner · Tenth Partner · 6 May 2026