[ SELECTED · B2B · AI · A11Y · 2025 ]

A11y Copilot — AI accessibility dashboard for real-world WCAG fixes.

An AI-assisted workflow to find, explain, and fix critical a11y issues fast — with guardrails and concise code suggestions. Detects violations, maps them to WCAG, and emits the smallest possible HTML/CSS diff without breaking IDs, ARIA, or semantics.

→ ROLE: Product/UX Designer (solo builder) · prototyping & front-end in React · LLM prompt & guardrail design.
→ STACK: React · axe-core · LLM-driven fix proposals · WCAG mapping · design tokens.
→ DURATION: 3–4 weeks · Team of 1 (self).
→ PRINCIPLES: Privacy-first. Adapter-based. Deterministic, safe outputs. CI-ready.
~8× faster than manual audit→fix
AA+ contrast suggestions, token-aware
~60→8 min per-issue cycle (typical)
0 IDs / ARIA stripped by guardrails
A11y Copilot — main dashboard
A11y Copilot — issue triage
01 / WHY IT'S USEFUL

Teams struggle to turn audits into fixes.

An accessibility audit usually ends with a 200-page PDF and a Jira backlog of color-contrast tickets. Nothing actually gets fixed. This dashboard closes the loop: detect issues, explain why they matter, and provide copy-ready HTML/CSS with guardrails so engineers can ship faster without breaking semantics or brand.

→ FOR DESIGN

See impact, prioritize, request safe changes.

Score and trend lines surface the issues that affect real users now — not the long tail of WCAG nice-to-haves.

→ FOR ENGINEERING

Copy/paste minimal diffs.

Each issue ships with a tiny HTML/CSS diff. IDs and ARIA preserved. No mass rewrites, no architecture changes.

→ FOR PM

Score & trends for tracking accessibility debt.

One number that moves over time. PMs can show progress to leadership without translating WCAG into business language.

02 / AI WORKFLOWS

Four stages: Detect → Reason → Propose → Safeguard.

→ 01 · DETECT

axe-core + heuristics.

Run axe-core for the deterministic baseline. Layer heuristic rules for issues axe doesn't catch (focus order quirks, ARIA misuse patterns, dynamic-content gotchas).
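
A minimal sketch of how the two detection layers could feed one list. It assumes a simplified subset of the axe-core violation shape (`id`, `impact`, `tags`, `nodes[].html`) and a static HTML string rather than the live DOM. The `buttonRoleWithoutTabindex` heuristic and the `Finding` type are hypothetical names for illustration, not the dashboard's actual rules.

```typescript
// Simplified subset of the axe-core violation shape (assumption for this sketch).
type Impact = "minor" | "moderate" | "serious" | "critical";

interface AxeLikeViolation {
  id: string;
  impact: Impact;
  tags: string[];
  nodes: { html: string; target: string[] }[];
}

interface Finding {
  ruleId: string;
  impact: Impact;
  tags: string[];
  html: string; // outer HTML of the offending element
  source: "axe" | "heuristic";
}

// Flatten axe-style violations (one entry per rule) into one finding per node.
function fromAxe(violations: AxeLikeViolation[]): Finding[] {
  const findings: Finding[] = [];
  for (const v of violations) {
    for (const node of v.nodes) {
      findings.push({ ruleId: v.id, impact: v.impact, tags: v.tags, html: node.html, source: "axe" });
    }
  }
  return findings;
}

// Hypothetical heuristic: a <div>/<span> with role="button" but no tabindex
// is not keyboard-focusable. Regex-based, so it only sees double-quoted
// attributes in static markup.
function buttonRoleWithoutTabindex(html: string): Finding[] {
  const openTags = html.match(/<(?:div|span)\b[^>]*\brole="button"[^>]*>/gi) ?? [];
  return openTags
    .filter((tag) => !/\btabindex\s*=/i.test(tag))
    .map((tag): Finding => ({
      ruleId: "heuristic-button-not-focusable",
      impact: "serious",
      tags: ["wcag211"],
      html: tag,
      source: "heuristic",
    }));
}

// Deterministic baseline first, heuristic findings appended after it.
function detect(html: string, axeViolations: AxeLikeViolation[]): Finding[] {
  return [...fromAxe(axeViolations), ...buttonRoleWithoutTabindex(html)];
}
```

Keeping the `source` field on every finding makes it easy to tell the deterministic baseline apart from heuristic guesses later in triage.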

→ 02 · REASON

Map to WCAG.

Each violation is mapped to a specific WCAG criterion and paired with a plain-English explanation of why it matters. The LLM then picks the safest minimal change for the violation type.
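
One way the criterion lookup could work, sketched under the assumption that findings carry axe-core style tags, where a tag such as `wcag143` encodes success criterion 1.4.3. The `RATIONALE` table and its wording are hypothetical placeholders, not the dashboard's actual copy.

```typescript
// Turn axe-style tags such as "wcag143" into dotted WCAG success criteria ("1.4.3").
// Conformance-level tags ("wcag2aa", "wcag21aa") and non-WCAG tags do not match
// the pattern and are skipped.
function wcagCriteria(tags: string[]): string[] {
  const criteria: string[] = [];
  for (const tag of tags) {
    const m = /^wcag(\d)(\d)(\d+)$/.exec(tag);
    if (m) criteria.push(`${m[1]}.${m[2]}.${m[3]}`);
  }
  return criteria;
}

// Hypothetical rationale table: one plain-English line per criterion.
const RATIONALE: Record<string, string> = {
  "1.1.1": "Screen readers need a text alternative to announce images.",
  "1.4.3": "Low-contrast text is hard to read for people with low vision.",
};

function explain(tags: string[]): { criterion: string; why: string }[] {
  return wcagCriteria(tags).map((criterion) => ({
    criterion,
    why: RATIONALE[criterion] ?? "No rationale on file yet.",
  }));
}
```

Because the lookup is a plain table, the criterion reference never depends on what the model happens to say.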

→ 03 · PROPOSE

Tiny HTML/CSS diff.

The proposal is copy-ready — usually 2–5 lines. No surrounding refactor. No rewrites of the component shell. Just the change the rule requires.
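
To illustrate what "just the change the rule requires" can look like in code, here is a rough single-hunk diff that trims the lines both versions share. The function names and the `MAX_DIFF_LINES` budget are assumptions for this sketch, with the budget borrowed from the "usually 2–5 lines" figure above.

```typescript
// Build a single-hunk line diff by trimming the lines both versions share
// at the start and at the end. Whatever is left in the middle is the change.
function tinyDiff(before: string, after: string): string[] {
  const a = before.split("\n");
  const b = after.split("\n");

  let start = 0;
  while (start < a.length && start < b.length && a[start] === b[start]) start++;

  let endA = a.length;
  let endB = b.length;
  while (endA > start && endB > start && a[endA - 1] === b[endB - 1]) {
    endA--;
    endB--;
  }

  return [
    ...a.slice(start, endA).map((line) => `- ${line}`),
    ...b.slice(start, endB).map((line) => `+ ${line}`),
  ];
}

// Hypothetical budget: reject proposals that touch more than a handful of lines.
const MAX_DIFF_LINES = 5;

function withinBudget(diff: string[]): boolean {
  return diff.length > 0 && diff.length <= MAX_DIFF_LINES;
}
```

A proposal that blows the budget is a signal that the model attempted a refactor rather than a fix, so it can be rejected before anyone sees it.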

→ 04 · SAFEGUARD

Guardrails on every output.

The LLM is prompt-bound to four rules: never remove IDs, never strip ARIA, never propose a contrast value that isn't AA-compliant, and always snap colors to the nearest AA-compliant color in the user's token palette.
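
The ID and ARIA rules can also be checked after the fact. The sketch below compares protected attributes before and after a proposal. It is an illustration under simple assumptions (regex matching, double-quoted attributes only), and `protectedAttrs` and `strippedAttrs` are hypothetical helper names.

```typescript
// Collect every id and aria-* attribute in a snippet as name="value" strings.
// The leading \s keeps attributes such as data-id from matching.
function protectedAttrs(html: string): Set<string> {
  const found = new Set<string>();
  const re = /\s(id|aria-[a-z]+)\s*=\s*"([^"]*)"/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    found.add(`${String(m[1]).toLowerCase()}="${String(m[2])}"`);
  }
  return found;
}

// Return the protected attributes that the proposal dropped or changed.
// An empty array means the proposal is safe on this rule.
function strippedAttrs(before: string, after: string): string[] {
  const kept = protectedAttrs(after);
  return Array.from(protectedAttrs(before)).filter((attr) => !kept.has(attr));
}
```

A non-empty result is enough to discard the proposal, which is how a "0 stripped" metric can be enforced rather than hoped for.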

03 / DESIGN PRINCIPLES

Three constraints that shaped the build.

01

Privacy-first · API/LLM & CI-ready.

The dashboard never assumes a centralized service. Bring your own LLM key. Outputs are deterministic enough to run in CI on every PR, with no flaky model-of-the-day. The adapter-based stack lets teams swap providers without rewriting flows.
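
A sketch of what an adapter boundary might look like. The `LlmAdapter` interface, the `cannedAdapter`, and the prompt text are all hypothetical. The point is only that the flow depends on one small contract, so a provider (or an offline stub for CI) can be swapped in without touching the rest.

```typescript
// Hypothetical adapter contract: every provider is reduced to "prompt in, text out".
// The caller supplies its own key, so nothing is routed through a shared service.
interface LlmAdapter {
  readonly name: string;
  complete(prompt: string, apiKey: string): Promise<string>;
}

// Offline adapter for CI and tests: no network, same answer every run.
const cannedAdapter: LlmAdapter = {
  name: "canned",
  complete: async (prompt) =>
    prompt.indexOf("image-alt") !== -1
      ? '{"patternId":"add-alt"}'
      : '{"patternId":"none"}',
};

// The flow only knows the interface, never a concrete provider.
async function askForPattern(adapter: LlmAdapter, ruleId: string, apiKey: string): Promise<string> {
  const raw = await adapter.complete(`Pick a repair pattern for rule: ${ruleId}`, apiKey);
  return raw.trim();
}
```

A real provider adapter would implement the same `complete` method around its own SDK or HTTP call.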

02

Adapter-based · deterministic · safe outputs.

Inputs (the HTML to audit) and outputs (the diff) are both schema-bound. The LLM is the translator, not the judge — its job is to map a violation to a known repair pattern. The repair pattern is plain code. The guardrails are constants, not prompts.
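
The "translator, not judge" split can be sketched as a registry of repair functions. In this illustration the model only names a pattern and supplies text, and plain code does the edit. The pattern IDs, `REPAIRS`, and `applyRepair` are hypothetical, and the regex assumes a single opening tag as input.

```typescript
type PatternId = "add-alt" | "add-aria-label";

// Escape text so it is safe inside a double-quoted attribute.
function escapeAttr(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Append one attribute just before the closing ">" (or "/>") of an opening tag.
function addAttr(tag: string, name: string, value: string): string {
  return tag.replace(/\s*(\/?)>$/, ` ${name}="${escapeAttr(value)}"$1>`);
}

// Repair patterns are ordinary functions. The set of allowed patterns is a
// constant in code, not an instruction in a prompt.
const REPAIRS: Record<PatternId, (tag: string, text: string) => string> = {
  "add-alt": (tag, text) => addAttr(tag, "alt", text),
  "add-aria-label": (tag, text) => addAttr(tag, "aria-label", text),
};

// The model's choice is applied only if it names a known pattern.
function applyRepair(tag: string, choice: { patternId: string; text: string }): string | null {
  if (!Object.prototype.hasOwnProperty.call(REPAIRS, choice.patternId)) return null;
  return REPAIRS[choice.patternId as PatternId](tag, choice.text);
}
```

Anything the model invents outside the registry returns `null` and never reaches the engineer.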

03

Recruiter-ready · concise, live, impactful.

This case study itself is the proof. Every issue comes with a one-line WCAG rationale, a minimal diff, and a contrast preview. Nothing in the UI is decorative — it's all evidence.

04 / RESPONSIBILITIES

What I built, end-to-end.

→ RESEARCH

WCAG heuristic audit.

Catalogued the most common violation patterns across the projects I'd worked on. Triaged by frequency × fix-cost — the top 12 patterns cover ~80% of issues.

→ IA & FLOWS

Triage → fix flow.

Issues land in a triage queue ranked by severity. One click surfaces the smallest diff. Engineers copy the patch; designers see the contrast preview against design tokens.
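
A minimal sketch of the severity ranking, assuming the queue uses axe-core's four impact levels as its severity scale. `SEVERITY_RANK` and `triage` are hypothetical names.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Lower rank sorts first, so critical issues lead the queue.
const SEVERITY_RANK: Record<Impact, number> = {
  critical: 0,
  serious: 1,
  moderate: 2,
  minor: 3,
};

// Sort by severity, keeping detection order within the same level.
// The input array is left untouched.
function triage<T extends { impact: Impact }>(issues: T[]): T[] {
  return issues
    .map((issue, index) => ({ issue, index }))
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.issue.impact] - SEVERITY_RANK[b.issue.impact] ||
        a.index - b.index
    )
    .map((entry) => entry.issue);
}
```

The explicit index tiebreak keeps the order reproducible from run to run, which matters when the same queue is rendered in CI output.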

→ UI SYSTEM

Tokens & design system.

Built a small token-aware design system so contrast suggestions snap to existing brand colors — engineers never paste a value that breaks the palette.
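
The snapping step can be sketched with the WCAG 2.x contrast formula: compute relative luminance for both colors, take the ratio, and keep only tokens at or above 4.5:1 (AA for normal-size text). Picking the "nearest" token by squared RGB distance is an assumption of this sketch, as are the function names and the 6-digit hex input format.

```typescript
type RGB = [number, number, number];

// Parse a 6-digit hex color such as "#4b5563".
function hexToRgb(hex: string): RGB {
  const n = parseInt(hex.replace("#", ""), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// WCAG 2.x relative luminance of an sRGB color.
function luminance([r, g, b]: RGB): number {
  const lin = (c: number): number => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(fg: string, bg: string): number {
  const l1 = luminance(hexToRgb(fg));
  const l2 = luminance(hexToRgb(bg));
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

const AA_NORMAL_TEXT = 4.5;

// Return the name of the AA-passing token closest to the wanted color,
// or null when no token in the palette passes against this background.
function snapToAAToken(wanted: string, bg: string, tokens: Record<string, string>): string | null {
  const [wr, wg, wb] = hexToRgb(wanted);
  let best: string | null = null;
  let bestDist = Infinity;
  for (const name of Object.keys(tokens)) {
    const hex = tokens[name];
    if (hex === undefined || contrastRatio(hex, bg) < AA_NORMAL_TEXT) continue;
    const [r, g, b] = hexToRgb(hex);
    const dist = (r - wr) * (r - wr) + (g - wg) * (g - wg) + (b - wb) * (b - wb);
    if (dist < bestDist) {
      bestDist = dist;
      best = name;
    }
  }
  return best;
}
```

Returning a token name instead of a raw hex value is what keeps pasted fixes inside the palette. A `null` result means the palette itself has no compliant option for that background, which is a design conversation rather than a code fix.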

→ INTEGRATION

axe-core + LLM tests.

axe-core for deterministic detection. LLM call constrained by JSON schema. Every prompt unit-tested with adversarial inputs (broken HTML, missing tags, mixed casing).
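
As an illustration of schema-bound output, the sketch below accepts a model response only if it is a JSON object with exactly the expected string fields. The `FixChoice` shape and `parseFixChoice` are hypothetical, and a real build might use a JSON Schema validator instead of hand-written checks.

```typescript
// Hypothetical response shape for this sketch.
interface FixChoice {
  patternId: string;
  text: string;
}

// Accept the raw model output only if it is a JSON object with exactly the
// keys "patternId" and "text", both strings. Anything else returns null.
function parseFixChoice(raw: string): FixChoice | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all, e.g. chatty prose around the payload
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;

  const rec = data as Record<string, unknown>;
  if (Object.keys(rec).sort().join(",") !== "patternId,text") return null;

  const patternId = rec["patternId"];
  const text = rec["text"];
  if (typeof patternId !== "string" || typeof text !== "string") return null;

  return { patternId, text };
}
```

Unit tests can then feed this parser the kind of malformed responses a model produces when given broken HTML, and assert that every one of them is rejected.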

05 / VIDEO SHOWCASE

Paste a URL. Get an audit in seconds.

Drop in a URL and the dashboard runs axe-core + heuristics on the live DOM while a triaged issue list streams in. Click any issue → see the WCAG rationale, the proposed diff, the before/after contrast ratio, and a copy button.

06 / KEY SCREENS

Real visuals from the actual build.

A11y Copilot — issue detail panel
A11y Copilot — diff proposal view
A11y Copilot — contrast preview against design tokens
07 / REFLECTION

What this project taught me.

Accessibility tooling has a credibility problem: it surfaces too much, fixes too little. A11y Copilot is my answer — surface less, propose more, and never break what the engineer already wrote. The guardrails are the product.

The bigger lesson, which carried into my later AI work: LLMs are excellent translators of intent into structured outputs, and they're dangerous judges of business rules. Here, the LLM translates "low contrast" into "snap to nearest AA token". The judgment of which colors are allowed lives in code — not in a prompt.