WCAG Compliance · Published 2026-06-20
Audio description for L&D training video: WCAG 1.2.3, 1.2.5, and the compliance layer after captions
Most L&D teams completing caption compliance projects reach the end of the caption phase and feel they have satisfied their WCAG 2.1 Level AA obligations for video content. They have captioned their LMS library, verified 99% accuracy against the DCMP protocol, built a vendor relationship and a caption workflow, and updated the accessibility policy to reflect the programme. What the auditor tells them next — often for the first time — is that WCAG 2.1 Level AA contains a second video accessibility requirement they have not yet addressed: WCAG Success Criterion 1.2.5, Audio Description Prerecorded. Captions are SC 1.2.2. Audio description is SC 1.2.5. Both are Level AA. The caption project addressed SC 1.2.2. SC 1.2.5 is still open.
Audio description (AD) is a narration track added to video content that describes visual information not otherwise conveyed by the existing audio: what the instructor is demonstrating with their hands, what the diagram on screen contains, what the safety procedure steps look like, what text is displayed on the slide, what the interface shows when the software cursor moves to the menu. For a learner who is blind or has low vision, the audio track of a training video is the only channel available. If the original narration says "and here you can see the difference between the two configurations" while the video cuts to a split-screen comparison table — and then moves on — the blind learner heard the phrase but received none of the instructional content it referenced. The video has failed that learner, regardless of whether captions are present.
The scale of the gap in L&D programmes is significant. Of the 53 posts in the GlossCap blog corpus, none addresses audio description — and this mirrors the state of the L&D captioning market. Every captioning vendor, every compliance guide, every LMS accessibility page focuses on captions for deaf and hard-of-hearing learners (WCAG SC 1.2.2 and SC 1.2.4). Audio description for blind and low-vision learners (SC 1.2.3 and SC 1.2.5) is the corresponding requirement on the visual side, and it receives a fraction of the attention. This is beginning to change. ADA Title II enforcement, which began April 2026, references WCAG 2.1 Level AA for public university and state/local government training content — and WCAG 2.1 Level AA includes SC 1.2.5. Section 508 enforcement for federal government and federal contractor training content has included SC 1.2.5 since the 2017 refresh. L&D teams are now encountering accessibility audits that include SC 1.2.5 findings for the first time.
This guide covers: what audio description is and how it differs from captions; the three WCAG video description criteria (SC 1.2.3, 1.2.5, and 1.2.7) and what each requires in practice; who carries an AD compliance obligation under ADA Title I, ADA Title II, Section 508, and the European Accessibility Act; a prioritization framework for L&D training content; the complete AD production workflow from content audit through delivery; AD provider options for L&D teams; platform-by-platform LMS delivery for the major LMS platforms including Kaltura, Panopto, Brightcove, Cornerstone, TalentLMS, Workday Learning, and eLearning authoring tools; how to integrate AD production into an existing caption programme; eight AD-specific failure modes; and a seven-question FAQ.
TL;DR — six things every L&D accessibility coordinator needs to know about audio description
- Audio description is WCAG 2.1 Level AA — the same standard as captions. WCAG SC 1.2.5 (Audio Description Prerecorded) is a Level AA criterion. If your ADA, Section 508, or EAA compliance obligation is WCAG 2.1 Level AA — which it is for virtually all L&D teams with a compliance obligation — SC 1.2.5 applies to all pre-recorded training video alongside SC 1.2.2 (captions). A caption programme alone does not satisfy the full WCAG 2.1 Level AA video accessibility requirement.
- Audio description describes visual information not conveyed in the audio track. Captions make audio content accessible to deaf and hard-of-hearing learners. Audio description makes visual content accessible to blind and low-vision learners. They address different access needs and different WCAG criteria. The learner who is blind can hear the original narration fine — but cannot see the diagram, the demonstration, the on-screen text, or the comparison table. AD describes those elements so that the audio channel carries the full instructional content of the video.
- Not all training video carries the same AD urgency. The highest-priority content is video where significant instructional information is conveyed exclusively through visual means: procedure demonstrations, safety training videos, equipment operation training, software interface walkthroughs, and any video that displays on-screen text, diagrams, or comparison tables not read aloud. Pure dialogue-and-narration video — talking-head interview, audio lecture over static title slide — has low visual information density and may not require AD if the audio track already conveys all instructional content.
- Standard AD inserts narration in natural pauses; extended AD pauses the video. Standard audio description (SC 1.2.5) works by inserting the description narration into gaps in the existing audio. When a training video leaves insufficient natural pauses for description, extended audio description (SC 1.2.7) pauses the video to allow a longer description, then resumes. Extended AD requires post-production video editing or player support and is significantly more complex. Most professionally produced training video has adequate natural pauses for standard AD; densely narrated eLearning modules often do not.
- AD production costs 3–5× caption production for the same content. Caption production at a standard rate is approximately $1.50–$4.00 per finished video minute. Audio description for the same content runs $8–$20 per finished video minute, reflecting the AD script writing, voice talent recording, and audio mixing steps not present in caption production. Integrating AD into an existing caption job at the same vendor reduces the marginal cost. Budget planning for an AD programme should use the caption programme budget framework with an AD-specific multiplier for the highest-priority content tiers.
- LMS platform support for AD delivery varies widely. Kaltura and Panopto have native support for alternate audio tracks, making them the most AD-compatible LMS video platforms. Brightcove supports AD audio tracks via API. Cornerstone OnDemand, TalentLMS, and Workday Learning have limited or no native multi-track audio support — the workaround is a separate AD-enabled video version or external player embedding. eLearning authoring tools (Storyline, Rise, Captivate) require a separate AD-enabled SCORM package. The LMS delivery plan must be established before committing to an AD production format.
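The per-minute rates quoted in the cost bullet above can be turned into a rough budget range. The following is a minimal Python sketch, not a pricing model: the 600-minute library size and the helper name `cost_range` are hypothetical, and real vendor quotes will vary with turnaround, volume, and content complexity.

```python
# Per-finished-minute rate ranges in USD, taken from the figures quoted above.
CAPTION_RATE = (1.50, 4.00)
AD_RATE = (8.00, 20.00)


def cost_range(minutes, rate):
    """Return the (low, high) cost in USD for a given number of finished minutes."""
    low, high = rate
    return (round(minutes * low, 2), round(minutes * high, 2))


# Hypothetical library size: 600 finished minutes of Tier 1 video.
library_minutes = 600
caption_cost = cost_range(library_minutes, CAPTION_RATE)
ad_cost = cost_range(library_minutes, AD_RATE)

print(f"Captions: ${caption_cost[0]:,.0f} to ${caption_cost[1]:,.0f}")
print(f"Audio description: ${ad_cost[0]:,.0f} to ${ad_cost[1]:,.0f}")
```

Running the estimate per content tier, rather than for the whole library at once, keeps the AD budget aligned with the prioritization approach described later in this guide.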
What audio description is — and how it differs from captions
Captions and audio description are the two primary video accessibility accommodations required by WCAG 2.1 Level AA, and they address different access needs on opposite sides of the sensory divide. Captions make the audio content of a video accessible to learners who cannot hear it: deaf and hard-of-hearing learners read a text track that carries what is spoken, along with speaker identification and non-speech audio cues. Audio description makes the visual content of a video accessible to learners who cannot see it: blind and low-vision learners hear a description of what is shown on screen, inserted as narration into the video audio channel.
The confusion between the two stems partly from terminology overlap and partly from the historical accident that captioning compliance became widely discussed before audio description compliance in the L&D market. Audio description is sometimes called video description, descriptive video, visual description, or described video. The WCAG standard uses "audio description." The Described and Captioned Media Program (DCMP), the federal educational media accessibility resource, uses "audio description" for produced description narration tracks and "described video" for the final accessible media. The term you will encounter most often in vendor pricing sheets is "audio description" or simply "AD" — and that is the term used throughout this guide.
The structural difference between captions and audio description in production terms:
- Captions start with the audio track: a transcript of what is spoken is created, time-coded to the video, formatted in accordance with WCAG caption requirements (caption blocks, speaker identification, non-speech audio cues), and delivered as a sidecar file (.srt, .vtt, .ttml) or burned into the video. The raw material is the audio. The output is a text track synced to the audio.
- Audio description starts with the video track: the visual content is reviewed for information that is not conveyed by the existing audio, a timed script of descriptions is written, a voice talent records the narration, and the AD audio is mixed as a second audio track alongside the original. The raw material is the video. The output is an alternate audio track synced to the video.
In practice, captions and audio description are produced from the same source video and are often ordered together from the same vendor. The natural unit of accessible video content is a video with both a caption track (for deaf and hard-of-hearing learners) and an audio description track (for blind and low-vision learners) — plus a transcript that serves as a text alternative for both access needs. Producing them together at initial captioning is more efficient than retrofitting AD to a video that already has captions.
What audio description describes — and what it does not
AD script writing follows conventions developed by the DCMP (Described and Captioned Media Program) and the ACB Audio Description Project that distinguish useful description from unhelpful over-description:
Audio description should describe:
- All on-screen text that is not read aloud — slide titles, bullet points, labels, annotations, on-screen instructions, interface labels, form field names, error messages
- All visual demonstrations — physical technique, hand position, equipment operation, procedure steps, assembly sequences, safety procedure demonstrations
- All diagrams, charts, graphs, and comparison tables — including their structure, data, labels, and what conclusion or comparison they are intended to illustrate
- Significant visual transitions — a cut to a new scene, a change in presenter, a shift to a new diagram or interface view that the narration does not announce
- Visual information that modifies or contextualises the spoken content — a speaker's face expressing emphasis, a physical object whose properties are being discussed, a setting whose characteristics are relevant to the instructional content
Audio description should not describe:
- Visual information that is redundant with what is spoken — if the narrator says "now I am clicking the File menu" while the cursor moves to the File menu, the AD does not need to describe the cursor movement
- Aesthetic details without instructional relevance — the colour of a presenter's shirt, the décor of the room, decorative visual elements that carry no instructional meaning
- Interpretive or judgmental descriptions — AD describes what is seen, not what it means or whether it is good. The DCMP standard: "describe what is seen, not what it means"
- What is about to happen: AD describes the present visual and does not predict the next action (unless this is required for context)
The practical test for any visual element: if a blind learner's understanding of the instructional content would be incomplete without a description of this element, it requires AD. If the audio track already conveys the same information that the visual element conveys, AD for that element is optional.
WCAG SC 1.2.3, 1.2.5, and 1.2.7 — what each requires
WCAG 2.1 includes three success criteria that address audio description for pre-recorded video. Understanding the distinction between them is essential for L&D compliance planning because they sit at different conformance levels and impose different production requirements.
SC 1.2.3 — Audio Description or Media Alternative (Level A)
SC 1.2.3 requires that pre-recorded synchronized media (video with audio — which describes virtually all training video) provides either:
- An audio description track, OR
- A full text alternative — a document that presents all the information conveyed in the video, including descriptions of all visual content, in text form
Level A is the baseline conformance level. Every WCAG-conformant implementation satisfies Level A. The "media alternative" option at SC 1.2.3 means that a comprehensive transcript — not just a speech transcript, but a full description of all visual content integrated into the text — satisfies this criterion. If the L&D team has been producing full transcripts that include visual descriptions alongside the caption project, those transcripts satisfy SC 1.2.3.
The critical limitation of SC 1.2.3 as a compliance strategy: it only satisfies Level A. The compliance standard for ADA Title II and the EAA is WCAG 2.1 Level AA, and for Section 508 (post-2017 refresh) it is WCAG 2.0 Level AA; both require satisfying all Level A and Level AA criteria. SC 1.2.5 is Level AA in both versions. Satisfying only SC 1.2.3 via text alternative leaves SC 1.2.5 unmet at Level AA.
SC 1.2.5 — Audio Description Prerecorded (Level AA)
SC 1.2.5 requires that all pre-recorded synchronized media provides an audio description. The "or media alternative" escape hatch from SC 1.2.3 is not available at this criterion. A full text transcript does not satisfy SC 1.2.5. A dedicated audio description track — a second audio track in the video player that a blind learner can enable — is required.
This is the criterion that L&D teams are most commonly missing when an accessibility audit reviews their video library. The typical finding in an SC 1.2.5 audit: "Pre-recorded video content does not provide audio description. Text transcripts are available for some content but do not satisfy SC 1.2.5 at Level AA. Audio description is required for all pre-recorded synchronized media." The audit finding is correct, and no amount of transcript availability resolves it — the AD track itself is required.
SC 1.2.5 applies to all pre-recorded synchronized media unless all of the information presented in the video is also presented in the existing audio track. The WCAG note on this exception: "If all the information in the video track is already provided in the audio track, no audio description is necessary." This exception is narrow: it applies to video where the narration explicitly reads every piece of information shown visually, leaving nothing to visual-only conveyance. Most training video does not meet this test. The WCAG Understanding document for SC 1.2.5 clarifies: narrated slide presentations where the narrator reads every word of every slide, and talks through every visual element, may meet this exception. A procedure demonstration video where the narrator says "and here we apply the tourniquet" while showing the physical technique does not.
SC 1.2.7 — Extended Audio Description (Level AAA, optional enhancement)
SC 1.2.7 addresses the scenario where a video does not have sufficient natural pauses in the audio track to insert AD narration for all of the visual information that needs description. Standard AD (SC 1.2.5) inserts narration into existing pauses; extended AD pauses the video itself, freezing playback while the AD narrator reads the description, then resuming from the point where it paused.
SC 1.2.7 is a Level AAA criterion, so it sits above the Level AA target and is not required for Level AA conformance. It is relevant only "where pauses in foreground audio are insufficient to allow audio descriptions to convey the sense of the video." Most professionally produced training video has adequate natural pauses. The content types where extended AD is most likely to be needed are densely narrated eLearning modules with continuous voiceover over animated slides, which leave no gaps between narration sentences for an AD insert.
Extended AD is significantly more complex to produce and deliver than standard AD. It requires either post-production video editing to insert pause frames (the video is re-rendered with freeze frames at AD insertion points, making the video longer and requiring re-upload to all delivery platforms) or a video player that implements the pause/resume mechanism at the player level (limited LMS player support as of 2026). For most L&D programmes, the practical approach to content that cannot accommodate standard AD is to re-cut the narration at the authoring stage to build in natural pauses — more feasible for new content than for existing content that would require re-recording.
The compliance matrix summary
| WCAG Criterion | Level | Requirement | Satisfied by transcript? |
|---|---|---|---|
| SC 1.2.3 | A | AD track OR full text alternative | Yes — comprehensive transcript satisfies SC 1.2.3 |
| SC 1.2.5 | AA | AD track required | No — transcript does not satisfy SC 1.2.5 |
| SC 1.2.7 | AAA | Extended AD when pauses insufficient | No — only applies when SC 1.2.5 standard AD cannot accommodate all content |
The practical compliance implication: a WCAG 2.1 Level AA-compliant training video library requires both caption tracks (SC 1.2.2) and audio description tracks (SC 1.2.5). Full transcripts satisfy SC 1.2.3 but leave SC 1.2.5 open. The most common audit finding is that a caption programme has been built, captions satisfy SC 1.2.2, some transcripts satisfy SC 1.2.3, but no audio description tracks exist — leaving SC 1.2.5 unmet for all pre-recorded video content.
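The matrix logic can be expressed as a small check that an inventory script might run per video. This is an illustrative Python sketch only: the `VideoAssets` fields and the `criteria_status` function are hypothetical names, and applying the "all visual information is already in the audio" exception to SC 1.2.3 as well as SC 1.2.5 is an assumption of the sketch, not a legal determination.

```python
from dataclasses import dataclass


@dataclass
class VideoAssets:
    """What accessibility assets exist for one pre-recorded training video."""
    has_captions: bool
    has_ad_track: bool
    has_full_text_alternative: bool       # transcript that also describes visual content
    visuals_fully_narrated: bool = False  # narration already conveys all visual information


def criteria_status(video):
    """Map the compliance matrix to pass/fail flags for one video."""
    no_ad_needed = video.visuals_fully_narrated
    return {
        "SC 1.2.2": video.has_captions,
        # Level A: an AD track OR a full text alternative is enough.
        "SC 1.2.3": video.has_ad_track or video.has_full_text_alternative or no_ad_needed,
        # Level AA: only an AD track (or the narrow exception) is enough.
        "SC 1.2.5": video.has_ad_track or no_ad_needed,
    }


# The most common audit finding: captions and transcript exist, no AD track.
typical = VideoAssets(has_captions=True, has_ad_track=False, has_full_text_alternative=True)
status = criteria_status(typical)
print(status)
```

For the typical post-caption-project video, the check reports SC 1.2.2 and SC 1.2.3 as met and SC 1.2.5 as open, which matches the audit finding described above.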
The auto-captions compliance analysis and the US caption compliance matrix cover the full WCAG criteria map for video content, including the Level A and Level AA distinction across SC 1.2.2, 1.2.3, 1.2.4, and 1.2.5 in the context of captions and media alternatives.
Who carries an AD compliance obligation
ADA Title I — private employers with 15+ employees
ADA Title I's reasonable accommodation obligation applies to all terms, conditions, and privileges of employment — including training. An employee with a vision disability who requests access to training content has a right to an effective reasonable accommodation. For pre-recorded training video, audio description is the standard accessible format that satisfies the ADA Title I reasonable accommodation obligation for blind and low-vision employees.
The ADA's reasonable accommodation framework is request-triggered: the formal obligation to provide a specific accommodation arises when an employee discloses a disability and requests an accommodation. However, ADA employment law practitioners increasingly advise proactive AD production for training content rather than reactive production on accommodation request, for the same operational reasons that apply to caption production: a new employee with a vision disability may need the training content before an accommodation process has been completed; rush AD production at 2–3× standard rates is more expensive than scheduled production; and maintaining a training library that is structurally inaccessible to blind employees creates evidence of a systemic barrier under the EEOC's systemic enforcement framework regardless of individual accommodation requests.
ADA Title II — public universities, state and local government entities
ADA Title II regulations effective April 24, 2026 require WCAG 2.1 Level AA compliance for web and digital content from covered public entities: state and local governments, including public universities. (Entities receiving federal financial assistance are reached through Section 504 of the Rehabilitation Act rather than through Title II itself.) WCAG 2.1 Level AA includes SC 1.2.5. Training video published on university learning management systems, state agency training portals, and public university course platforms is subject to SC 1.2.5 under the April 2026 ADA Title II regulations. The university lecture capture guide covers the ADA Title II compliance requirements for university lecture recordings; SC 1.2.5 applies to those recordings alongside SC 1.2.2.
Section 508 — federal government and federal contractors
The Revised Section 508 Standards (effective January 2018) incorporate WCAG 2.0 Level AA by reference for electronic content. WCAG 2.0 Level AA includes SC 1.2.5. Federal government agencies and federal contractors who develop, procure, or use training video must ensure that video meets SC 1.2.5 — audio description is required for all pre-recorded synchronized media. Section 508 has carried the SC 1.2.5 requirement since 2018, making federal government and federal contractor L&D teams among the earliest to receive SC 1.2.5 audit findings.
The full Section 508 captions guide covers the Section 508 compliance requirements for training video; the same framework applies to SC 1.2.5 for audio description, with the same enforcement context (Section 508 complaints filed with the relevant federal agency, DOJ enforcement, private litigation under the Rehabilitation Act for federally funded entities).
European Accessibility Act — EU market participants
The EAA, enforceable since June 2025, references EN 301 549 as the technical standard. EN 301 549 incorporates WCAG 2.1 Level AA. SC 1.2.5 is a WCAG 2.1 Level AA criterion. L&D teams at organisations operating in the EU market — including US-headquartered organisations with EU employees — carry an SC 1.2.5 obligation under the EAA for training content delivered to EU employees.
State laws — California, New York, Washington, Illinois
Several state accessibility laws either explicitly incorporate WCAG 2.1 Level AA (including SC 1.2.5) or extend accessibility obligations beyond federal ADA to organisations below the federal thresholds. California Unruh Civil Rights Act, California GovCode §§ 7405/11135, New York Human Rights Law, Washington Law Against Discrimination, and Illinois Human Rights Act all create accessibility obligations that reach SC 1.2.5 for in-scope content. The compliance status analysis maps these state law obligations in the context of video accessibility.
Prioritization framework: which training content needs AD first
A large L&D video library cannot be converted to full WCAG 2.1 Level AA compliance — captions and audio description — in a single project. The same prioritization logic that governs caption backlog remediation applies to audio description: identify the highest-risk, highest-impact content first and work from there. The AD prioritization framework differs from the caption prioritization framework in one important respect: AD priority is driven by the information density of the visual track, while caption priority is driven by the compliance risk of the content category.
Tier 1 — AD required regardless of accommodation request status
Safety procedure and compliance training with visual procedure content. Videos showing physical safety procedures — lockout/tagout, hazardous chemical handling, emergency evacuation procedures, PPE donning and doffing sequences, fire suppression operation — contain instructional content that is conveyed primarily through the visual demonstration. The audio track of a lockout/tagout training video may say "now de-energise the equipment and apply the lockout device" while showing the specific hand position, device application, and physical verification sequence. A blind employee who cannot access the AD for this content cannot perform the procedure correctly. This is both the highest compliance-risk content (OSHA/NFPA/Joint Commission safety training obligations) and the highest harm-to-learner content (procedure errors in safety training have direct safety consequences).
Equipment operation and physical procedure training. Any training where the learner is expected to perform a physical task after watching the video — equipment operation, lab technique, clinical procedure, assembly sequence — depends on the learner being able to observe and replicate the demonstrated technique. Without AD, a blind or low-vision learner watching equipment operation training receives the narration but not the visual content of what they are expected to do. The AD for this content is not an accessibility accommodation — it is the instructional delivery mechanism for a learner whose primary access channel is audio.
Software interface training with on-screen navigation content. Software walkthroughs, LMS navigation training, enterprise application onboarding videos, and system configuration demonstrations typically have narration that references the visual interface without describing it: "click the Settings gear in the top-right corner" — but without AD, the blind learner cannot see where the Settings gear is relative to the rest of the interface. Interface navigation training without AD is not merely inaccessible — it is pedagogically non-functional for a blind learner. The AD script for software interface training is the most detailed of any content type: every cursor movement, every interface element, every dropdown, every form field must be described.
Visual-primary diagram and comparison content. Training videos that present their instructional content primarily through on-screen diagrams, comparison tables, flow charts, organisational charts, and data visualisations — with narration that refers to but does not read the visual — require AD for all of that visual information. A 30-second sequence showing a comparison matrix of three software pricing tiers, with the narrator saying "as you can see, the Enterprise tier offers the most features at this price point," is entirely inaccessible to a blind learner without AD that describes the comparison matrix contents.
Tier 2 — AD recommended for ADA Title II and Section 508 compliance
Product and service training with demonstration content. Internal product training videos that demonstrate features, walk through workflows, or show customer-facing interfaces typically contain moderate-to-high visual information density. The product feature being demonstrated is often most clearly conveyed through the visual demonstration — the spoken narration describes what the feature does, but the visual shows how it looks and where the learner will find it. For organisations with ADA Title II or Section 508 obligations, product training video should be included in the AD programme at this tier.
New employee onboarding video with organisational structure and facility content. Onboarding videos frequently include org chart introductions (the organisation chart is displayed on screen but not read aloud), facility walkthroughs (the physical space is shown without narration of its layout), culture videos that convey values and history visually, and system access instructions that show interface screenshots. This content is Tier 2 because the visual information density is moderate and the accommodation urgency is lower than for safety training. Even so, the onboarding video is a new employee's first exposure to the accessible formats the employer provides, and a failure at this stage sends a signal about the organisation's accessibility culture.
Lecture capture with whiteboard, slides, and visual reference content. University lecture recordings present the broadest range of AD requirements in the L&D content landscape. An instructor who writes on a whiteboard, gestures to visual materials, refers to slides by pointing rather than reading them, or draws diagrams in real time generates visual information that the lecture audio track may not adequately convey. The lecture capture captioning guide covers the base captioning requirements; SC 1.2.5 AD is the next compliance layer for the same content.
Tier 3 — AD optional unless an accommodation request is received
Talking-head interview and monologue video. A single presenter discussing content facing the camera, with no on-screen text, diagrams, demonstrations, or visual instructional elements — and where the narration conveys all of the instructional information — may satisfy the SC 1.2.5 exception: "if all of the information in the video track is already provided in the audio track, no audio description is necessary." This is the narrowest exception in the WCAG video criteria. Before relying on it, verify that the video truly has no on-screen text (including lower-thirds name labels, which must be described), no visual transitions that introduce new content, and no visual information that adds meaning beyond what the speaker says.
Discussion and panel format video. Multi-speaker roundtables and discussion formats are often Tier 3 for the same reason as talking-head monologue — the audio track is the content. However, if the discussion includes visual references (pointing to a shared screen, showing a document, referencing an exhibit) those visual elements require AD and push the video to Tier 2.
Audio-primary content (podcast-format video, screen-off lecture recordings). Some organisations publish content that is audio-primary — a recording where the visual track contains only a static title slide or a presenter's face with no instructional visual content. These are Tier 3. But verify the full video before classifying — a "podcast format" recording that includes slides or screen sharing during any segment requires AD for those segments.
The caption compliance programme build guide covers the content inventory methodology for building the full library list; the same inventory process produces the information needed for AD tier classification: content type, visual information density, and compliance risk category. The AD tier can be added as a column to the same spreadsheet used for caption prioritisation.
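As a sketch of how the AD tier column might be filled in, the following Python snippet maps content types to the tiers described above. The category keys, the `ad_tier` function, and the sample inventory rows are hypothetical. Promoting any Tier 3 item with visual references to Tier 2 follows the rule stated for discussion video and is applied to the other Tier 3 types here as an assumption.

```python
# Hypothetical content-type keys mapped to the three tiers described above.
TIER_BY_CONTENT_TYPE = {
    "safety_procedure": 1,
    "equipment_operation": 1,
    "software_walkthrough": 1,
    "diagram_comparison": 1,
    "product_demo": 2,
    "onboarding": 2,
    "lecture_capture": 2,
    "talking_head": 3,
    "panel_discussion": 3,
    "audio_primary": 3,
}


def ad_tier(content_type, has_visual_references=False):
    """Return the AD tier; Tier 3 content with visual references moves to Tier 2."""
    tier = TIER_BY_CONTENT_TYPE[content_type]
    if tier == 3 and has_visual_references:
        return 2
    return tier


# Hypothetical inventory rows, as they might come from the caption spreadsheet.
inventory = [
    {"title": "Lockout/tagout refresher", "content_type": "safety_procedure", "visual_refs": True},
    {"title": "CEO town hall", "content_type": "talking_head", "visual_refs": False},
    {"title": "Quarterly panel with shared screen", "content_type": "panel_discussion", "visual_refs": True},
]

# Add the AD tier as a new column on each row.
for row in inventory:
    row["ad_tier"] = ad_tier(row["content_type"], row["visual_refs"])

for row in sorted(inventory, key=lambda r: r["ad_tier"]):
    print(row["ad_tier"], row["title"])
```

A lookup like this only gives a starting classification; each video still needs a viewing pass to confirm what the visual track actually contains.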
The AD production workflow — five steps
Audio description production has five distinct stages: content audit, gap analysis, script writing, voice recording, and audio delivery. For most L&D teams, the first three stages are the most unfamiliar — caption production starts with the audio and proceeds largely through automated transcription and human review, while AD production starts with a visual content audit and proceeds through a scripting and recording process more similar to voice-over production than to transcription work.
Step 1 — Content audit: identify all visual-only information
The AD script writer reviews the complete video and identifies every moment where visual information is present that is not conveyed by the existing audio track. For each identified segment, the reviewer records:
- Timecode in: when the visual information appears
- Timecode out: when the next spoken word begins (the end of the available gap)
- Gap duration: available silence in seconds
- Visual content to describe: the description that will be needed
- Estimated reading time: how long the description will take at conversational pace
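The audit record above maps naturally onto a small data structure. The following Python sketch is one possible shape, not a vendor format: the `AuditSegment` class, its field names, the use of seconds instead of full timecodes, and the 145 words-per-minute default pace are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AuditSegment:
    """One visual-only moment found during the content audit (times in seconds)."""
    timecode_in: float    # when the visual information appears
    timecode_out: float   # when the next spoken word begins
    visual_content: str   # draft description of what needs to be conveyed

    @property
    def gap_duration(self):
        """Available silence in seconds."""
        return self.timecode_out - self.timecode_in

    def estimated_reading_time(self, words_per_minute=145):
        """Seconds needed to read the draft description at the assumed pace."""
        word_count = len(self.visual_content.split())
        return word_count * 60 / words_per_minute


# Hypothetical segment: a slide title shown for 3.5 seconds before narration resumes.
segment = AuditSegment(62.0, 65.5, "Slide title: Lockout device applied to breaker")
print(f"gap {segment.gap_duration:.1f}s, reading time {segment.estimated_reading_time():.1f}s")
```

Comparing `gap_duration` with `estimated_reading_time()` for each segment gives the raw input for the gap analysis in the next step.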
Common visual-only information categories in L&D training video:
- Slide content not read aloud: slide titles shown for 2 seconds before the narrator begins speaking, bullet points displayed while the narrator summarises rather than reads them verbatim, sub-titles and section headers, callouts and annotations, footnotes and source citations, section dividers with text
- On-screen interface elements: button labels, menu items clicked during software training, form field names, navigation breadcrumbs, interface state changes, error messages, status indicators
- Physical demonstrations: hand position, technique sequence, physical orientation of objects, tactile characteristics relevant to the demonstrated procedure (which face of the connector goes which direction, how tightly to grip, what the correct orientation looks like)
- Charts and data visualisations: axis labels and data ranges, trend lines and their direction, bar heights and comparative values, pie chart percentages, table row and column contents, heat map colour coding
- Organisational and structural diagrams: org chart hierarchy, workflow sequence in a process diagram, system architecture, relationship map
- Text overlays and lower thirds: presenter name and title, location labels, date and time overlays, chapter markers
Step 2 — Gap analysis: standard AD vs extended AD
For each piece of visual-only information identified in the content audit, the gap analysis determines whether the required description can be inserted into an available natural pause in the audio track. A natural pause is a silence of 1.5 seconds or more between spoken words. The reading pace for AD narration is approximately 130–160 words per minute at a natural speaking pace, which translates to roughly 2–3 words per second. A 3-second pause allows about 6–8 words of description; a 5-second pause allows about 10–13 words.
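The word budget for a pause follows directly from the narration pace. The sketch below works the arithmetic for the 130–160 words-per-minute range; the function names are hypothetical, and rounding down to whole words is a conservative assumption for the example.

```python
import math


def words_that_fit(gap_seconds: float, words_per_minute: float) -> int:
    """Whole words that can be read inside the gap at the given pace."""
    return math.floor(gap_seconds * words_per_minute / 60.0)


def word_budget(gap_seconds: float, slow_wpm: float = 130.0, fast_wpm: float = 160.0):
    """Return the (slow pace, fast pace) word budget for one pause."""
    return words_that_fit(gap_seconds, slow_wpm), words_that_fit(gap_seconds, fast_wpm)


for gap in (2.0, 3.0, 5.0):
    print(f"{gap:.0f}-second pause: {word_budget(gap)} words")
```

Scripting to the slow-pace figure leaves headroom for the voice talent; scripting to the fast-pace figure assumes the description will be read at the upper end of the range.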
Most professionally produced training video — narrated slide presentations, instructor-led content with a professional narrator, corporate training videos — leaves adequate natural pauses for standard AD. The narrator pauses between sentences, between slides, between topics. These pauses are typically 2–5 seconds and accommodate most common AD descriptions.
Content types that frequently fail the gap analysis for standard AD:
- Densely narrated eLearning modules produced in Articulate Storyline, Rise, or Captivate with continuous narration over animated slides. Professional eLearning narration scripts are often written to maximise audio content density — the narrator moves directly from one sentence to the next with minimal pause. When there are insufficient natural pauses for standard AD, the options are: extended description (pausing the video), a separate AD track version of the video with additional pause footage inserted, or a re-recorded narration track with inserted pauses (feasible only if the source project files are available and narration re-recording is within budget).
- Highly visual procedure demonstration videos where the physical demonstration is continuous and the visual information density is extreme — an assembly sequence with 40 steps in 8 minutes where the narrator speaks throughout. The AD script for a video like this may require more words than available pauses can accommodate, requiring prioritisation of the most critical visual information rather than complete description.
When extended AD is required, the decision tree is: (1) is the LMS video player extended-AD-capable? (2) if not, can the video be re-rendered with freeze frames at extended-AD insertion points? (3) if neither, is a separate AD-enabled video version feasible? The eLearning authoring tools guide covers the eLearning-specific platform context for caption delivery; the same platform constraints apply to AD delivery from SCORM packages.
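The three-question decision tree can be written down as a short function so that the outcome is recorded consistently per video. This is a sketch under stated assumptions: the enum, the function name, and the `UNRESOLVED` fallback for the case where all three answers are "no" are illustrative additions, not part of any LMS API.

```python
from enum import Enum


class ExtendedADRoute(Enum):
    PLAYER_PAUSE = "use the LMS player's extended-AD support"
    FREEZE_FRAME_RENDER = "re-render the video with freeze frames at insertion points"
    SEPARATE_VERSION = "publish a separate AD-enabled video version"
    UNRESOLVED = "no delivery route available; escalate"  # assumed fallback


def choose_extended_ad_route(
    player_supports_extended_ad: bool,
    can_rerender_with_freeze_frames: bool,
    separate_version_feasible: bool,
) -> ExtendedADRoute:
    """Apply the three questions in order and return the first route that works."""
    if player_supports_extended_ad:
        return ExtendedADRoute.PLAYER_PAUSE
    if can_rerender_with_freeze_frames:
        return ExtendedADRoute.FREEZE_FRAME_RENDER
    if separate_version_feasible:
        return ExtendedADRoute.SEPARATE_VERSION
    return ExtendedADRoute.UNRESOLVED


print(choose_extended_ad_route(False, True, True).value)
```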
Step 3 — AD script writing
The AD script is a time-coded document structured similarly to a caption file but for description narration rather than transcribed speech. Each script entry includes the timecode insertion point, the description text, and the gap duration available for it. AD script writing conventions from the DCMP Audio Description Standards:
Present tense throughout. Audio description is written in the present tense regardless of when the action occurred in the video's production: "The instructor holds a syringe" not "The instructor held a syringe." Present tense matches the learner's experience of watching the video in real time.
Describe what is seen, not what it means. The AD narrates visual facts, not interpretations. "A bar chart shows three bars of roughly equal height, labelled Tier 1, Tier 2, Tier 3" not "A bar chart shows that all three pricing tiers perform about the same." The interpreter is the learner; the AD is the data.
Lead with the most important element. Because the available gap may close before the full description is read, the most important visual information comes first. For a complex diagram: label the diagram type and overall structure, then the most critical data points, then secondary elements. A description that front-loads the most important information provides maximum value even if it is cut short by the resuming narration.
Match the vocabulary and register of the original narration. If the original narration uses domain-specific terminology — a medical training video that says "administer the subcutaneous injection at a 45-degree angle" — the AD script uses the same domain vocabulary for the described visual content: "The instructor inserts the needle at a 45-degree angle into the patient's upper arm, pinching the skin fold with the left hand." The AD voice should sound like it belongs to the same training programme as the original narration. The same glossary that governs caption accuracy applies to AD terminology.
Describe race, gender, disability, and identity only when relevant to the content. If the training content is about diverse team representation and includes intentional visual diversity as instructional content, those elements are described. If a presenter's demographic characteristics are not instructionally relevant, the AD does not describe them.
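Because the AD script is time-coded like a caption file, its entries can be stored in a caption-style text format. The sketch below serialises script entries as WebVTT cues; choosing WebVTT is an assumption for illustration (the source does not prescribe a file format), and the `ScriptEntry` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScriptEntry:
    start_seconds: float  # insertion point for the description
    gap_seconds: float    # silence available at that point
    text: str             # description text, present tense


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT 'HH:MM:SS.mmm' timestamp."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(entries: List[ScriptEntry]) -> str:
    """Render entries as WebVTT text, one cue per description, in time order."""
    blocks = ["WEBVTT"]
    for entry in sorted(entries, key=lambda e: e.start_seconds):
        start = vtt_timestamp(entry.start_seconds)
        end = vtt_timestamp(entry.start_seconds + entry.gap_seconds)
        blocks.append(f"{start} --> {end}\n{entry.text}")
    return "\n\n".join(blocks) + "\n"


print(to_webvtt([ScriptEntry(65.5, 3.0, "The instructor holds a syringe.")]))
```

Each cue spans the whole available gap rather than the spoken length of the description, so the file doubles as a record of how much room the voice talent has at each insertion point.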
Step 4 — Voice talent recording
The AD narration is recorded by a voice talent whose voice is distinct from the original narrator — typically a different pitch, register, or timbre — so that learners can clearly distinguish the AD narration from the original audio content. Recording quality standards are the same as for professional narration: quiet recording environment, professional microphone, consistent acoustic quality across the session, no background noise or room reverb.
Voice talent sources for AD narration:
- The captioning vendor's AD service: Most captioning vendors that offer AD include voice talent as part of the AD production package. 3Play Media, Verbit, and DCMP-certified providers include voice talent in their integrated AD pricing. This is typically the most efficient approach for L&D teams without an existing narration voice talent roster.
- The L&D team's existing voice talent agency: If the team uses a voice talent agency for original course narration, that agency can often provide a second talent for AD narration. The advantage is that the recording quality will match the original narration technically.
- ACB Audio Description Project voice talent directory: The ACB AD Project maintains a roster of trained AD voice talent. This is useful for L&D teams building an in-house AD programme rather than relying on a captioning vendor.
- In-house narration team: Some L&D teams with internal voice talent (narrators, facilitators, or subject matter experts who record their own narration) can record AD internally. The key requirement is that the AD voice be distinguishable from the original narrator — either a different person or a clearly different vocal register.
Recording the AD narration against a time-coded script requires that the talent either records each segment discretely (individual takes for each AD insertion point) or records continuously against a visual-and-audio reference of the video. Most professional AD voice talent sessions use the discrete-segment approach with the video queued to each insertion timecode, recording each description segment individually and ensuring it fits within the available gap before moving to the next segment.
Step 5 — Audio mixing and delivery
The recorded AD segments are edited against the original audio timeline, aligning each description segment with its insertion timecode and verifying that: (a) the description starts no earlier than the preceding speech ends, (b) the description ends before the following speech begins, (c) the description audio level is appropriate relative to the original narration (typically 1–2 dB above the original narration level to ensure intelligibility when both tracks are present), and (d) no description segment overlaps the original audio.
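The timing checks can be automated before anyone listens to the mix. The sketch below flags any description segment that overlaps a speech interval in the original audio; the interval representation and function name are assumptions for the example, and the level check in (c) is left to the audio engineer.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def overlapping_segments(
    ad_segments: List[Interval], speech: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return (AD segment, speech interval) pairs that overlap in time."""
    clashes = []
    for ad_start, ad_end in ad_segments:
        for sp_start, sp_end in speech:
            # Two intervals overlap when each one starts before the other ends.
            if ad_start < sp_end and sp_start < ad_end:
                clashes.append(((ad_start, ad_end), (sp_start, sp_end)))
    return clashes


speech = [(0.0, 10.0), (13.0, 20.0)]
ad_segments = [(10.2, 12.8), (12.5, 13.4)]
for ad, sp in overlapping_segments(ad_segments, speech):
    print(f"AD segment {ad} overlaps speech {sp}")
```

An empty result means every description starts after the preceding speech ends and finishes before the following speech begins.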
The final mixed AD track is a complete alternate audio track for the video file — not a modification to the original audio, but an additional track that can be selected independently. The original audio remains the default track; learners who do not need AD hear the original narration. Learners who select the AD track hear the original narration plus the AD insertions in the gaps.
Delivery format depends on the LMS platform (covered in detail in the LMS delivery section below). Most modern video formats (MP4 with multi-track audio, MKV, WebM with multi-track audio) support alternate audio tracks natively. The LMS video hosting platform determines whether the alternate track is exposed to the learner in the player interface.
AD providers for L&D content
The captioning vendor landscape for L&D content — covered in the captioning RFP guide and the vendor accuracy evaluation guide — does not map directly onto the AD vendor landscape. Not all captioning vendors offer audio description, and the vendors who specialise in AD for entertainment and broadcast content are not always the most appropriate for L&D training content. The key criteria for an AD vendor for L&D:
- DCMP-certified or DCMP-aligned standards: The DCMP Audio Description Standards are the benchmark for educational and training content AD. Verify that the vendor's AD production process aligns with DCMP standards for pace, description focus, terminology, and reading speed.
- Domain expertise for technical content: The same vocabulary problem that affects caption accuracy in technical training content affects AD description accuracy. An AD scripter who does not understand "subcutaneous injection at 45 degrees" or "3CX switch module" will describe the visual incorrectly. Vendor AD quality in the L&D verticals (healthcare, technology, compliance training) requires the same domain expertise vetting as caption quality.
- LMS-compatible delivery formats: The vendor must be able to deliver the AD in a format the LMS platform can use — alternate audio track in a specific container format, or a separate AD-enabled video version. Confirm delivery format requirements with the LMS platform before contracting with an AD vendor.
- Integrated caption + AD workflow: Producing captions and AD together in a single vendor job is the most efficient approach. Verify that the vendor can deliver both in the same production cycle from the same source video file.
Primary AD vendors for L&D content
3Play Media. 3Play offers integrated caption and audio description services with LMS delivery support. Their AD workflow produces DCMP-aligned descriptions with a human AD scripter and voice talent. They support delivery as alternate audio track or as a separate AD-enabled video file. 3Play is the most commonly used combined caption + AD vendor in the US higher education and corporate L&D market.
Verbit. Verbit's enterprise captioning service includes audio description as an add-on for L&D clients. Delivery via LMS integration (Kaltura, Panopto, Brightcove integrations). Verbit's pricing model is subscription-based with per-minute overages, similar to their caption pricing structure.
Described and Captioned Media Program (DCMP). The DCMP itself produces audio description for educational media distributed through its free service to K-12 and postsecondary education programmes that register with the DCMP. For L&D programmes at public universities, the DCMP is a free resource for content that meets their programme requirements. DCMP-produced AD is the quality benchmark for educational AD — their standards are the production reference for all educational AD vendors.
ACB Audio Description Project. The American Council of the Blind's AD Project is a resource hub rather than a production vendor: it maintains quality standards, a voice talent directory, a training programme for AD script writers, and advocacy for AD standards. L&D teams building in-house AD capability should engage with the ACB AD Project for quality standards and training resources.
Deluxe Media. Deluxe's accessibility services division provides caption and AD production for broadcast and enterprise content, including L&D applications. Their AD production quality is entertainment-grade — highest production value but also higher pricing than education-focused vendors.
AI-assisted AD tools (caution advisory). Vision-language model tools that generate AD scripts from video frames are an emerging category (as of 2026). The current capabilities: scene description (identifying that a diagram or procedure is present), object recognition (identifying specific elements in a frame), and TTS for AD narration delivery. Current limitations for compliance L&D content: domain-specific visual accuracy is insufficient for technical procedure, equipment, and clinical content; the timing and gap analysis required for standard AD insertion requires human verification; AI-generated descriptions for complex charts and data visualisations frequently misread labels and values; DCMP standard compliance requires human review at the same level as writing from scratch. AI AD tools are not a production shortcut for compliance content as of 2026 — they require human review that negates most of the cost savings relative to human AD production.
LMS platform delivery — platform by platform
The LMS delivery context for audio description is more complex than for captions because multi-track audio is less universally supported in LMS video players than multi-track text (captions). Caption delivery as a sidecar file is a well-established LMS feature; alternate audio track delivery is not. The delivery strategy for AD must be determined before the production format is committed.
Kaltura
Kaltura is the most AD-compatible LMS video hosting platform available to L&D teams. Kaltura MediaSpace, KAF (Kaltura Application Framework) LMS integrations, and the Kaltura Player all support multiple audio tracks natively. Adding the AD track to a Kaltura-hosted video:
- Log in to the Kaltura Management Console (KMC).
- Navigate to the video entry in the Media list.
- In the entry edit view, select the Flavors tab and then the Audio Tracks section.
- Upload the AD audio file (typically a WAV or AAC file of the full AD track, time-synced to the video) as an additional audio track.
- Label the track "Audio Description" in the track settings — this label appears in the player's audio selector.
The Kaltura Player displays an audio track selector in the player bar when multiple audio tracks are present. The selector is keyboard-accessible and is compatible with NVDA, JAWS, and VoiceOver. The Kaltura API supports uploading and managing alternate audio tracks programmatically — for L&D teams with large libraries, the Kaltura API can be used to upload AD tracks in batch. The WCAG 2.1 AA captions guide covers the Kaltura caption delivery workflow; the same Kaltura API framework applies to audio track management for AD.
Panopto
Panopto is the most common lecture capture platform at universities and supports multiple audio streams in its player. Adding an AD track in Panopto:
- Open the Panopto video in the Panopto Editor.
- In the Streams panel on the left, click "Add a Stream" or "Import Audio."
- Upload the AD audio file as an additional audio stream.
- In the stream settings, label it "Audio Description" and set it to be toggleable (not mixed into the primary audio).
The Panopto player displays an audio stream selector that learners can use to enable the AD track. For university L&D programmes using Panopto for lecture capture, this is the most direct path to SC 1.2.5 compliance for recorded lectures. The university lecture capture guide covers the Panopto caption workflow alongside which recordings have the highest AD priority.
Brightcove
Brightcove Video Cloud supports audio tracks via the Brightcove Media API. Adding an AD audio track to a Brightcove video:
- Use the Audio Track API to create a new audio track on the video object: POST /v1/accounts/{account_id}/videos/{video_id}/audio_tracks with the track metadata (kind: "audio-description", label: "Audio Description", language: "en").
- Upload the AD audio source file via the Dynamic Ingest API, associating it with the audio track object.
- The Brightcove Player renders an audio track selector in the player bar when multiple tracks are present.
Brightcove's player audio track selector is accessible via keyboard and compatible with major screen readers. Brightcove is most commonly used in corporate L&D for externally hosted training videos, marketing training, and customer education platforms. The WCAG pre-recorded video captions guide covers Brightcove in the caption context; the same Brightcove API framework applies to AD track management.
Cornerstone OnDemand
Cornerstone's native video player has limited multi-track audio support as of 2026. Cornerstone primarily hosts video content as MP4 files with a single audio track — it does not expose an audio track selector in the player interface for end users. The recommended approach for AD delivery in Cornerstone:
- Separate AD video version: Upload a second version of the video with the AD audio mixed into the primary audio track (not as a separate track, but baked in). Label the course or video asset clearly as "Audio Description version" and make it available from the same course landing page or via a dedicated AD course track for enrolled learners who need it.
- Kaltura-Cornerstone integration: For organisations using Kaltura as a video CDN alongside Cornerstone as the LMS, the Kaltura player with multi-track AD support can be embedded within Cornerstone via the Cornerstone-Kaltura integration. This provides full alternate-audio-track AD support within the Cornerstone LMS frame using Kaltura's player capabilities.
- SCORM package with AD-enabled video: For SCORM content delivered through Cornerstone, a separate AD-enabled SCORM package (with the AD audio baked into the video at the SCORM packaging stage) can be made available alongside the standard SCORM package for learners who request the AD version.
TalentLMS
TalentLMS does not natively support multiple audio tracks in its built-in video player. TalentLMS video hosting renders a single MP4 with one audio track. AD delivery options:
- YouTube or Vimeo embedding: Both YouTube and Vimeo support multiple audio tracks for AD delivery. The video can be hosted on YouTube or Vimeo with the AD track configured, then embedded in TalentLMS via the iFrame embed option. The YouTube or Vimeo player with AD track selection renders within the TalentLMS course frame. YouTube audio description tracks are uploaded via YouTube Studio's multi-track audio feature (available to channels with the multi-track audio feature enabled). Vimeo Pro/Business supports audio tracks via the Vimeo API.
- Separate AD course or section: A separate TalentLMS course or lesson with the AD-audio video can be created and enrolled as an alternative for learners who request AD. This is operationally more complex than a native alternate-track solution but is fully accessible.
Workday Learning
Workday Learning's built-in media player has limited support for alternate audio tracks as of 2026. Workday's video player is designed for single-track MP4 delivery. For AD delivery in Workday Learning:
- External video hosting: Host the video in Kaltura or Panopto with the AD track configured, and embed the external player in the Workday Learning lesson via the multimedia learning resource option. The Kaltura or Panopto player with AD support renders within the Workday Learning frame.
- Separate AD lesson: Create a parallel Workday Learning lesson with the AD-enabled video version for learners who have accommodation records indicating need for AD. This requires a manual accommodation workflow to direct affected learners to the AD lesson.
Articulate Storyline, Rise, and Adobe Captivate
eLearning authoring tools present the most complex AD delivery context because the video is embedded within an interactive SCORM/xAPI package rather than hosted independently. The authoring tool's video player controls are determined by the published package, not the LMS player.
Articulate Storyline 360: Storyline can include multiple audio tracks on a timeline layer using the audio recording and layer functionality, but implementing an AD-toggle via the native Storyline player is not straightforward. The standard approach is: produce a separate AD-enabled published version of the Storyline project (a separate SCORM package with the AD audio inserted as a layer that plays in parallel with the course audio), upload both the standard and AD SCORM packages to the LMS, and make the AD version accessible to learners who need it via a course description link or accommodation routing. The eLearning authoring tools caption and accessibility guide covers the Storyline caption workflow in detail; the same separation-of-packages approach applies for AD.
Articulate Rise 360: Rise is a web-based authoring tool that embeds video via URL reference (YouTube, Vimeo, or direct MP4 hosting). For videos in Rise courses, use YouTube or Vimeo with AD tracks configured, and reference those video URLs in the Rise course. The video player within Rise is the YouTube or Vimeo player, which supports audio track selection for AD.
Adobe Captivate: Captivate's HTML5 output can include JavaScript-controlled audio layers. A Captivate developer can implement an AD audio toggle using Captivate's interactive audio layers — this is a custom development task, not a standard Captivate feature — but it is feasible for organisations with Captivate development resources. Alternatively, the same separate-SCORM-package approach used for Storyline applies to Captivate published SCORM content.
For the full authoring tool context including caption delivery and WCAG compliance for eLearning content, see the eLearning authoring tools captioning guide.
Integrating AD into your caption programme
The most efficient path to SC 1.2.5 compliance is to integrate audio description production into an existing caption programme rather than building it as a separate initiative. The caption programme has already established the vendor relationship, the content inventory, the workflow, and the LMS delivery mechanism. AD production extends the same workflow with two additional steps — AD script and recording — using the same source video files that the caption project already processed.
Starting with new content: build AD into the production workflow
For new training video, the most cost-effective AD approach is to commission AD alongside captions in the same vendor job. The vendor already has the source video file; producing the AD script and recording from the same file in the same cycle adds marginal cost while eliminating the logistics overhead of a separate AD job later. Many captioning vendors offer a combined caption + AD package at a lower total price than ordering them separately.
For content produced in-house (self-recorded screencasts, in-person training recordings, lecture captures), build AD production into the post-production checklist alongside captioning: submit the video for captions and AD together. The same content inventory column that tracks caption status ("captioned," "captioned and verified," "pending captioning") gets an AD status column ("AD produced," "AD pending," "AD not required — Tier 3").
The most cost-efficient new-content approach is to design natural pauses into narration scripts at the storyboard stage. If a slide shows a three-column comparison table that the narrator summarises rather than reads, script a 3-second pause after "as you can see in this comparison" to give the AD narrator time to describe the table. Pauses designed into the original script eliminate the extended AD production complexity and reduce AD per-minute cost because the script writer's work is simpler when gaps are available.
Retrofitting AD to existing captioned content
The caption programme backlog remediation methodology — described in the large-scale backlog remediation guide — applies to the AD backlog with a content-type prioritisation overlay. For a video library that already has captions but no AD:
- Run the tier classification on the existing caption-complete inventory. Identify all Tier 1 (safety/procedure, software training, visual-primary content) and Tier 2 (product training, onboarding, lecture capture) videos.
- Submit Tier 1 content for AD production in priority order within the Tier 1 category: safety procedures first, then clinical/procedure content, then software interface training.
- Budget the AD retrofit as a separate project from the ongoing caption programme — it has a different cost per minute, different vendor requirements, and different LMS delivery requirements. The caption programme budget guide framework can be extended with an AD-specific budget model using the Tier 1 and Tier 2 content volume from the classification exercise.
- Report progress on SC 1.2.5 separately from SC 1.2.2 compliance in the accessibility programme's KPI reporting. The audit finding for SC 1.2.5 is distinct from SC 1.2.2; resolving it requires a separate remediation track.
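The prioritisation above can be applied to the content inventory with a simple sort. In this sketch the `Video` fields, the category labels, and the choice to place other Tier 1 content after the three named categories are assumptions for illustration; a real inventory would use its own column names.

```python
from dataclasses import dataclass
from typing import List

# Order within Tier 1: safety first, then clinical/procedure, then software training.
TIER1_ORDER = {"safety_procedure": 0, "clinical_procedure": 1, "software_training": 2}


@dataclass
class Video:
    title: str
    tier: int        # 1, 2, or 3 from the tier classification
    category: str    # hypothetical category label
    minutes: float


def ad_retrofit_queue(videos: List[Video]) -> List[Video]:
    """Tier 1 before Tier 2; Tier 3 is excluded from the AD retrofit."""
    eligible = [v for v in videos if v.tier in (1, 2)]
    return sorted(
        eligible,
        key=lambda v: (v.tier, TIER1_ORDER.get(v.category, len(TIER1_ORDER))),
    )


inventory = [
    Video("CRM basics", 1, "software_training", 12.0),
    Video("New starter welcome", 2, "onboarding", 8.0),
    Video("Lockout/tagout", 1, "safety_procedure", 15.0),
    Video("Town hall recap", 3, "internal_comms", 30.0),
    Video("Injection technique", 1, "clinical_procedure", 9.0),
]
queue = ad_retrofit_queue(inventory)
print([v.title for v in queue])
print("Tier 1 minutes to budget:", sum(v.minutes for v in queue if v.tier == 1))
```

The per-tier minute totals from the queue feed directly into the AD-specific budget model.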
Governing AD as part of the accessibility programme
The accessibility coordinator playbook covers the governance model for a mature L&D accessibility programme. AD introduces governance requirements beyond what the caption programme requires:
- AD quality review: The same QA methodology applied to captions — DCMP spot-check protocol, error classification, vendor SLA terms — should be applied to AD production. The AD equivalent of the caption accuracy threshold is DCMP Audio Description Standards compliance: descriptions are present for all significant visual information, descriptions are accurate, timing is correct, no overlap with original audio.
- LMS delivery verification: Unlike caption sidecar files (which can be verified by opening the LMS video player and checking the CC button), AD delivery requires testing with a screen reader and the audio track selector — a more complex verification step. The accessibility QA process should include AD delivery verification for each LMS platform where AD-enabled content is deployed.
- Platform support monitoring: LMS platform audio track support is an evolving area. Platforms that do not currently support multi-track audio (Cornerstone, TalentLMS) may add support in future updates. The AD delivery strategy should be reviewed annually against current platform capabilities.
Eight AD failure modes
Failure mode 1: AD script written faster than available gap allows
The most common production failure in AD: the description script is longer than the available pause. At a natural AD narration pace of 130–160 words per minute, a 2-second pause accommodates 4–5 words. A 2-second gap in a compliance training video might need: "The slide shows a flowchart with five steps: Identify, Report, Investigate, Remediate, Close." That description is 13 words, approximately 5–6 seconds of reading time, and does not fit in a 2-second gap. The solution at script writing time is to prioritise the most critical elements: "Flowchart: five steps, Identify through Close." Six words, or roughly 2.5 seconds of reading time. Still tight, but feasible. The failure happens when the script writer does not check reading time against gap duration and the voice talent rushes to fit the description into the available space, reducing comprehension.
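Checking reading time against gap duration is easy to automate at script-writing time. The sketch below is a minimal lint, assuming whitespace-separated word counts and a fixed pace; the function names and the 160 words-per-minute default (the fast end of the stated range) are choices made for the example.

```python
def reading_seconds(text: str, words_per_minute: float) -> float:
    """Estimated time to read the description aloud at the given pace."""
    return len(text.split()) * 60.0 / words_per_minute


def fits_gap(text: str, gap_seconds: float, words_per_minute: float = 160.0) -> bool:
    """True if the description can be read within the gap at the given pace."""
    return reading_seconds(text, words_per_minute) <= gap_seconds


full = "The slide shows a flowchart with five steps: Identify, Report, Investigate, Remediate, Close."
short = "Flowchart: five steps, Identify through Close."

for label, text in (("full", full), ("short", short)):
    words = len(text.split())
    seconds = reading_seconds(text, 160.0)
    print(f"{label}: {words} words, {seconds:.2f} s at 160 wpm, fits 2 s gap: {fits_gap(text, 2.0)}")
```

Running the lint over every script entry before the recording session surfaces overruns while they can still be fixed by rewriting, rather than by asking the voice talent to rush.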
Failure mode 2: AD audio mixed into the primary track instead of as a separate track
If the AD narration is mixed directly into the primary audio track — rather than delivered as an alternate, selectable track — it is heard by all learners, not only by those who need it. Some sighted learners experience the AD narration as disruptive or confusing. An AD track that cannot be turned off defeats the purpose of the alternate-track delivery model. Always verify that the delivered file structure includes a separate alternate audio track, not a modified primary track.
Failure mode 3: Visual content description is inaccurate
An AD error in describing on-screen text (misreading a number, inverting a comparison, omitting a label) creates the same harm as a caption error for a deaf or hard-of-hearing learner: the blind learner receives incorrect information from the accessible format. AD quality review must include verification that all on-screen text described in the AD script matches the actual on-screen text in the video, all described data values are accurate, all described steps match the demonstrated sequence, and all described interface elements have their correct labels. This requires both text-accuracy checking (does the AD correctly transcribe on-screen text?) and procedural-accuracy checking (does the AD correctly describe the sequence and positions demonstrated?).
Failure mode 4: LMS player does not expose alternate audio track to screen readers
A technically correct alternate-audio-track AD file fails to meet SC 1.2.5 if the LMS player does not expose the audio track selector to assistive technology. Some LMS video players render the audio track selector as a mouse-only UI element — the selector is visible on screen but not accessible via keyboard navigation or exposed to screen reader focus. WCAG SC 4.1.2 (Name, Role, Value) requires that all user interface components, including audio track selectors, expose their state and value to accessibility APIs. Verify AD delivery on the target LMS platform using NVDA and JAWS in a Windows environment, and VoiceOver on macOS/iOS, to confirm that the audio track selector is operable without a mouse before certifying the AD delivery as SC 1.2.5-compliant.
Failure mode 5: No AD description of text-only slides
A common AD production shortcut for slide-based training content: the AD production process is set up to identify visual demonstrations and diagrams but skips purely text-based slides, on the assumption that text slides will be read aloud by the narrator. Many narrators read slide titles and main points but summarise rather than read bullet points verbatim — leaving the specific bullet text, supporting data, and annotations on the slide without description. An AD review should verify every slide transition to confirm that all on-screen text not read aloud by the narrator is described in the AD track, including sub-bullets, footnotes, source citations, and call-out boxes.
Failure mode 6: Separate AD-video approach creates accessibility gap in course navigation
For LMS platforms that require a separate AD-enabled video version (rather than an alternate audio track), the most common failure mode is that the AD version is technically available but practically inaccessible: it is buried in a course description link, accessible only by contacting an accessibility coordinator, or available only after submitting an accommodation request that takes three business days to process. The access path to the AD version must be immediately accessible from the course landing page — ideally, the AD version link or toggle is visible from the same point in the course where the standard video is presented, not hidden behind an accommodation request process.
Failure mode 7: AD terminology does not match course domain vocabulary
An AD script writer who is unfamiliar with the subject matter of the training video will substitute generic visual descriptions for domain-specific visual descriptions: "a person uses a tool on a piece of equipment" instead of "the technician applies the torque wrench to the oil drain plug at the 5 o'clock position." The first description is technically accurate but pedagogically empty — the blind learner cannot use it to understand what is being demonstrated. The second description provides the same instructional information as the visual. The same domain vocabulary requirements that apply to caption accuracy apply to AD script accuracy: the AD scripter must understand the terminology of the content well enough to describe domain-specific visuals accurately.
Failure mode 8: Extended AD implemented as a re-rendered video without consulting the LMS delivery workflow
When a video requires extended AD (additional pause footage inserted to allow longer descriptions), the re-rendered video is a different file from the original: longer, with freeze frames at AD insertion points. This re-rendered file must replace the original video in the LMS, or be provided as the AD version alongside the original. If the extended-AD video is simply delivered without updating the LMS hosting, learners continue to access the original video without AD. Conversely, if the extended-AD video replaces the original for all learners, sighted learners experience the freeze-frame pauses as a playback glitch. Extended AD video management requires a clear LMS hosting plan: either a separate AD version linked from the course, or a player that implements the pause/resume mechanism itself, so that the extended pauses occur only for learners who have AD enabled.
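To plan hosting for an extended-AD file, it helps to know where the freeze frames fall and how much longer the re-rendered video will be. The sketch below assumes each description has a start time, a natural gap in the original audio, and a spoken duration, and that the freeze frame needed is simply the spoken duration minus the gap. The `Description` type and `freeze_plan` function are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Description:
    start: float     # seconds into the original video
    gap: float       # seconds of natural pause available at that point
    duration: float  # seconds needed to speak the description


def freeze_plan(descriptions):
    """Return (freezes, total_added).

    freezes is a list of (start, extra_seconds) for every description
    that does not fit in its natural gap; total_added is the number of
    seconds the re-rendered video gains over the original.
    """
    freezes = []
    for d in descriptions:
        extra = max(0.0, d.duration - d.gap)
        if extra > 0:
            freezes.append((d.start, extra))
    total_added = sum(extra for _, extra in freezes)
    return freezes, total_added


cues = [
    Description(start=10.0, gap=2.0, duration=5.0),  # needs 3.0 s freeze
    Description(start=40.0, gap=4.0, duration=3.0),  # fits in the gap
    Description(start=75.0, gap=0.0, duration=6.5),  # needs 6.5 s freeze
]
freezes, added = freeze_plan(cues)
print(freezes, added)
# [(10.0, 3.0), (75.0, 6.5)] 9.5
```

An empty `freezes` list suggests that standard AD fits the existing gaps and no re-rendered file is needed, which avoids the hosting question altogether.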
Seven-question FAQ
Does WCAG 2.1 Level AA require audio description for all training video or only some?
WCAG SC 1.2.5 applies to all pre-recorded synchronized media, meaning video with audio. There is no exception for short videos, for videos without visual instructional content, or for videos in a specific format (MP4, YouTube-hosted, SCORM-embedded). The exception that WCAG itself recognises is narrow: "if all of the information in the video track is already provided in the audio track, no audio description is necessary." This exception applies to video where the audio narration explicitly conveys all visual information with no residual visual-only content. Most training video does not meet this test. The operational guidance: if the video contains any on-screen text not read aloud, any diagram or comparison table, any physical demonstration not fully narrated, or any interface navigation where the cursor position matters, the exception does not apply and SC 1.2.5 requires AD.
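The operational guidance above reduces to a four-question screen that a reviewer can record per video. The sketch below is one way to encode it. The `VisualReview` type and its field names are hypothetical, and the answers still come from a person watching the video, not from the code.

```python
from dataclasses import dataclass


@dataclass
class VisualReview:
    """A reviewer's answers for one video (hypothetical field names)."""
    unread_on_screen_text: bool = False
    diagram_or_table: bool = False
    unnarrated_demonstration: bool = False
    cursor_dependent_navigation: bool = False


def requires_audio_description(review: VisualReview) -> bool:
    """True when any visual-only content is present, so the
    'all information is already in the audio track' exception
    does not apply."""
    return any((
        review.unread_on_screen_text,
        review.diagram_or_table,
        review.unnarrated_demonstration,
        review.cursor_dependent_navigation,
    ))


print(requires_audio_description(VisualReview()))                       # False
print(requires_audio_description(VisualReview(diagram_or_table=True)))  # True
```

A single `True` answer is enough to put the video in scope, which matches the "any" wording of the guidance.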
We have comprehensive transcripts for all our training videos. Does that satisfy SC 1.2.5?
No. A transcript that also describes the important visual content can satisfy SC 1.2.3 (Level A, Audio Description or Media Alternative). SC 1.2.5 (Level AA, Audio Description Prerecorded) requires a dedicated audio description track; the "media alternative" option from SC 1.2.3 is not available at SC 1.2.5. If your compliance obligation is WCAG Level AA, which it is under ADA, Section 508, and EAA, transcripts alone leave SC 1.2.5 unmet. Having the transcript still helps: an SC 1.2.5 audit finding of "missing audio description, transcript available" is less severe than "missing audio description, no text alternative" because the transcript demonstrates partial accessibility effort. But it is still an open SC 1.2.5 finding that requires an AD track to resolve.
The one context where a comprehensive transcript is sufficient is conformance at WCAG Level A only (not Level AA). SC 1.2.5 is a Level AA criterion, and the original Section 508 standards, which applied until the 2017 refresh took effect in January 2018, did not incorporate it. If an organisation's Section 508 compliance programme pre-dates the refresh and has not been updated to the revised standard, the legacy standard it follows does not include SC 1.2.5. The revised standard does, and most Section 508 compliance programmes have updated to it.
How much does audio description cost, and how do I budget for it?
Standard per-minute rates for audio description in the L&D market (US, 2026):
- Talking-head / low visual density (Tier 3 content): $8–$12 per finished video minute
- Narrated slide presentation / moderate visual density (Tier 2 content): $12–$16 per finished video minute
- Procedure demonstration / high visual density / software interface walkthrough (Tier 1 content): $16–$25 per finished video minute
These rates are several times the equivalent caption rates for the same content. For comparison: captions for the same Tier 1 content run $2–$5 per finished video minute, so the $16–$25 Tier 1 AD rate is roughly 3–5× the top of the caption range, and a larger multiple of the bottom of it. The higher AD rate reflects the AD script writing step, which has no equivalent in caption production (caption transcription is automated; AD scripting is fully manual), and the voice recording and audio mixing steps.
Combined caption + AD jobs from the same vendor typically come at a 10–15% discount to the sum of separate jobs. For organisations just beginning a caption programme, ordering captions and AD together for new content is significantly more cost-effective than retrofitting AD to captioned content later.
The caption programme budget planning guide provides the full budgeting framework for multi-year L&D accessibility programme planning. The AD cost model is an extension of the same framework: apply the per-minute AD rates to the Tier 1 and Tier 2 content volume from the tier classification exercise, add the LMS delivery implementation cost for platforms requiring custom integration, and present the three-year total cost of ownership alongside the compliance risk cost of not implementing AD.
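The cost model described above can be sketched as a small calculation. This example uses the Tier 1 and Tier 2 per-minute ranges quoted earlier in this answer. The split into backlog minutes and new minutes per year, the function name, and the sample volumes are assumptions made for illustration, not figures from a real programme.

```python
# USD per finished video minute (low, high), from the rate list above.
AD_RATES = {1: (16, 25), 2: (12, 16)}


def three_year_ad_cost(backlog_minutes, new_minutes_per_year,
                       lms_integration_cost=0, years=3):
    """Return (low, high) total cost of ownership in USD.

    backlog_minutes and new_minutes_per_year map tier number to
    minutes of video. The LMS integration cost is counted once.
    """
    low = high = lms_integration_cost
    for tier, (rate_low, rate_high) in AD_RATES.items():
        minutes = (backlog_minutes.get(tier, 0)
                   + years * new_minutes_per_year.get(tier, 0))
        low += minutes * rate_low
        high += minutes * rate_high
    return low, high


low, high = three_year_ad_cost(
    backlog_minutes={1: 600, 2: 1500},       # hypothetical library
    new_minutes_per_year={1: 100, 2: 200},   # hypothetical production rate
    lms_integration_cost=5000,               # hypothetical one-off cost
)
print(low, high)
# 44600 61100
```

Reporting a low and a high figure rather than a single number keeps the quoted rate ranges visible in the budget conversation. Any combined caption and AD discount would be applied on top of these totals.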
Can we use AI-generated audio description for our training content?
As of mid-2026, AI-generated audio description is not production-grade for compliance training content. Vision-language models can generate scene descriptions from video frames with adequate accuracy for general consumer content, but they fail systematically on exactly the Tier 1 content types that most urgently need description: technical procedures (incorrect identification of technique, position, and tool application), clinical demonstrations (mislabelled anatomical references and procedure steps), software interface navigation (missing or incorrect interface element identification), and data visualisations with precision values (chart axis misreads, table value transpositions). AI-generated AD also requires the same gap analysis and timing work as human AD, and editing AI-generated AD that fails on domain-specific content often takes more effort than writing the script from scratch.
The honest comparison: AI caption generation has reached production-grade accuracy for general speech in supported languages and with glossary biasing — which is the technology underlying GlossCap's caption service. AI audio description has not yet reached an equivalent production-grade level for compliance content. This will change; vision-language models are improving rapidly. But as of 2026, the recommendation for compliance L&D content is: human AD production for Tier 1 and Tier 2 content, with AI-assisted AD tools used only for initial draft generation with full human review for Tier 3 (low-visual-density) content where the review overhead is manageable.
Do we need audio description for live webinar and ILT recordings?
WCAG SC 1.2.5 covers pre-recorded synchronized media; it does not apply to live streaming or live broadcasts. Live webinars, instructor-led virtual classroom sessions, and live all-hands broadcasts are not subject to SC 1.2.5 during the live session. WCAG does address live content under other success criteria (SC 1.2.4 for live captions, SC 1.2.9 for live audio-only content), but there is no SC 1.2.5 equivalent for live video.
The critical operational point: live recordings that are subsequently published as on-demand content become pre-recorded synchronized media when they are posted to the LMS or knowledge base. At the point of publication, SC 1.2.5 applies. The ILT and virtual classroom captioning playbook covers the transition from live session to recorded content in the captioning context; the same transition point triggers the SC 1.2.5 AD obligation for the archived recording. Building AD production into the post-session production workflow for recordings that will be published — alongside caption production — is the most efficient approach.
Which WCAG criterion is more commonly audited — SC 1.2.2 (captions) or SC 1.2.5 (audio description)?
SC 1.2.2 (captions) is far more commonly cited in WCAG audit findings and enforcement actions than SC 1.2.5 (audio description) for L&D content. This reflects both the volume of content without captions relative to content without AD (most organisations have no caption programme; almost no organisations have an AD programme for training content) and the higher prevalence of deaf and hard-of-hearing (D/HH) accommodation requests relative to visual disability accommodation requests in workforce statistics. The National Center on Disability and Education estimates that approximately 15% of the population has some degree of hearing loss, while approximately 2–3% has a visual disability affecting screen access. The audit finding frequency mirrors this: caption findings appear in most WCAG video accessibility audits; SC 1.2.5 findings appear in a smaller but growing proportion.
However, "less commonly audited" does not mean "lower risk." An SC 1.2.5 finding is as much a Level AA compliance failure as an SC 1.2.2 finding. The DOJ's enforcement posture on ADA Title II WCAG 2.1 Level AA compliance does not rank Level AA criteria by importance: Level AA is the standard, and any Level AA failure is a compliance finding. As ADA Title II enforcement expands under the April 2026 regulations and as Section 508 audits mature, SC 1.2.5 findings are becoming more common in compliance audits of L&D content libraries.
Our audit found SC 1.2.5 failures. What is the fastest path to remediation?
If an accessibility audit has returned SC 1.2.5 findings for a training library, the remediation prioritisation is:
- Immediately: Acknowledge the finding formally in the organisation's accessibility statement or VPAT. Many auditors and regulators recognise documented acknowledgment plus a remediation timeline as demonstrating good-faith compliance effort. An undocumented failure is more damaging in an enforcement context than a documented failure with a remediation plan.
- 30 days: Tier-classify the flagged content using the Tier 1/2/3 framework above. Identify the Tier 1 content (safety procedures, clinical demonstrations, software interface training) as the highest-priority remediation targets. Submit Tier 1 content for AD production immediately, requesting expedited turnaround from a vendor that offers AD services.
- 60–90 days: Work through Tier 2 content (product training, onboarding, lecture capture) in volume batches. Establish the LMS delivery pathway for each platform type in the organisation's stack.
- Ongoing: Build AD production into the content production workflow for all new content so that the SC 1.2.5 backlog does not grow while the remediation project is in progress. New content published without AD adds to the audit finding count; blocking new content publication until AD is confirmed prevents the backlog from growing during remediation.
The caption compliance programme build guide and the large-scale backlog remediation playbook both cover the compliance programme build methodology and backlog prioritisation process that apply equally to the AD remediation context. The governance documentation required for an AD programme — policy update, vendor agreement, LMS delivery SOP, quality checklist — is the same structure as for the caption programme, with AD-specific criteria.
Start with captions that know your glossary — then add audio description
GlossCap produces WCAG-compliant captions for L&D training video with your company glossary applied at the ASR decoding stage, so product names, SDK symbols, and domain terms come out correctly before the first human review pass. When you are ready to add SC 1.2.5 audio description to your caption programme, the same content inventory and vendor workflow apply. Build the caption foundation first; the AD layer follows the same prioritisation framework.