Captioning & accessible video on the Web

Last updated: 7 Mar. 2011


In his seminal “Declaration of Independence for Cyberspace,” Electronic Frontier Foundation cofounder John Perry Barlow called the Internet “a world that all may enter without privilege or prejudice accorded by race, economic power, military force, or station of birth.” While some have dismissed Barlow’s declaration as overly optimistic, it’s clear that the Internet is making a profound impact on traditional social hierarchies, as is demonstrated by the pivotal role that online social networks played in the recent protests and (largely peaceful) revolution in Egypt, among other examples. Yet even as the Internet has granted people of all classes and nationalities an unprecedented level of access to information, there remains a very real risk that some minority groups will still be left behind.

Media analysts speak of a “digital divide” between people living in poorer rural regions and their fellow citizens who reside in more affluent, more densely populated areas; however, there is another sort of digital divide that scholars often overlook: the increasingly apparent gap between the disabled and the non-disabled. According to a study published last month by the Pew Research Center, the Internet continues to prove challenging for many disabled people: two percent of Americans report that their Internet use has been hindered or halted entirely by a disability. Moreover, the Pew report documents a clear negative correlation between disability and likelihood of having Internet access.

It is apparent that the Internet is not living up to its full potential for the disabled community. We shall discuss one part of the problem, namely, the dismal state of online video captioning and its negative effects on the disabled, particularly the deaf and hard of hearing.

Definitions & problem sketch

It seems worthwhile to review the fundamental terminology of captioning and related accessibility aids. By “captioning,” we mean any real-time, on-screen text that accompanies audiovisual media and that attempts to convey through the written word any content that would otherwise be unavailable to a hearing-impaired audience. Captions may be either “open,” i.e., permanently visible to all viewers of a video, or “closed,” i.e., able to be displayed or hidden according to the end user’s preference. The accessibility community (and, to a lesser degree, the content industry) further distinguishes captions from “subtitles,” which are designed to convey foreign language content to all viewers, regardless of their hearing ability. Subtitles are generally open, as their target audience is often very broad, while captions, traditionally expected to serve a much smaller audience, are almost invariably closed. In the discussion that follows, we shall focus solely on captions; however, the reader should understand that many of the technological issues we review are equally applicable to both captions and subtitles.

Captioning allows the deaf and hard of hearing to participate, both passively as traditional media consumers and actively as contributors to a rich new convergent media culture, by letting them access, view, and respond to the full range of educational, entertainment, and journalistic content available to those with unimpaired hearing ability. Clearly this provides great benefits to society; as legal scholar Cass Sunstein argues, “for citizens of a heterogeneous democracy, a fragmented communications market creates considerable dangers,” and making online video accessible to the deaf serves to reduce this fragmentation.

One could reasonably conclude that accessible, captioned video would be widespread on the World Wide Web due to the parallel desires of for-profit companies to maximize profits, not-for-profit organizations to maximize outreach, and advocacy groups to maximize social equality. The preponderance of evidence, however, suggests otherwise. As San Jose Mercury News columnist Troy Wolverton reports in his February 2011 article “Those with disabilities are underserved by technology”:

In the offline world of television and DVD movies, much video comes with captions that make it accessible to those with hearing loss. In Web video, though, very little is captioned.

Apple’s iTunes, for instance, sells television shows, but it doesn’t offer any with captions. It offers movies with captions, but they represent a fraction of the total movies it sells. In some cases, movies that do have captions in their DVD versions don’t have them on iTunes.

Wolverton goes on to explain that Apple is far from the only guilty party in this instance. He notes that other popular online video services, including YouTube, Hulu, and Netflix, have been similarly derelict in making their content easily accessible to the deaf and hard of hearing population. Furthermore, blame should not be placed entirely on Web 2.0 services such as these. Consider the case of the 2010 Winter Olympics hosted in Vancouver, British Columbia. Journalist and accessibility researcher Joe Clark—known for his role as what Atlantic Monthly author Michael Erard called the “King of Closed Captions,” “a self-appointed watchdog of a growing industry”—performed a detailed analysis of the two primary Vancouver Olympics Web sites, and he noticed a frightening array of accessibility problems. Clark found that among the many prerecorded and live video streams these sites offered, not a single video featured any sort of captioning. In particular, online feeds from Canadian broadcaster CTV were provided devoid of captions, even though these same feeds had already been captioned for use on network television.

We will now endeavor to explain how and why online video has reached this state. First we shall briefly explore some of the technology used to store, transmit, and caption video files on the Web, and then we shall examine some social and legal issues pertaining to the situation.

Technological challenges

Internet scholars have recognized the need for accessible, captioned video almost as long as online video has enjoyed any degree of widespread support. As early as 1996, researchers at the National Center for Accessible Media, a non-profit organization sponsored by Boston-based public broadcasting station WGBH, sought to develop solutions for rendering online video accessible to the hearing-impaired. In September of that year, Internet and new media journal First Monday published “Captioning video clips on the World Wide Web,” an article by NCAM member Geoff Freed detailing the prototype of a technique for adding captions to Internet-hosted QuickTime video files. Using QuickTime’s ability to embed a text track alongside the usual audio and video tracks that make up a QuickTime file, Freed was able to produce open-captioned clips which could be converted to any one of several formats for cross-platform display.

Freed’s technique was rather primitive, and the inability to use traditional broadcast “timecodes” to synchronize captions to audio frames made it a highly laborious process. New caption creators could take over one hour to produce a one-minute clip, though Freed claimed that “with practice this effort [could] be reduced in half to 30 minutes or less.” Still, he was confident that it would “probably not take Web captioning very long to become integrated into the production process” of online video content. Some content industry insiders concurred. Jessica Sandin wrote in the Broadcasting & Cable article “Captioning on the Web” that “for millions of Americans with sensory disabilities, the new media’s strong focus on sight and sound could mean they would be left behind on the information superhighway.” Sandin went on to praise NCAM’s work, noting that “the obvious solution” to the problem multimedia content presents to deaf and hard of hearing audiences is “something ‘like captions on TV’ for the expanding audio part of the Web.”
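
To make the role of timecodes concrete, consider the following minimal sketch of how timecoded captions can be synchronized to playback. It is our own illustration and not a reconstruction of Freed’s QuickTime-based technique. We assume a broadcast-style timecode of the form HH:MM:SS:FF at a fixed 30 frames per second with no drop-frame correction; the names Cue, timecode_to_seconds, and active_cue, as well as the sample caption text, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """One caption, visible from `start` up to (not including) `end`, in seconds."""
    start: float
    end: float
    text: str


def timecode_to_seconds(timecode: str, fps: int = 30) -> float:
    """Convert an 'HH:MM:SS:FF' timecode to seconds.

    Assumption: a fixed frame rate with no drop-frame correction.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame number {frames} is out of range for {fps} fps")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def active_cue(cues: List[Cue], t: float) -> Optional[Cue]:
    """Return the caption that should be on screen at playback time t, if any."""
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue
    return None


# Invented sample captions, keyed to timecodes rather than timed by hand.
cues = [
    Cue(timecode_to_seconds("00:00:01:00"), timecode_to_seconds("00:00:03:15"), "Hello."),
    Cue(timecode_to_seconds("00:00:04:00"), timecode_to_seconds("00:00:06:00"), "[applause]"),
]
```

When each caption carries start and end timecodes of this kind, a player only needs to compare the current playback time against them, which suggests why the absence of timecode support left so much of the synchronization work to be done by hand.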

Yet for all the hopes of Freed, Sandin, and those like them, captioning on the Web is nearly as rare today as it was in 1996. The failure of captions to attain critical mass on the Web, it seems, stems in part from the lack of a common standard for captioning digital video. As Mark Pilgrim, author of several software development books, including Dive Into Accessibility and Dive Into HTML5, explains:

Even in broadcast television, captioning technology was fractured by different broadcast technologies in different countries. Digital video had the capability of unifying the technologies and learning from their mistakes. Of course, exactly the opposite happened. Early caption formats split along company lines; each major video software platform (RealPlayer, QuickTime, Windows Media, Adobe Flash) implemented captioning in their own way, with levels of adoption ranging from nil to zilch.

Accessibility consultant Mike Paciello concurs in his book Web Accessibility for People with Disabilities, noting that “The Web followed a very typical development process based on standard engineering processes that, all too often, do not include considerations for people with disabilities” (21).

Legislative issues

Some accessibility codes for the Web already require captioning. For example, to comply with Section 508 of the US Rehabilitation Act, a Web site must provide time-synchronized text equivalents, i.e., captions, for its video content. In practice, however, these codes are little more than voluntary guidelines for the vast majority of Web sites. As Cynthia D. Waddell explains in “U.S. Web Accessibility in Depth,” a chapter in the compilation book Web Accessibility: Web Standards and Regulatory Compliance by Jim Thatcher, et al., Section 508 compliance is only legally mandated for Web sites operated by the federal government or by certain state governments that have adopted it as a part of their accessibility policy (538–9).

The inability of Internet engineers to settle on a single satisfactory standard for captioning and the ineffectiveness of current Web captioning–related laws have led some accessibility advocates to call for more potent government intervention. Joe Clark, for example, writes in “This is How the Web Gets Regulated,” a 2008 article in Web development trade journal A List Apart, that “online captioning still pretty much does not exist. That’s probably going to change, and the way it’s going to change is by government regulation.” Clark feels that Web developers should embrace this destiny, taking an active role in the development of online captioning legislation to ensure that it is based on open standards.

Clark’s thoughts echo the views of media scholars regarding captioning in traditional broadcast television. As communications professor Jennifer L. Gregg notes in her “Contextual analysis of the passage of closed-captioning policy,” broadcast captioning initially faced a “market stalemate”: The very small audience for captioned programming left little economic incentive for content producers to caption their shows, and the very small number of captioned shows left little room for the audience for captioned programming to expand (547–8). Only when the federal government introduced captioning requirements and incentives in the Television Decoder Circuitry Act and the Telecommunications Act of 1996 did TV captioning become widespread (542–3).

Progress & conclusion

Some two years after Clark’s plea for online captioning legislation was published, the first such law was enacted. In October 2010, President Obama signed the 21st Century Communications & Video Accessibility Act into law. The act will require that video content that was previously broadcast on TV with captions retain these captions when posted online. This alone represents a great improvement on the status quo, but many disability advocates feel that this law does not go far enough. As Suzanne Robitaille explains in the Bloomberg Businessweek article “‘The Annoying Orange’ Needs More Captions,” audiovisual content is increasingly being published through online-only “Webisodes” which, having never aired on broadcast TV, are totally exempt from the new captioning requirements.

It remains to be seen how effective the Video Accessibility Act and future legislation will be. While initial reports appear promising, caution is certainly warranted. Media scholars have long acknowledged that, as professor Kathryn C. Montgomery notes in Generation Digital, government attempts to regulate communications for the public good often lead to compromises which ultimately accomplish little and satisfy no one. Moreover, while Gregg and many others have performed in-depth research into the social workings of captioning policy in traditional media, very little research has been published concerning the application of similar legislation to online video. It seems that greater attention from media analysts may be needed to understand and resolve the dilemma that video on the Web currently presents to the deaf and hard of hearing population.

