79% of executives with video content archives agree that a "frustration of using on-demand video is not being able to quickly find the piece of information I am looking for when I need it." This data comes from a joint IBM and Wainhouse Research report, which interviewed 1,801 executives. The frustration is a growing challenge, but one that can be addressed while simultaneously adding valuable captions to video content. This is achieved through automatic closed captioning, with support to both edit and search the results.

Called Watson Captioning, and available as part of IBM Video Streaming and as a standalone service, the process works by utilizing IBM Watson to generate closed captions, converting video speech to text. Once generated, content owners can modify these captions for accuracy through a dashboard editor, simplified by confidence levels that underline words under a certain threshold. Afterwards, these corrected captions are not only accessible in the player but can be searched against as well, empowering viewers to jump to the parts of the video where their topic of interest is mentioned. This article briefly discusses the process of generating these captions before detailing the user experience of searching them and how content owners can edit them.
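As a rough illustration of that confidence-driven review flow, the sketch below flags words that fall under an assumed cutoff. The transcript data and the 0.6 threshold are invented for the example; they are not Watson Captioning internals.

```python
# Sketch: flag low-confidence words for review, mirroring how the dashboard
# editor underlines words under a certain threshold. Data and cutoff are
# illustrative assumptions.

REVIEW_THRESHOLD = 0.6  # assumed cutoff; the product defines its own

# Each entry: (word, confidence) as a speech-to-text engine might report it.
transcript = [
    ("we", 0.98), ("found", 0.95), ("a", 0.99),
    ("really", 0.97), ("good", 0.93), ("sale", 0.41),
]

def words_needing_review(words, threshold=REVIEW_THRESHOLD):
    """Return the words whose confidence falls under the threshold."""
    return [(word, conf) for word, conf in words if conf < threshold]

for word, conf in words_needing_review(transcript):
    print(f"review: {word!r} (confidence {conf:.2f})")
# -> review: 'sale' (confidence 0.41)
```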
Generating captions with machine learning is a quick process, taking roughly the length of the asset to produce. Initiating the process simply requires selecting a language for either the video or the channel, as noted in this Watson Generated Captions guide. Supported languages include Arabic, Chinese, English (UK or US), French, Japanese, Portuguese and Spanish.
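Watson Captioning manages the transcription step itself, but the public IBM Watson Speech to Text API exposes the same underlying operation, which helps make the process concrete. A minimal sketch, assuming placeholder credentials and an extracted audio file:

```python
# Sketch of the underlying speech-to-text step using the public IBM Watson
# Speech to Text API. Watson Captioning handles this step for you; this only
# shows how a language model is selected and how timed words come back.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator("YOUR_APIKEY"))
stt.set_service_url("YOUR_SERVICE_URL")

with open("episode-audio.mp3", "rb") as audio:  # placeholder file name
    response = stt.recognize(
        audio=audio,
        content_type="audio/mp3",
        model="en-US_BroadbandModel",  # pick the model matching the video's language
        timestamps=True,               # word-level timings, the basis for caption cues
        word_confidence=True,          # per-word confidence, used later when editing
    ).get_result()

for segment in response["results"]:
    print(segment["alternatives"][0]["transcript"])
```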
The accuracy of the captions depends on a variety of variables. Optimal conditions involve content with one speaker who is talking at a normal pace without background noise. Multiple speakers, overt accents, slurred pronunciation and ample background noise, be it a soundtrack or just general noise, can all impact the accuracy of the captions generated. Low-quality audio can also have a negative impact, although this should only affect content that was heavily compressed.

In addition, Watson can only transcribe words that it's familiar with. It will determine the most appropriate result for words and phrases, but can misinterpret both names and industry-specific terms. To improve accuracy, Watson uses the entire video when generating captions. For example, Watson might transcribe someone as saying they "found a really good sale", but if the context later turns to boats the phrase could be corrected to state they "found a really good sail". This aspect, and overall accuracy, can be improved by training Watson for specific content as well. Contact IBM sales to learn more about this optional service.
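For a sense of what such training involves, the public Speech to Text API offers language-model customization along these lines. The model name and vocabulary below are purely illustrative, and Watson Captioning training itself is arranged through IBM sales rather than this API.

```python
# Sketch: teaching the recognizer industry-specific terms with a custom
# language model via the public IBM Watson Speech to Text API.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator("YOUR_APIKEY"))
stt.set_service_url("YOUR_SERVICE_URL")

# Create a customization on top of the base model used for recognition.
model = stt.create_language_model("sailing-terms", "en-US_BroadbandModel").get_result()
custom_id = model["customization_id"]

# Add a domain term the base model tends to mishear, then train.
stt.add_word(custom_id, "mainsail", sounds_like=["main sail"], display_as="mainsail")
stt.train_language_model(custom_id)

# Later, pass language_customization_id=custom_id to recognize() so the
# trained vocabulary is used when transcribing.
```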
Once the captions are generated, users can start to search them. For internal content, this includes an enterprise video search, part of the portal experience, that looks across transcripts to turn up relevant videos. However, finding a video can sometimes be only part of the equation. For example, an hour-long video coming up in search results can be daunting for the viewer if they don't know when the relevant information is introduced.

As a result, IBM Watson Media has expanded its capabilities around searching captions by introducing it inside the player. This new search feature allows viewers to type in terms and locate specific instances where a word was mentioned. Included with each result is a timestamp and, more conveniently, the ability to click a result in the player to jump to that part of the video. Consequently, viewers can now quickly skim through long assets, assured that the information they seek is contained inside, and jump to the most relevant moments.

Below is a live example of this search feature, using a case study video of USA Network's Mr. Robot. For example, if a viewer searches for "fan" it will produce a list of results that includes both the singular, "fan", and plural, "fans", versions of the term.
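To make that matching behavior concrete, here is a toy sketch, assuming captions in a simple WebVTT-style format: it finds a term and its plural in caption cues and reports the start times a player could seek to. It is not IBM's player implementation.

```python
# Toy sketch of the search idea: scan caption cues for a term, matching the
# simple plural form too ("fan" also hits "fans"), and report start times a
# player could seek to. The WebVTT snippet is invented for illustration.
import re

SAMPLE_VTT = """\
00:00:05.000 --> 00:00:08.000
Every fan remembers the premiere.

00:12:41.000 --> 00:12:44.500
The fans lined up around the block.
"""

CUE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> [\d:.]+\n(.+)")

def search_captions(vtt_text, term):
    """Yield (start time, caption text) for cues containing the term or its plural."""
    pattern = re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
    for start, text in CUE.findall(vtt_text):
        if pattern.search(text):
            yield start, text.strip()

for start, text in search_captions(SAMPLE_VTT, "fan"):
    print(start, "-", text)
# 00:00:05.000 - Every fan remembers the premiere.
# 00:12:41.000 - The fans lined up around the block.
```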