Improving Accuracy of A.I. Transcripts with Custom Vocabulary

The Glossary feature in Transcriptive is one way of increasing the accuracy of the transcripts generated by artificial intelligence services. The A.I. services can struggle with names of people or companies and it’s a big of mixed bag with technical terms or industry jargon. If you have a video with names/words you think the A.I. will have a tough time with, you can enter them into the Glossary field to help the A.I. along.

For example, I grabbed this video of MLB’s top 30 draft picks in 2018:

Obviously a lot names that need to be accurate and since we know what they are, we can enter them into the Glossary.

Transcriptive's Glossary to add custom vocabulary

As the A.I. creates the transcript, words that sound similar to the names will usually be replaced with the Glossary terms. As always, the A.I. analyzes the sentence structure and makes a call on whether the word it initially came up with fits better in the sentence. So if the Glossary term is ‘Bohm’ and the sentence is ‘I was using a boom microphone’, it probably won’t replace the word. However if the sentence is ‘The pick is Alex boom’, it will replace it. As the word ‘boom’ makes no sense in that sentence.

Here are the resulting transcripts as text files: Using the Glossary and Normal without Glossary

Here’s a short sample to give you an idea of the difference. Again, all we did was add in the last names to the Glossary (Mize, Bart, Bohm):

With the Glossary:

The Detroit Tigers select Casey Mize, a right handed pitcher. From Auburn University in Auburn, Alabama. With the second selection of the 2018 MLB draft, the San Francisco Giants select Joey Bart a catcher. A catcher from Georgia Tech in Atlanta, Georgia, with the third selection of a 2018 MLB draft. The Philadelphia Phillies select Alec Bohm, third baseman

Without the Glossary:

The Detroit Tigers select Casey Mys, a right handed pitcher. From Auburn University in Auburn, Alabama. With the second selection of the 2018 MLB draft, the San Francisco Giants select Joey Bahrke, a catcher. A catcher from Georgia Tech in Atlanta, Georgia, with the third selection of a 2018 MLB draft. The Philadelphia Phillies select Alec Bomb. A third baseman

As you can see it corrected the names it should have. If you have names or words that are repeated often in your video, the Glossary can really save you a lot of time fixing the transcript after you get it back. It can really improve the accuracy, so I recommend testing it out for yourself!

It’s also worth trying both Speechmatics and Transcriptive-A.I. Both are improved by the glossary, however Speechmatics seems to be a bit better with glossary words. Since Transcriptive-A.I. has a bit better accuracy normally, you’ll have to run a test or two to see which will work best for your video footage.

If you have any questions, feel free to hit us up at cs@nulldigitalanarchy.com!

Leave a Reply

Your email address will not be published. Required fields are marked *