
Testing A.I. Transcript Accuracy (most recent test)

Periodically we test various A.I. services to see whether we should be using one of them on the backend of Transcriptive-A.I. We’re more interested in having the most accurate A.I. than in sticking with a particular service (or trying to develop our own). The services have different costs, which is why Transcriptive Premium costs a bit more; that gives us more flexibility in deciding which service to use.

This latest test will give you a good sense of how the different services compare, particularly in relation to Adobe’s transcription AI that’s built into Premiere.

The Tests

Short Analysis (i.e. TL;DR):

For well-recorded audio, all the A.I. services are excellent. There isn’t much difference between the best and worst A.I.: maybe one or two words per hundred. There is a BIG drop-off as audio quality gets worse, and you can really see this with Adobe’s service and the regular Transcriptive-A.I. service.

A 2% difference in accuracy is not a big deal. Once the difference gets up around 6-7% or higher, the additional time it takes to fix errors in the transcript becomes really significant. Every additional 1% of accuracy means about 3.5 minutes less clean-up time (for a 30-minute clip). So small improvements in accuracy can make a big difference if you (or your Assistant Editor) need to clean up a long transcript.

So when you see an 8% difference between Adobe and Transcriptive Premium, realize it’s going to take you about 25-30 minutes longer to clean up a 30 minute Adobe transcript.
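The clean-up arithmetic above can be sketched in a few lines of Python. This is a minimal sketch that assumes the 3.5-minutes-per-point rule of thumb scales linearly with clip length; that linear scaling is our simplification, and the function name is hypothetical.

```python
# Rule of thumb from this post: 1 percentage point of accuracy ~= 3.5 minutes
# of clean-up for a 30-minute clip. Scaling it linearly to other clip lengths
# is an assumption for this sketch.
MINUTES_PER_POINT_PER_30_MIN = 3.5


def extra_cleanup_minutes(accuracy_a: float, accuracy_b: float,
                          clip_minutes: float = 30.0) -> float:
    """Estimated extra clean-up minutes for the less accurate transcript.

    Accuracies are percentages, e.g. 92.8 for 92.8%.
    """
    gap_points = abs(accuracy_a - accuracy_b)
    return gap_points * MINUTES_PER_POINT_PER_30_MIN * (clip_minutes / 30.0)


# An 8-point gap on a 30-minute clip, as in the Adobe vs. Premium example
print(extra_cleanup_minutes(88.0, 96.0))                    # 28.0
# The same gap on a 2-hour interview
print(extra_cleanup_minutes(88.0, 96.0, clip_minutes=120))  # 112.0
```

The 28 minutes lands right in the 25-30 minute range quoted above.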

Takeaway: For high-quality audio, you can use any of the services… Adobe’s free service or the $0.04/min TS-AI service. For audio of medium to poor quality, you’ll save yourself a lot of time by using Transcriptive Premium. (Getting Adobe transcripts into Transcriptive requires jumping through a couple of hoops; Adobe didn’t make it as easy as they could have, but it’s not hard. Here’s how to import Adobe transcripts into Transcriptive.)

(For more info on how we test, see this blog post on testing AI accuracy)
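This post doesn’t spell out the exact formula behind the accuracy percentages. A common way to score A.I. transcripts is word accuracy, 1 minus the word error rate (WER), where WER counts the substitutions, deletions, and insertions needed to turn the A.I. transcript into a human-checked reference, divided by the number of reference words. The sketch below assumes that metric; it is an illustration, not necessarily how our numbers were produced. It only lower-cases and splits on whitespace, so punctuation differences still count as errors.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER, where
    WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j words of the A.I. transcript
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from A.I. output
                curr[j - 1] + 1,     # insertion: extra word in A.I. output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    errors = prev[-1]
    return 1.0 - errors / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
ai_output = "the quick brown fox jumped over a lazy dog"
print(f"{word_accuracy(reference, ai_output):.1%}")  # 77.8%
```

Two substituted words out of nine gives 7/9. Note that heavy insertions can push WER above 100%, which makes this “accuracy” negative.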

Long Analysis

When we do these tests, we look at two graphs: 

  1. How each A.I. performed for specific clips
  2. The accuracy curve for each A.I. which shows how it did from its Best result to Worst result.

The important thing to realize when looking at the Accuracy Curves (#2 above) is that the corresponding points on each curve are usually different clips. The best clip for one A.I. may not have been the best clip for a different A.I. I find this Overall Accuracy Curve (OAC) to be more informative than the ‘clip-by-clip’ graph. A given A.I. may do particularly well or poorly on a single clip, but the OAC smooths the variation out and you get a better representation of overall performance.
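To make the curve construction concrete, here is a small Python sketch that builds an OAC by sorting each service’s per-clip scores from best to worst. The numbers are the six-clip results from this post; the dictionary layout and function name are just illustrative.

```python
# Per-clip accuracy (%) from the six-clip test in this post
RESULTS = {
    "TS A.I.":      {"Interview": 97.2, "Art": 97.6, "NYU": 91.1, "LSD": 92.3, "Jung": 89.1, "Zoom": 85.5},
    "Adobe":        {"Interview": 97.2, "Art": 97.2, "NYU": 88.6, "LSD": 96.9, "Jung": 93.9, "Zoom": 80.7},
    "Speechmatics": {"Interview": 97.8, "Art": 99.5, "NYU": 95.1, "LSD": 98.0, "Jung": 96.1, "Zoom": 89.8},
    "TS Premium":   {"Interview": 100.0, "Art": 97.6, "NYU": 97.6, "LSD": 97.4, "Jung": 96.1, "Zoom": 92.8},
}


def accuracy_curve(per_clip: dict[str, float]) -> list[float]:
    """Sort one service's clip scores from best to worst.

    Position k is that service's k-th best clip, so the same position on
    two curves is usually a different clip.
    """
    return sorted(per_clip.values(), reverse=True)


curves = {service: accuracy_curve(scores) for service, scores in RESULTS.items()}
for service, curve in curves.items():
    print(f"{service:<13}", "  ".join(f"{a:5.1f}" for a in curve))
```

For example, Adobe’s 4th point is its Jung score, while TS Premium’s 4th point is its LSD score.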

Take a look at the charts for this test (the audio files used are available at the bottom of this post):

Overall accuracy curve for AI Services

All of the A.I. services fall off a cliff, accuracy-wise, as the audio quality degrades. Any clip where the A.I. scores below about 90% is probably better transcribed by a human, and certainly anything below 80%. At 80%, it will very likely take more time to clean up the transcript than to just do it manually from scratch.

The two things I look for in each curve are where it breaks below 95% and where it breaks below 90%, and, of course, how that compares to the other curves. The longer a curve stays above those percentages, the more audio degradation that A.I. can deal with.
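Finding where a curve breaks below 95% or 90% is a simple scan once the scores are sorted best to worst. A minimal sketch, using the Adobe and TS Premium curves from this post’s six-clip results (the helper name is hypothetical):

```python
from typing import Optional


def first_break(curve: list[float], threshold: float) -> Optional[int]:
    """Return the 1-based position where a best-to-worst curve first drops
    below `threshold`, or None if it never does."""
    for position, accuracy in enumerate(curve, start=1):
        if accuracy < threshold:
            return position
    return None


# Best-to-worst curves from the six-clip results in this post
adobe = [97.2, 97.2, 96.9, 93.9, 88.6, 80.7]
ts_premium = [100.0, 97.6, 97.6, 97.4, 96.1, 92.8]

for name, curve in [("Adobe", adobe), ("TS Premium", ts_premium)]:
    print(name,
          "| below 95% at clip", first_break(curve, 95.0) or "never",
          "| below 90% at clip", first_break(curve, 90.0) or "never")
```

Here Adobe drops below 95% at its 4th-best clip and below 90% at its 5th, while TS Premium stays above 95% until its last clip and never drops below 90%.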

You’re probably thinking, well, that’s just six clips! True, but if you choose six clips with a good range of quality, from great to poor, then the curve will be roughly the same even if you had more clips. Here’s the full test with about 30 clips:

Accuracy of Adobe vs. Transcriptive, full test results

While the curves look a little different (the regular TS A.I. looks better in this graph), they mostly follow the pattern of the six-clip OAC. And the ‘cliffs’ become more apparent: points where a given level of audio degradation causes an A.I.’s accuracy to drop to a lower tier. Most of the A.I.s stay at a certain accuracy for a while, then drop down, hold there for a bit, drop down again, and so on, until the audio degrades so much that the A.I. basically fails.

Here are the actual test results:

Clip        TS A.I.   Adobe    Speechmatics   TS Premium
Interview   97.2%     97.2%    97.8%          100.0%
Art         97.6%     97.2%    99.5%          97.6%
NYU         91.1%     88.6%    95.1%          97.6%
LSD         92.3%     96.9%    98.0%          97.4%
Jung        89.1%     93.9%    96.1%          96.1%
Zoom        85.5%     80.7%    89.8%          92.8%
Remember: every additional 1% of accuracy means about 3.5 minutes less clean-up time (for a 30-minute clip).

So that’s the basics of testing different A.I.s! Here are the clips we used for the smaller test, to give you an idea of what’s meant by ‘High Quality’ or ‘Poor Quality’. The more jargon, background noise, accents, soft speaking, etc. there is in a clip, the harder it is for the A.I. to produce good results, and you can hear that below. You’ll notice that all the clips are 1 to 1.5 minutes long. We’ve found that as long as a clip is representative of the full recording it’s taken from, transcribing the full recording doesn’t give you any additional information. An hour-long recording will produce similar results to one minute of it, as long as that minute has the same speakers, jargon, background noise, etc.

Any questions or feedback, please leave a note in the Comments section! (or email us at cs@nulldigitalanarchy.com)

‘Art’ test clip
‘Interview’ test clip
‘Jung’ test clip
‘NYU’ test clip
‘LSD’ test clip
‘Zoom’ test clip

Adobe Transcripts and Captions & Transcriptive: Differences and How to Use Them Together

Adobe just released a big new Premiere update that includes their Speech-to-Text service. We’ve had a lot of questions about whether this kills Transcriptive or not (it doesn’t… check out the new Transcriptive Rough Cutter!). So I thought I’d take a moment to talk about some of the differences, similarities, and how to use them together.

The Adobe system is basically what we did for Transcriptive 1.0 in 2017. Since then, Transcriptive Rough Cutter has evolved into an editing and collaboration tool, not just something you use to get transcripts.

The Adobe solution is really geared towards captions. That’s the problem they were trying to solve, and you can see it in the fact that you can only transcribe sequences, and only one at a time. So if you want captions for your final edit, it’s awesome. If you want to transcribe all your footage so you can search it, pull out selects, etc., it doesn’t do that.

So, in some ways, the Transcriptive suite (Transcriptive Rough Cutter, PowerSearch, TS Web App) is more integrated than Adobe’s own service: it lets you transcribe clips and sequences, and then search, share, or assemble rough cuts with those transcripts. Using text in the editing process can make an editor’s life a lot easier in many ways beyond just creating captions.

Sequences Only

Adobe's Text panel for transcribing sequences

The Adobe transcription service only works for Sequences. It’s really designed for use with the new Caption system they introduced earlier this year.

Transcriptive can transcribe both media and sequences, giving the user a lot more flexibility. One example: they can transcribe media first, use those transcripts to find soundbites or information in the clips, and build a sequence from that. As they edit the sequence, add media, or make other changes, they can regenerate the transcript without any additional cost. The transcripts are attached to the media, so Transcriptive just looks at which portions of the clips are in the sequence and grabs the transcript for those portions.
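The idea of reusing clip transcripts for a sequence can be sketched as a time remapping: for each edit on the timeline, keep the clip words inside the edit’s source In/Out range and shift them to sequence time. This is a hypothetical illustration, not Transcriptive’s actual code; the `Word` and `Edit` types and the `sequence_transcript` function are invented for the example, times are in seconds, and words that straddle an edit point are simply dropped.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Edit:
    clip: str          # which source clip this timeline segment uses
    source_in: float   # seconds into the source clip
    source_out: float
    record_in: float   # where the segment starts on the sequence timeline


def sequence_transcript(edits: list[Edit], clip_words: dict[str, list[Word]]) -> list[Word]:
    """Reuse existing clip transcripts: keep the words that fall inside each
    edit's source range and shift them to sequence time."""
    result = []
    for edit in sorted(edits, key=lambda e: e.record_in):
        offset = edit.record_in - edit.source_in
        for w in clip_words.get(edit.clip, []):
            if edit.source_in <= w.start and w.end <= edit.source_out:
                result.append(Word(w.text, w.start + offset, w.end + offset))
    return result


clips = {"interview": [Word("we", 0.0, 0.5), Word("built", 0.5, 1.0), Word("a", 1.0, 1.5),
                       Word("dungeon", 1.5, 2.5), Word("today", 2.5, 3.0)]}
edits = [Edit("interview", source_in=0.5, source_out=2.5, record_in=10.0)]
print([(w.text, w.start) for w in sequence_transcript(edits, clips)])
# [('built', 10.0), ('a', 10.5), ('dungeon', 11.0)]
```

Nothing is re-transcribed: re-running this after a re-edit only recomputes which existing words land where.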

Automatic Rough Cut

Rough Cut: There are two ways of assembling a ‘rough cut’ with Transcriptive Rough Cutter. The first is what we’re calling Selects, which is basically what I mentioned above in the ‘Sequences Only’ section: you search for a soundbite, set In/Out points in the transcript of the clip containing that soundbite, and insert that portion of the video into a sequence.

Then there’s the Rough Cut feature, where Transcriptive RC takes a transcript that you’ve edited and assembles a sequence automatically: it creates edits where you’ve deleted or struck through text, removing the video that corresponds to those text edits. This is not something Adobe can do, or has given any indication they will do, so far anyway.
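Conceptually, the Rough Cut step turns an edited transcript into a list of source ranges to keep. A minimal, hypothetical sketch (not Transcriptive’s implementation): every word carries its source time in seconds and a `deleted` flag for struck-through text, and each run of kept words becomes one range.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float           # seconds into the source clip
    end: float
    deleted: bool = False  # True if the word was deleted or struck through


def keep_ranges(words: list[Word]) -> list[tuple[float, float]]:
    """Collapse runs of kept words into (start, end) source ranges.

    A deleted word closes the current range, so its video is cut.
    """
    ranges = []
    start = end = None
    for w in words:
        if w.deleted:
            if start is not None:
                ranges.append((start, end))
                start = None
        elif start is None:
            start, end = w.start, w.end  # begin a new kept range
        else:
            end = w.end                  # extend the current range
    if start is not None:
        ranges.append((start, end))
    return ranges


transcript = [
    Word("So", 0.0, 0.5), Word("um", 0.5, 1.0, deleted=True),
    Word("we", 1.0, 1.5), Word("started", 1.5, 2.0),
    Word("uh", 2.0, 2.5, deleted=True),
    Word("in", 2.5, 3.0), Word("2017", 3.0, 4.0),
]
print(keep_ranges(transcript))  # [(0.0, 0.5), (1.0, 2.0), (2.5, 4.0)]
```

The two deleted filler words split the clip into three ranges, so the video under them, (0.5, 1.0) and (2.0, 2.5), is cut when the ranges are laid end to end in a sequence.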

Editing with text in Premiere Pro and Transcriptive Rough Cutter

Collaboration with The Transcriptive Web App

One key difference is the ability to send transcripts to someone who does not have Premiere. They can edit those transcripts in a web browser, add comments, and then send it all back to you. They can even delete portions of the text, and you can use the Rough Cut feature to assemble a sequence based on that.

Searching Your Premiere Project

PowerSearch: This separate panel (but included with TS) lets you search every piece of media in your Premiere project that has a transcript in metadata or in clip/sequence markers. Premiere is pretty lacking in the Search department and PowerSearch gives you a search engine for Premiere. It only works for media/sequences transcribed by Transcriptive. Adobe, in their infinite wisdom, made their transcript format proprietary and we can’t read it. So unless you export it out of Premiere and then import it into Transcriptive, PowerSearch can’t read the text unfortunately.
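A search engine over transcripts is, at its core, an inverted index from words to the places they’re spoken. The sketch below is a hypothetical illustration of that idea, not how PowerSearch is built; it assumes each transcript is a list of (word, start-seconds) pairs keyed by media name.

```python
import re
from collections import defaultdict


def build_index(transcripts: dict[str, list[tuple[str, float]]]) -> dict[str, list[tuple[str, float]]]:
    """Map each lower-cased word to every (media name, start time) where it is spoken."""
    index = defaultdict(list)
    for media, words in transcripts.items():
        for text, start in words:
            key = re.sub(r"[^\w']", "", text.lower())  # drop punctuation
            if key:
                index[key].append((media, start))
    return dict(index)


transcripts = {
    "Interview_01.mov": [("The", 0.0), ("dragon", 0.4), ("art.", 0.9)],
    "Broll_Sequence":   [("Dragon", 12.0), ("sketches", 12.5)],
}
index = build_index(transcripts)
print(index["dragon"])  # [('Interview_01.mov', 0.4), ('Broll_Sequence', 12.0)]
```

One lookup answers “where is this word said anywhere in the project”, which is what scrubbing through footage can’t do.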

Easier to Export Captions

Transcriptive RC lets you output SRT, VTT, SCC, MCC, SMPTE, or STL just by clicking Export. You can then use these files in any other program. With Adobe, you can only export SRT, and even that takes multiple steps. (You can get other file formats when you export the rendered movie, but you have to render the timeline to generate them.)
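For reference, SRT is one of the simplest of these formats: numbered blocks, each with a `start --> end` line using `HH:MM:SS,mmm` timestamps, followed by the caption text and a blank line. A minimal Python sketch, assuming you already have caption lines as (start, end, text) tuples in seconds (the function names are hypothetical):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start, end, text) caption lines."""
    blocks = []
    for number, (start, end, text) in enumerate(captions, start=1):
        blocks.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Welcome to the show."),
              (2.5, 5.25, "Let's get started.")]))
```

This prints two numbered blocks, the first timed `00:00:00,000 --> 00:00:02,500`.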

I assume Adobe is trying to make it difficult to use the free Adobe transcripts anywhere other than Premiere, but I think that’s a bit shortsighted. You can’t even get the caption file if you render out audio; you have to render a movie. Of course, the workaround is just to turn off all the video tracks and render out black frames. So it’s not that hard to get the caption files, you just have to jump through some hoops.

Sharing Adobe Transcripts with Transcriptive Rough Cutter and Vice Versa

I’ve already written a blog post specifically about how to use Adobe transcripts with Transcriptive. But, in short… you can use Adobe transcripts in Transcriptive by exporting the transcript as plain text and using Transcriptive’s Alignment feature to sync the text up to the clip or sequence. Every word will have timecode, just as if you’d transcribed it in Transcriptive. (This is a free feature.)

AND… If you get your transcript in Transcriptive Rough Cutter, it’s easy to import it into the Adobe Caption system… just Export a caption file format Premiere supports out of Transcriptive RC and import it into Premiere. As mentioned, you can Export SRT, VTT, MCC, SCC, SMPTE, and STL.

Two A.I. Services

Transcriptive Rough Cutter gives you two A.I. services to choose from, allowing you to use whichever works best for your audio. It is also usually more accurate than Adobe’s service, especially on poor-quality audio. That said, the Adobe A.I. is good as well, but on a long transcript, even a percentage point or two of accuracy adds up to a significant amount of time saved cleaning up the transcript.

Transcriptive: Here’s how to transcribe using your Speechmatics credits for now.

If you’ve been using Speechmatics credits to transcribe in Transcriptive, our transcription plugin for Premiere Pro, you may have noticed that accessing your credits is no longer an option in Transcriptive 2.0.2 and later. Speechmatics is discontinuing the API we used to support their service in Transcriptive, which means your Speechmatics credentials can no longer be validated inside the Transcriptive panel.

We know a lot of users still have Speechmatics credits, and we have been working closely with Speechmatics so those credits can be made available in your Transcriptive account as soon as possible, hopefully in the next week or two.

In the meantime, there are a couple of ways users can still transcribe with Speechmatics credits: 1) Use an older version of Transcriptive, like v1.5.2 or v2.0.1. Those should still work for a bit longer, but they use the older, less accurate API. Or 2) Upload your media directly on the Speechmatics website and export the transcript as a JSON file to import into Transcriptive. It is a fairly simple process and a great temporary solution. Here’s a step-by-step guide:

1. Head to the Speechmatics website – To use your Speechmatics credits, go to www.speechmatics.com and log in to your account. Under “What do you want to do?”, choose “Transcription” and select the language of your file.


2. Upload your media file to the Speechmatics website – Speechmatics gives you the option to drag and drop or select your media from a folder on your computer. Choose whichever option works best for you and then click “Upload”. After the file is uploaded, the transcription starts automatically, and you can check its status in your “Jobs” list.

3. Download a .JSON file – After the transcription is finished (refresh the page if the status doesn’t change automatically!), click the Actions icon to access the transcript. You will then have the option to export the transcript as a .JSON file.


4. Import the .JSON file into any version of Transcriptive – Open your Transcriptive panel in Premiere. If you are using Transcriptive 2.0, be sure Clip Mode is turned on. Select the clip you just transcribed on Speechmatics and click “Import”. If you are using an older version of Transcriptive, drop the clip into a sequence before choosing “Import”.


You will then have the option to “Choose an Importer”. Select the JSON option and import the Speechmatics file saved on your computer. The transcript will be synced with the clip automatically at no additional charge.
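The import can sync automatically because the JSON contains a start and end time for every word. As a rough illustration, the sketch below pulls those timings out of a Speechmatics-style JSON file. It assumes what Speechmatics calls the “json-v2” layout: a top-level `results` list whose items have `type`, `start_time`, `end_time` and an `alternatives` list with `content`. Check it against an actual export, since older or different formats may not match. This is not Transcriptive’s importer.

```python
import json


def words_from_speechmatics(json_text: str) -> list[tuple[str, float, float]]:
    """Extract (word, start, end) from a Speechmatics-style JSON transcript.

    Assumes the json-v2 layout described above; punctuation items are skipped.
    """
    data = json.loads(json_text)
    words = []
    for item in data.get("results", []):
        if item.get("type") != "word" or not item.get("alternatives"):
            continue
        best = item["alternatives"][0]  # use the first listed alternative
        words.append((best["content"], float(item["start_time"]), float(item["end_time"])))
    return words


sample = '''{"results": [
  {"type": "word", "start_time": 0.5, "end_time": 0.9,
   "alternatives": [{"content": "Hello", "confidence": 0.98}]},
  {"type": "punctuation", "start_time": 0.9, "end_time": 0.9,
   "alternatives": [{"content": ",", "confidence": 1.0}]},
  {"type": "word", "start_time": 1.0, "end_time": 1.4,
   "alternatives": [{"content": "world", "confidence": 0.95}]}
]}'''
print(words_from_speechmatics(sample))  # [('Hello', 0.5, 0.9), ('world', 1.0, 1.4)]
```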


One important thing to know: although Transcriptive v1.x still has Speechmatics as an option and it still works, we recommend following the steps above to transcribe with Speechmatics credits. The option available in those versions of the panel uses an older version of the Speechmatics API, which is less accurate than the new version. So if you want to use your Speechmatics credits now rather than wait for them to be transferred, we recommend transcribing on the Speechmatics website.

However, we should have the transfer sorted out very soon, so keep an eye out for an email about it if you have Speechmatics credits. If the email address you use for Speechmatics is different from the one you use for Transcriptive.com, please email cs@nulldigitalanarchy.com. We want to make sure we get things synced up so the credits go to the right place!

Artificial Intelligence vs. Video Editors

With Transcriptive, our new tool for automated transcription, we’ve dived headfirst into the world of A.I., so I’m pretty familiar with the current state of the industry. We’ve been neck deep in it for the last year.

A.I. is definitely changing how editors get transcripts and search video for content. Transcriptive demonstrates that pretty clearly with text.  Searching via object recognition is something that also is already happening. But what about actual video editing?

One of the problems A.I. has is finishing: going the last 10%, if you will. For example, speech-to-text engines, at best, have an accuracy rate of about 95%. This is about on par with the average human transcriptionist. For general-purpose recordings, human transcriptionists SHOULD be worried.

But for video editing, there are some differences, which are good news. First, and most importantly, errors tend to be cumulative. If a computer is going to edit a video, at the very least it needs to do the transcription and recognize the imagery (we’ll ignore other considerations like style, emotion, and story for the moment). Speech recognition is at best 95% accurate, and object recognition is worse. The more layers of A.I. you stack, the more those errors usually multiply (though in some cases there might be an improvement). While it’s possible automation will be able to produce a decent rough cut, these errors make it difficult to see automation replacing editors on most of the types of videos that pro editors are typically employed for.
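The compounding argument can be made concrete: if each stage of an automated edit has to be right independently, the chance that all of them are right is the product of their accuracies. The 95% figure is from above; the 90% for object recognition is a hypothetical stand-in for “worse”, and the independence assumption is ours.

```python
from math import prod


def pipeline_accuracy(stage_accuracies: list[float]) -> float:
    """Probability that every stage is right, assuming stage errors are independent."""
    return prod(stage_accuracies)


speech = 0.95   # best-case speech-to-text accuracy cited above
objects = 0.90  # hypothetical: "object recognition is worse"
print(f"{pipeline_accuracy([speech, objects]):.1%}")  # 85.5%
```

Two stages at 95% and 90% leave only about 85.5% end to end, and every extra stage (style, pacing, story) pulls the number down further.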

Second, if the videos are being made for humans, frequently the humans don’t know what they want. Or at least they’re not going to be able to communicate it in a way a computer will understand and act on. If you’ve used Alexa or Echo, you can see how well A.I. understands humans. In lots of situations, especially literal ones (find me the best restaurant), it works fine; in lots of others, not so much.

Many times as an editor, the direction you get from clients is subtle or you have to read between the lines and figure out what they want. It’s going to be difficult to get A.I.s to take the way humans usually describe what they want, figure out what they actually want and make those changes.

Third, there’s the whole issue of emotion and storytelling, which I don’t think A.I. will do well anytime soon. The Economist recently had an amusing article where it let an A.I. write the article. The result is here. It’s very good at mimicking the style of the Economist, but when it comes to putting together a coherent narrative… ouch.

It’s Not All Good News

There are already phone apps that do basic automatic editing. These are more for consumers who want something quick and dirty. For most of the work professional editors get paid for, it’s unlikely that what I’ve seen from these apps will replace humans any time soon. Although I can see how the tech could be used to create rough cuts and the like.

Also, for some types of videos, wedding or music videos perhaps, you can make a pretty solid case that A.I. will be able to put something together soon that looks reasonably professional.

You need training material for neural networks to learn how to edit videos. Thanks to YouTube, Vimeo and the like, there is an abundance of training material. Do a search for ‘wedding video’ on YouTube. You get 52,000,000 results. 2.3 million people get married in the US every year. Most of the videos from those weddings are online. I don’t think finding a few hundred thousand of those that were done by a professional will be difficult. It’s probably trivial actually.

Same with music videos. There IS enough training material for the A.I.s to learn how to do generic editing for many types of videos.

For people that want to pay $49.95 to get their wedding video edited, that option will be there. Probably within a couple years. Have your guests shoot video, upload it and you’re off and running. You’ll get what you pay for, but for some people it’ll be acceptable. Remember, A.I. is very good at mimicking. So the end result will be a very cookie cutter wedding video. However, since many wedding videos are pretty cookie cutter anyways… at the low end of the market, an A.I. edited video may be all ‘Bridezilla on A Budget’ needs. And besides, who watches these things anyways?

Let The A.I. Do The Grunt Work, Not The Editing

The losers in the short term may be assistant editors. Many of the tasks A.I. is good for (transcribing, searching for footage, etc.) are now typically given to assistants. However, it may simply change the types of tasks assistant editors are given. There’s a LOT of metadata that needs to be entered and wrangled.

While A.I. is already showing up in many aspects of video production, having it actually do the editing feels like it’s quite a ways off. I can see creating A.I. tools that help with editing: rough cut creation, recommending color corrections or B-roll selection, suggesting changes to timing, etc. But there will still need to be a person doing the edit.


How Doc Filmmakers Are using A.I. to Create Captions and Search Footage in Premiere Pro

Artificial Intelligence (A.I.) and machine learning are changing how video editors deal with some common problems: 1) How do you get accurate transcriptions for captions or subtitles? And 2) how do you find something in hours of footage if you don’t know exactly where it is?

Getting out of the Transcription Dungeon

Kelley Slagle, director, producer and editor for Cavegirl Productions, has been working on Eye of the Beholder, a documentary about the artists who created the illustrations for the Dungeons & Dragons game. With over 40 hours of interview footage to comb through, searching it all has been made much easier by Transcriptive, a new A.I. plugin for Adobe Premiere Pro.



Why Transcribe?

Imagine having Google for your video project. Turning all the dialog into text makes everything easily searchable (and it supports 28 languages), not to mention making it easy to create captions and subtitles.

The Dragon of Time And Money

Using a traditional transcription service for 40 hours of footage, you’re looking at a minimum of $2400 and a few days of turnaround. Not exactly cost- or time-effective, especially if you’re on a doc budget. But it’s a problem for all of us.

Transcriptive helps solve the transcription problem, as well as the problems of searching video and creating captions/subtitles. It uses A.I. and machine learning to automatically generate transcripts with up to 95% accuracy and bring them into Premiere Pro. And the cost? About $4/hour (or much less, depending on the options you choose). So 40 hours is $160 vs. $2400, and you’ll get all of it back in a few hours.
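The cost comparison is straightforward arithmetic. The sketch below assumes a traditional service at $1.00/min (the rate implied by $2400 for 40 hours) and Speechmatics’ direct rate of $0.07/min quoted later in this post; at $0.07/min, 40 hours comes to $168, slightly more than the $160 you get by rounding to $4/hour.

```python
def transcription_cost(hours: float, dollars_per_minute: float) -> float:
    """Total cost in dollars for `hours` of footage, rounded to cents."""
    return round(hours * 60 * dollars_per_minute, 2)


HOURS_OF_FOOTAGE = 40
rates_per_minute = {
    "Traditional service (implied by $2400 / 40 h)": 1.00,
    "Speechmatics direct": 0.07,
}
for label, rate in rates_per_minute.items():
    print(f"{label}: ${transcription_cost(HOURS_OF_FOOTAGE, rate):,.2f}")
```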

Yeah, it’s hard to believe.

Read what these three filmmakers have to say and try the Transcriptive demo out on your own footage. It’ll make it much easier to believe.


“We are using Transcriptive to transcribe all of our interviews for EYE OF THE BEHOLDER. The idea of paying a premium for that much manual transcription was daunting. I am in the editing phase now and we are collaborating with a co-producer in New York. We need to share our ideas for edits and content with him, so he is reviewing transcripts generated by Transcriptive and sending us his feedback and vice versa. The ability to get a mostly accurate transcription is fine for us, as we did not expect the engine to know proper names of characters and places in Dungeons & Dragons.” – Kelley Slagle, Cavegirl Productions

Google Your Video Clips and Premiere Project?


Since everything lives right within Premiere, all the dialog is fully searchable. It’s basically a word processor designed for transcripts, where every word has time code. Yep, every word of dialog has time code. Click on the word and jump to that point on the timeline. This means you don’t have to scrub through footage to find something. Search and jump right to it. It’s an amazing way for an editor to find any quote or quip.
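Because every word carries a time, finding a quote is just a text match that returns the matching word’s start time, which can then be shown as timecode. A minimal sketch, assuming a transcript stored as (word, start-seconds) pairs and a 24 fps non-drop-frame timeline (both are assumptions; drop-frame timecode needs extra handling, and the function names are hypothetical):

```python
def to_timecode(seconds: float, fps: int = 24) -> str:
    """Convert seconds to non-drop-frame HH:MM:SS:FF timecode."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_secs = total_frames // fps
    hours, minutes, secs = total_secs // 3600, total_secs // 60 % 60, total_secs % 60
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


def find_phrase(words: list[tuple[str, float]], phrase: str) -> list[float]:
    """Start time (seconds) of every occurrence of `phrase` in a timed transcript."""
    target = phrase.lower().split()
    if not target:
        return []
    texts = [text.lower().strip(".,!?;:\"") for text, _ in words]
    n = len(target)
    return [words[i][1] for i in range(len(texts) - n + 1) if texts[i:i + n] == target]


transcript = [("We", 0.0), ("found", 0.5), ("the", 1.0), ("red", 1.25), ("dragon.", 1.5),
              ("The", 62.0), ("red", 62.5), ("dragon", 63.0), ("returns", 63.5)]
hits = find_phrase(transcript, "red dragon")
print([to_timecode(t) for t in hits])  # ['00:00:01:06', '00:01:02:12']
```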

As Kelley says, “We are able to find what we need by searching the text or searching the metadata thanks to the feature of saving the markers in our timelines. As an editor, I am now able to find an exact quote that one of my co-producers refers to, or find something by subject matter, and this speeds up the editing process greatly.”

Joy E. Reed of Oh My! Productions, who’s directing the documentary, ‘Ren and Luca’ adds, “We use sequence markers to mark up our interviews, so when we’re searching for specific words/phrases, we can find them and access them nearly instantly. Our workflow is much smoother once we’ve incorporated the Transcriptive markers into our project. We now keep the Markers window open and can hop to our desired areas without having to flip back and forth between our transcript in a text document and Premiere.”

Workflow, Captions, and Subtitles


Captions and subtitles are one of the key uses of Transcriptive. You can use it with Premiere’s captioning tool or export many different file formats (SRT, SMPTE, SCC, MCC, VTT, etc.) for use in any captioning application.

“We’re using Transcriptive to transcribe both sit down and on-the-fly interviews with our subjects. We also use it to get transcripts of finished projects to create closed captions/subtitles.”, says Joy. “We can’t even begin to say how useful it has been on Ren and Luca and how much time it saves us. The turnaround time to receive the transcripts is SO much faster than when we sent it out to a service. We’ve had the best luck with Speechmatics. The transcripts are only as accurate as our speakers – we have a teenage boy who tends to mumble, and his stuff has needed more tweaking than some of our other subjects, but it has been great for very clearly recorded material. The time it saves vs the time you need to tweak for errors is significant.”


Transcriptive is fully integrated into Premiere Pro, so you never have to leave the application or pass metadata and files around. This makes creating captions much easier, allowing you to edit each line while playing back the footage. There are also tools and keyboard shortcuts that make editing much faster than in a normal text editor. You then export everything to Premiere’s caption tool and use it to put on the finishing touches and deliver the captions with your media.

Another company doing documentary work is Windy Films. They are focused on telling stories of social impact and innovation, and like most doc makers are usually on tight budgets and deadlines. Transcriptive has been critical in helping them tell real stories with real people (with lots of real dialog that needs transcribing).

They recently completed a project for Planned Parenthood. The deadline was incredibly tight. Harvey Burrell, filmmaker at Windy, says, “We were trying to beat the senate vote on the healthcare repeal bill. We were editing while driving back from Iowa to Boston. The fact that we could get transcripts back in a matter of hours instead of a matter of days allowed us to get it done on time. We use Transcriptive for everything. The integration into premiere has been incredible. We’ve been getting transcripts done for a long time. The workflow was always a clunky; particularly to have transcripts in a word document off to one side. Having the ability to click on a word and just have Transcriptive take you there in the timeline is one of our favorite features.”

Getting Accurate Transcripts using A.I.


Audio quality matters. The better the recording and the more clearly the talent enunciates, the better the transcript. You can get excellent results, around 95% accuracy, with very well-recorded audio. That means your talent is well mic’d, there’s not a lot of background noise, and they speak clearly. Even if you don’t have that, you’ll still usually get very good results as long as the talent is mic’d. Even accents are OK as long as the speakers talk clearly. Talent that’s off mic, or crosstalk, will make the transcript less accurate.


Transcriptive lets you sign up with the speech services directly, allowing you to get the best pricing. Most transcription products hide the service they’re using (they’re all using one of the big A.I. services) and mark up the cost to as much as $0.50/min. When you sign up directly, you get Speechmatics for $0.07/min, and Watson gives you the first 1000 minutes free. (Speechmatics is much more accurate, but Watson can be useful.)

Transcriptive itself costs $299 when you check out of the Digital Anarchy store. A web version is coming soon as well. To try transcribing with Transcriptive you can download the trial version here. (remember, Speechmatics is the more accurate service and the only service available in the demo) Reach out to sales@nulldigitalanarchy.com if you have questions or want an extended trial.

Transcriptive is a plugin that many didn’t know they were waiting for. It is changing the workflow of many editors in the industry. See for yourself how we’re transforming the art of transcription.