Speechmatics, one of the A.I. engines we support, recently released a new speech model that promises much higher accuracy. Transcriptive Rough Cutter now supports it if you choose the Speechmatics option. Also, with Premiere now able to generate transcripts with Adobe Sensei, we get a lot of questions about how it compares to Transcriptive Rough Cutter.
So we figured it was a good time to do a test of the various A.I. speech engines! (Actually we do this pretty regularly, but only occasionally post the results when we feel there’s something newsworthy about them)
You can read about the A.I. testing methodology in this post if you’re interested or want to run your own tests. But, in short, Word Error Rate is what we pay most attention to. It’s simply:
NumberOfWordsMissed / NumberOfWordsInTranscript
where NumberOfWordsMissed = the number of words in the corrected transcript that the A.I. failed to recognize. If instead of the word ‘Everything’ the A.I. produced ‘Even ifrits sing’, it still missed just one word. In the reverse situation, it would count as three missed words.
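For the curious, that calculation can be sketched in a few lines of Python. Our actual grading is done by hand (more on that below); this simple word alignment with difflib is just to illustrate the idea:

```python
from difflib import SequenceMatcher

def word_error_rate(master, ai):
    """Rough WER: the fraction of words in the corrected (master)
    transcript that the A.I. transcript failed to match."""
    m_words = master.lower().split()
    a_words = ai.lower().split()
    matcher = SequenceMatcher(None, m_words, a_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    missed = len(m_words) - matched
    return missed / len(m_words)

# 'Everything' -> 'Even ifrits sing' still counts as one missed master word:
print(word_error_rate("everything is on sale", "even ifrits sing is on sale"))  # -> 0.25
```

Run it the other way around (master has ‘Even ifrits sing’, the A.I. produced ‘Everything’) and you get three missed words, just as described above.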
We also track punctuation errors, but those can be somewhat subjective, so we put less weight on that.
What’s the big deal between 88% and 93% Accuracy?
Every 1% of additional accuracy means roughly 15% fewer incorrect words. A 30 minute video has, give or take, about 3000 words. So with Speechmatics you’d expect to have, on average, 210 missed words (7% error rate) and with Adobe Sensei you’d have 360 missed words (12% error rate). Every 10 missed words adds about 1:15 to the clean up time. So it’ll take about 18 minutes more to clean up that 30 minute transcript if you’re using Adobe Sensei.
Every additional 1% in accuracy means about 3.5 minutes less clean up time (for a 30 minute clip). So small improvements in accuracy can make a big difference if you (or your Assistant Editor) need to clean up a long transcript.
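If you want to play with those numbers yourself, here’s the same back-of-the-envelope math as a tiny Python snippet (all figures are the averages from this post, not guarantees):

```python
words_in_video = 3000      # a typical 30 minute video
sec_per_10_words = 75      # every 10 missed words adds ~1:15 of clean up

def missed_words(accuracy):
    # accuracy is a fraction, e.g. 0.93 for 93%
    return round(words_in_video * (1 - accuracy))

def cleanup_minutes(accuracy):
    return missed_words(accuracy) * sec_per_10_words / 10 / 60

print(missed_words(0.93))   # Speechmatics -> 210
print(missed_words(0.88))   # Adobe Sensei -> 360
print(round(cleanup_minutes(0.88) - cleanup_minutes(0.93), 1))  # -> 18.8 (the post's "about 18 minutes")
```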
Of course, the above are averages. If you have a really bad recording with lots of words that are difficult to make out, it’ll take longer to clean up than a clip with great audio and you’re just fixing words that are clear to you but the A.I. got wrong. But the above numbers do give you some sense of what the accuracy value means back in the real world.
The Test Results!
All the A.I.s are great at handling well-recorded audio. If the talent is professionally mic’d and they speak well, you should get 95% or better accuracy. It’s when the audio quality drops off that Transcriptive and Speechmatics really shine (and why we include them in Transcriptive Rough Cutter). And I 100% encourage you to run your own tests with your own audio. Again, this post outlines exactly how we test and you can easily do it yourself.
Speechmatics New is the clear winner, with a couple of first place finishes, no last place finishes, and a 93.3% accuracy overall (you can find the spreadsheet with results and the audio files further down the post). One caveat… Speechmatics takes about 5x as long to process. So a 30 minute video will take about 3 minutes with Transcriptive A.I. and 15-20 minutes with Speechmatics. If you select Speechmatics in Transcriptive, you’re getting the new A.I. model.
Adobe Sensei is the least accurate with two last place finishes and no first places, for an 88.3% accuracy overall. Google, which is another A.I. service we evaluate but currently don’t use, is all over the place. Overall, it’s 80.6%, but if you remove the worst and best examples, it’s a more pedestrian 90.3%. No idea why it failed so badly on the Bill clip, but it’s a trainwreck. The Bible clip is from a public domain reading of the bible, which I’m guessing was part of Google’s training corpus. You rarely see that kind of accuracy unless the A.I. was trained on it. Anyways, this inconsistency is why we don’t use it in Transcriptive.
Here are the clips we used for this test:
Here’s the spreadsheet of the results (SM = Speechmatics, Green means best performance, Orange means worst). Again, mostly we’re focused on the Word Accuracy. Punctuation is a secondary consideration:
Transcriptive-A.I. doesn’t use a single A.I. service on the backend. We don’t have our own A.I., so like most companies that offer transcription, we use one of the big companies (Google, Watson, Speechmatics, etc.).
We initially started off with Speechmatics as the ‘high quality’ option. And they’re still very good (as you’ll see shortly), but not always. However, since we had so many users that liked them, we still give you the option to use them if you want.
However, we’ve now added Transcriptive-A.I. This uses whatever A.I. service we think is best. It might use Speechmatics, but it might also use one of a dozen other services we test.
Since we encourage users to test Transcriptive-A.I. against any service out there, I’ll give you some insight on how we test the different services and choose which to use behind the scenes.
Usually we take 5-10 audio clips of varying quality that are about one minute long. Some very well recorded, some really poorly recorded, and some in between. The goal is to see which A.I. works best overall and which might work better in certain circumstances.
When grading the results, I save out a plain text file with no timecode, speakers, or anything else. I’m only concerned about word accuracy and, to a lesser degree, punctuation accuracy. Word accuracy is the most important thing (IMO). For this purpose, Word 2010 has an awesome Compare function to see the difference between the Master transcript (human corrected) and the A.I. transcript. Newer versions of Word might be better for comparing legal documents, but Word 2010 is the best for comparing A.I. accuracy.
Also, let’s talk about the rules for grading the results. You can define what an ‘error’ is however you want. But you have to be consistent about how you apply the definition. Applying them consistently matters more than the rules themselves. So here are the rules I use:
1) Every word in the Master transcript that is missed counts as one error. So ‘a reed where’ for ‘everywhere’ is just one error, but ‘everywhere’ for ‘every hair’ is two errors.
2) Ah, uh, and um are ignored. Some ASRs include them, some don’t. I’ll let ‘a’ go, but if an ‘uh’ should be ‘an’ it’s an error.
3) Commas are 1/2 error and full stops (period, ?) are also 1/2 error, though there’s an argument for making them a full error.
4) If words are correct but the ASR tries to separate/merge them (e.g. ‘you’re’ to ‘you are’, ‘got to’ to ‘gotta’, ‘because’ to ‘’cause’) it does not count as an error.
That’s it! We then add up the errors, divide that by the number of words that are in the clip, and that’s the error rate!
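For those who’d rather script the scoring than use Word’s Compare, here’s a minimal Python sketch of rules 1, 2, and 4. The punctuation half-errors from rule 3 are left out, and the merge map is just a couple of illustrative entries:

```python
from difflib import SequenceMatcher

FILLERS = {"ah", "uh", "um"}                       # rule 2: ignored entirely
MERGES = {"gotta": "got to", "you're": "you are"}  # rule 4: tiny illustrative map

def normalize(text):
    words = []
    for token in text.lower().split():
        token = token.strip(".,?!")        # punctuation scored separately (rule 3, not shown)
        if token in FILLERS:
            continue
        token = MERGES.get(token, token)   # expand merged forms so they compare equal
        words.extend(token.split())
    return words

def error_rate(master, ai):
    m, a = normalize(master), normalize(ai)
    matched = sum(b.size for b in SequenceMatcher(None, m, a).get_matching_blocks())
    return (len(m) - matched) / len(m)     # rule 1: each missed master word = 1 error

print(error_rate("the quick brown fox", "the quick crown fox"))  # -> 0.25
```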
This is a quick blog post showing you how to use the free Transcriptive trial version to convert any SRT caption file into a text file without timecode or line numbers (which SRTs have). You can do this on Transcriptive.com or if you have Premiere, you can use Transcriptive for Premiere Pro.
This usually comes up because you have a caption file (SRT or VTT) but don’t have access to the original transcript. SRT files tend to look like this:
1
00:00:02,299 --> 00:00:09,100
The quick brown fox

2
00:00:09,100 --> 00:00:17,200
hit the gas pedal and
And you might want normal human readable text so someone can read the dialog, without the line numbers and timecode. So this post will show you how to do that with Transcriptive for free!
We are, of course, in the business of selling software. So we’d prefer you bought Transcriptive BUT if you’re just looking to convert an SRT (or any caption file) to a text file, the free trial does that well and you’re welcome to use it. (btw, we also have some free plugins for After Effects, Premiere, FCP, and Resolve HERE. We like selling stuff, but we also like making fun or useful free plugins)
Getting The Free Trial License
As mentioned, this works for the Premiere panel or Transcriptive.com, but I’ll be using screenshots from the panel. So if you’re using Transcriptive.com it may look a little bit different.
You do need to create a Transcriptive account, which is free. When the panel first pops up, click the Trial button to start the registration process:
You then need to create your account, if you don’t have one. (If you’re using Transcriptive.com, this will look different. You’ll need to manually select the ‘free’ account option.)
Importing the SRT
Once you register the free trial license, you’ll need to import the SRT. If you’re on Transcriptive.com, you’ll need to upload something (could be 10sec of black video, doesn’t matter what, but there has to be some media). If you’re in Premiere, you’ll need to create a Sequence first, make sure Clip Mode is Off (see below) and then you can click IMPORT.
Once you click Import, you can select SRT from the dropdown. You’ll need to select the SRT file using the file browser (click the circled area below). Then click the Import button at the bottom.
You can ignore all the other options in the SRT Import Window. Since you’re going to be converting this to a plain text file without timecode, none of the other stuff matters.
After clicking Import, the Transcriptive panel will look something like this, showing the text from the SRT file along with all the timecode, speakers, etc.:
Exporting The Plain Text File
Alright… so how do we extract just the text? Easy! Click the Export button in the lower left corner. In the dialog that gets displayed, select Plain Text:
The important thing here is to turn OFF ‘Display Timecode’ and ‘Include Speakers’. This will strip out any extra data that’s in the SRT and leave you with just the text. (After you hit the Export button)
Ok, well, since caption files tend to have lines that are around 32 characters long, you might have a text file that looks like this:
The quick brown fox
hit the gas pedal and
If you want that to look normal, you’ll need to bring it into Word or something and replace the Paragraphs with a Space like this:
And that will give you:
The quick brown fox hit the gas pedal and
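If you’d rather script the conversion (say, for a batch of SRTs), the whole thing can be sketched in a few lines of Python. This is just an illustration of what the panel is doing for you, and it assumes simple single-line captions:

```python
def srt_to_text(srt):
    """Collapse an SRT file to plain dialog text."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line:
            continue              # blank separator between cues
        if line.isdigit():
            continue              # cue number (1, 2, 3...)
        if "-->" in line:
            continue              # timecode line
        kept.append(line)
    return " ".join(kept)

srt = """1
00:00:02,299 --> 00:00:09,100
The quick brown fox

2
00:00:09,100 --> 00:00:17,200
hit the gas pedal and
"""
print(srt_to_text(srt))   # -> The quick brown fox hit the gas pedal and
```

One caveat: a caption line that’s nothing but digits would get dropped by this sketch, so treat it as a starting point rather than a bulletproof converter.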
And now you have human readable text from an SRT file! A few steps, but pretty easy. Obviously there are lots of other things you can do with SRTs in Transcriptive, but converting the SRT to a plain text file is one that can be done with the free trial. As mentioned, this works with VTT files as well.
So grab the free trial of Transcriptive here and you can do it yourself! You can also request an unrestricted trial by emailing email@example.com. While this SRT to Plain Text functionality works fine, there are some other limitations if you’re testing out the plugins for transcripts or editing the text.
Any time I hear people freaking out about A.I., for good or bad, I’m skeptical. So much of what happens in the world of A.I. is hype that the technology may never live up to, much less lives up to now, that you have to take it all with a grain of salt.
This is an awesome demo reel for the VFX artist involved. It’s very well done.
But it doesn’t make an awesome case for the technology disrupting the world as we know it. It needed raw footage of someone that looked and acted like Cruise. It then took months to clean up the results of the A.I. modifying the Cruise look-a-like.
(It does make an interesting case for the tech to be used in VFX, but that’s something different)
This isn’t a ‘one-click’ or even a ‘whole-bunch-of-clicks’ technology. It’s a ‘shit-ton-of-work’ technology and given they had the footage of the Cruise look-a-like you can make an argument that it could’ve been done in less time with traditional rotoscoping and compositing.
Anyways, the fear and consternation has gotten ahead of itself. We’ve had the ability to put people in photos they weren’t in for a long time. We’ve figured out how to deal with it. (not to mention, it’s STILL very difficult to cut someone out of one picture and put them in another one without it being detectable… and Photoshop is, what, 30 years old now?)
It’s good to consider the implications but we’re a long, long way from anyone being able to do this to any video.
We occasionally get questions from customers asking why we charge .04/min ($2.40/hr) for transcription (if you pre-pay), when some competitors charge .25/min or even .50/min. Is it lower accuracy? Are you selling our data?
No and no. Ok, but why?
Transcriptive and PowerSearch work best when all your media has transcripts attached to it. Our goal is to make Transcriptive as useful as possible. We hope the less you have to think about the cost of the transcripts, the more media you’ll transcribe… resulting in making Transcriptive and PowerSearch that much more powerful.
The Transcriptive-AI service is equal to or better than what other services are using. We’re not tied to one A.I. and we’re constantly evaluating the different A.I. services. We use whatever we think is currently state-of-the-art. Since we do such a high volume, we get good pricing from all the services, so it doesn’t really matter which one we use.
Do we make a ton of money on transcribing? No.
The services that charge .25/min (or whatever) are probably making a fair amount of money on transcribing. We’re all paying about .02/min or less. Give or take, that’s the wholesale/volume price.
If you’re getting your transcripts for free… those transcripts are probably being used for training, especially if the service is keeping track of the edits you make (e.g. YouTube, Otter, etc.). Transcriptive is not sending your edits back to the A.I. service. That’s the important bit if you’re going to train the A.I. Without the corrected version, the A.I. doesn’t know what it got wrong and can’t learn from it.
So, for us, it all comes down to making Transcriptive.com, the Transcriptive Premiere Pro panel, and PowerSearch as useful as possible. To do so, we want the most accurate transcripts and we want them to be as low cost as possible. We know y’all have a LOT of footage. We’d rather reduce the barriers to you transcribing all of it.
We often get asked what the differences are between Transcriptive 2.0 and 1.0. So here is the full list of new features! As always there are a lot of other bug fixes and behind the scenes changes that aren’t going to be apparent to our customers. So this is just a list of features you’ll encounter while using Transcriptive.
NEW FEATURES IN TRANSCRIPTIVE 2.0
Works with clips or sequences: You no longer have to have clips in sequences to get them transcribed. Clips can be transcribed and edited just by selecting them in the Project panel. This opens up many different workflows and is something the new caption system in Premiere can’t do. Watch the tutorial on transcribing clips in Premiere
A clip selected in the Project panel. Setting In/Out points in TS!
Editing with Text: Clip Mode enables you to search through clips to find sound bites. You can then set IN/OUT points in the transcript and insert them into your edit. This is a powerful way of compiling rough cuts without having to scrub through footage. Watch the Tutorial on editing video using a transcript!
Collaborate by Sharing/Send/receive to Transcriptive.com: Collaborate on creating a paper edit by sharing the transcript with your team and editor. Send transcripts or videos from Premiere to Transcriptive.com, letting a client, AE, or producer edit them in a web browser or add Comments or strike-through text. The transcript can then be sent back to the video editor in Premiere to continue working with it. Watch the tutorial on collaborating in Premiere using Transcriptive.com! There’s also this blog post on collaborative workflows.
Now includes PowerSearch for free! Transcriptive can only search one transcript at a time. With PowerSearch, you can search every clip and sequence in your project! It’s a search engine for Premiere. Search for text and get search results like Google. Click on a result and it jumps to exactly where the dialog is in that clip or sequence. Watch the tutorials on PowerSearch, the search engine for Premiere.
Reduced cost: By prepaying minutes you can get the cost down to as low as .04/min! Why is it so inexpensive? Is it worse than the other services that charge .25 or .50/min? No! We’re just as good or better (don’t take my word for it, run your own comparisons). Transcriptive only works if you’ve transcribed your footage. By keeping the cost of minutes low, hopefully we make it an easy decision to transcribe all your footage and make Transcriptive as useful as possible!
Ability to add comments/notes at any point in the transcript. The new Comments feature lets you add a note to any line of dialog. Incredibly useful if you’re working with someone else and need to share information. It’s also great if you want to make notes for yourself as you’re going through footage.
Strikethrough text: Allows you to strikethrough text to indicate dialog that should be removed. Of course, you can just delete it but if you’re working with someone and you want them to see what you’ve flagged for deletion OR if you’re just unsure if you want to definitely delete it, strikethrough is an excellent way of identifying that text.
More ‘word processor’ like text editor: A.I. isn’t perfect, even though it’s pretty close in many cases (usually 96-99% accurate with good audio). However, you can correct any mistake you find with the new text editor! It’s quick and easy to use because it works just like a word processor built into Premiere. Watch the tutorial on editing text in Transcriptive!
Align English transcripts for free: If you already have a script, you can sync the text to your audio track at no cost. You’ll get all the benefits of the A.I. (per word timing, searchability, etc) without the cost. It’s a free way of making use of transcripts you already have. Watch the tutorial on syncing transcripts in Premiere!
Adjust timing for words: If you’re editing text and correcting any errors the A.I. might have made, it can result in the new words having timecode that doesn’t quite sync with the spoken dialog. This new feature lets you adjust the timecode for any word so it’s precisely aligned with the spoken word.
Ability to save the transcript to any audio or video file: In TS 1.0 the transcript always got saved to the video file. Now you can save it to any file. This is very helpful if you’ve recorded the audio separately and want the transcript linked to that file.
More options for exporting markers: You can set the duration of markers and control what text appears in them.
Profanity filter: **** out words that might be a bit much for tender ears.
More speaker management options: Getting speaker names correct can be critical. There are now more options to control how this feature works.
Additional languages: Transcriptive now supports over 30 languages!
Checks for duplicate transcripts: Reduces the likelihood a clip/sequence will get transcribed twice unnecessarily. Sometimes users will accidentally transcribe the same clip twice. This helps prevent that and saves you money!
Lock to prevent editing: This allows other people to view the transcript in Premiere or on Transcriptive.com while preventing them from accidentally making changes.
Sync Transcript to Sequence: Often you’ll get the transcript before you make any edits. As you start cutting and moving things around, the transcript will no longer match the edit. This is a one-click way of regenerating the transcript to match the edit.
Streamlined payment/account workflow: Access multiple speech engines with one account. Choose the one most accurate for your footage.
We get a fair number of questions from Transcriptive users that are concerned the A.I. is going to use their data for training.
First off, in the Transcriptive preferences, if you select ‘Delete transcription jobs from server’ your data is deleted immediately. This will delete everything from the A.I. service’s servers and from the Digital Anarchy servers. So that’s an easy way of making sure your data isn’t kept around and used for anything.
However, generally speaking, the A.I. services don’t get more accurate with user submitted data. Partially because they aren’t getting the ‘positive’ or corrected transcript.
When you edit your transcript we aren’t sending the corrections back to the A.I. (some services are doing this… e.g. if you correct YouTube’s captions, you’re training their A.I.)
So the audio by itself isn’t that useful. What the A.I. needs in order to learn is the audio file, the original transcript, AND the corrected transcript. So even if you don’t have the preference checked, it’s unlikely your audio file will be used for training.
This is great if you’re concerned about security BUT it’s less great if you really WANT the A.I. to learn. For example, I don’t know how many videos I’ve submitted over the last 3 years saying ‘Digital Anarchy’. And still to this day I get: Dugal Accusatorial (seriously), Digital Ariki, and other weird stuff. A.I. is great when it works, but sometimes… it definitely does not work. And people want to put this into self-driving cars? Crazy talk right there.
If you want to help the A.I. out, you can use the Speech-to-Text Glossary (click the link for a tutorial). This still won’t train the A.I., but if the A.I. is uncertain about a word, it’ll help it select the right one.
How does the glossary work? The A.I. analyzes a word sound and then comes up with possible words for that sound. Each word gets a ‘confidence score’. The one with the highest score is the one you see in your transcript. In the case above, ‘Ariki’ might have had a confidence of .6 (out of 0 to 1, so .6 is pretty low) and ‘Anarchy’ might have been .53. So my transcript showed Ariki. But if I’d put Anarchy into the Glossary, then the A.I. would have seen the low confidence score for Ariki and checked if the alternatives matched any glossary terms.
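Here’s a toy Python sketch of that selection logic. To be clear, this is our mental model of what the services do, not their actual code; the threshold and numbers are made up for the example:

```python
def pick_word(candidates, glossary, threshold=0.7):
    """candidates: (word, confidence) pairs from the ASR for one sound.
    If the top guess is low-confidence and an alternative matches a
    glossary term, prefer the glossary word instead."""
    best_word, best_conf = max(candidates, key=lambda c: c[1])
    if best_conf < threshold:
        for word, _ in candidates:
            if word in glossary:
                return word
    return best_word

# Without the glossary you'd get 'Ariki'; with it, 'Anarchy' wins:
print(pick_word([("Ariki", 0.60), ("Anarchy", 0.53)], {"Anarchy"}))   # -> Anarchy
```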
So the Glossary can be very useful with proper names and the like.
But, as mentioned, nothing you do in Transcriptive is training the A.I. The only thing we’re doing with your data is storing it and we’re not even doing that if you tell us not to.
It’s possible that we will add the option in the future to submit training data to help train the A.I. But that’ll be a specific feature and you’ll need to intentionally upload that data.
Since we announced the bundle between Transcriptive and PowerSearch a few months back, our team has been working even harder to improve the plugin so users can make the most of having transcripts and search engine capabilities inside Premiere Pro. This means we are releasing Transcriptive 2.0.5, which fixes some critical bugs reported, and PowerSearch 2.0: a much faster and efficient version of our metadata search tool.
Having accurate transcripts available in Premiere is already a big help in speeding up video production workflows, especially while working remotely. (See this previous post about Transcriptive’s sharing capabilities for remote collaboration!) But we truly believe, and have been hearing this from clients as well, that having all the content in your video editing project – especially transcripts! – converted into searchable metadata makes it much easier to find content if you have large amounts of footage, markers, sequences, and media files. And this is why the PowerSearch and Transcriptive combo makes it much easier to find soundbites, different takes of a script, or pinpoint any time a name or place is mentioned.
PowerSearch 1.0 was decently fast but could be slow on larger projects. Our next release makes use of a powerful SQL database to make PowerSearch an order of magnitude faster. The key to PowerSearch is that it indexes an entire Premiere Pro project, much like Google indexes websites, to optimize search performance. An index of hundreds of videos that used to take 10-12 hours to create is now indexed in less than an hour and the same database makes searching all that data significantly faster. Another advantage is the ability to use common search symbols, such as minus signs and quotes, for more precise, accurate searching. For editors with hundreds of hours of video, this can help narrow down searches from hundreds of results to a few dozen.
PowerSearch still returns search results like any search engine. Showing you the search term, the words around it, what clip/sequence/marker it’s in, and the timecode. Clicking on the result will open the clip or sequence and jump straight to the correct timecode in the Source or Program panel.
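Conceptually, the index is just a map from each word to everywhere it occurs. Here’s a toy Python version of the idea (PowerSearch actually uses a SQL database, so this is only the concept, not the implementation):

```python
from collections import defaultdict

index = defaultdict(list)

def add_transcript(clip, words_with_tc):
    """Index every (word, timecode) pair once, so later searches are
    dictionary lookups instead of re-reading every transcript."""
    for word, tc in words_with_tc:
        index[word.lower()].append((clip, tc))

def search(term):
    return index.get(term.lower(), [])

add_transcript("interview_A", [("quick", "00:00:02"), ("fox", "00:00:03")])
add_transcript("interview_B", [("fox", "00:01:10")])
print(search("fox"))   # -> [('interview_A', '00:00:03'), ('interview_B', '00:01:10')]
```

This is why indexing once up front pays off: the expensive pass over the project happens one time, and every search afterwards is nearly instant.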
PowerSearch 2.0 can still be purchased separately and will help your production even if you are getting transcripts from a different source or just want to search markers. However, it is now bundled with Transcriptive and you can get both for $149, while PowerSearch costs $99 on its own. So if you haven’t tried using PowerSearch and Transcriptive together, give it a try! We are constantly working on Transcriptive to add more capabilities, reduce transcription costs, and improve the sharing options now available in the panel. Features like Clip Mode and the new Text Editor go beyond just transcribing media and sequences, and combining them with a much faster PowerSearch makes finding content much faster.
Transcriptive 2.0 users can use their Transcriptive license to activate PowerSearch. Trial licenses for both Transcriptive and PowerSearch are available here and our team would be happy to help if you need support figuring out a workflow for you and your team. Send any questions, concerns, or feedback to firstname.lastname@example.org! We would love to hear from you.
These past two weeks, Social Media channels have been flooded with video production crews sharing their remote editing stations and workflows. As everybody struggles to adapt and stay productive, we’re hoping the Transcriptive web app, which has a new beta version, can help you with some of your challenges.
New Transcriptive.com Beta Version
We just updated https://app.transcriptive.com with a new version. It’s still in beta, so it’s still free to use. It’s a pretty huge upgrade from the previous beta, with a new text editor and sharing capabilities. Users can also upload a media file, transcribe, manage speakers, edit, search, and export transcripts without having to access Premiere Pro.
But the real strength is the ability to collaborate and share transcripts with Premiere users and other web users.
How’s Transcriptive going to help keep everyone in sync when they’re working remotely?
The web app was designed from the beginning to help editors work remotely with clients or producers. Transcripts can be easily edited and shared between Premiere Pro and a web browser or between two web users.
This means producers, clients, assistant editors, and interns can easily review and clean up transcripts on the web and send them to the Premiere editor. They can also identify the timecode of video segments that are important or have problems. All this can be done in a web browser and then shared.
If you are a video editor and have been transcribing in Premiere Pro, sending the transcripts and media to Transcriptive.com is quick and makes it easy for team members to access the footage and the transcribed text.
Premiere To A Web Browser
Click on the [ t ] menu in Premiere Pro, link to a web project, and then you can upload the transcript, video, or both. Team members can then log into the Transcriptive.com account and view it all!
Web users are able to edit the transcripts, watch the uploaded clips, see the timecode on the transcript, export the transcript as a Word Document, plain text, captions, and subtitle files, etc. Other features like adding comments or highlighting text are coming soon.
From The Web To Premiere
Once a web user is done editing or reviewing the transcript, the editor can pull it back into Premiere. Again, go to the [ t ] menu and select ‘Download From Your Web Project’. This will download the transcript from Transcriptive.com and load it for the linked video.
Web users can also transcribe videos they upload and share them with other web users. The transcripts can then be downloaded by an editor working in Premiere. Usually it’s a bit easier to start the video upload process from Premiere, but it is possible to do everything from Transcriptive.com.
It’s a powerful way of collaborating with remote users, letting you share videos, transcripts and timecode. Round tripping from Premiere to the web and back again, quickly and easily. Exactly what you need for keeping projects going right now.
Curious to try our BETA web App but still have questions on how it works? Send an email to email@example.com. And if you have tried the App we would love to hear your feedback!
If you’ve been using Speechmatics credits to transcribe in Transcriptive, our transcription plugin for Premiere Pro, then you’ve noticed that accessing your credits in Transcriptive 2.0.2 and later is no longer an option. Speechmatics is discontinuing the API that we used to support their service in Transcriptive, which means your Speechmatics credentials can no longer be validated inside of the Transcriptive panel.
We know a lot of users still have Speechmatics credits and have been working closely with Speechmatics so those credits can be available in your Transcriptive account as soon as possible. Hopefully in the next week or two.
In the meantime, there are a couple of ways users can still transcribe with Speechmatics credits: 1) Use an older version of Transcriptive, like v1.5.2 or v2.0.1. Those should still work for a bit longer but use the older, less accurate API. Or 2) Upload directly on their website and export the transcript as a JSON file to be imported into Transcriptive. It is a fairly simple process and a great temporary solution. Here’s a step-by-step guide:
1. Head to the Speechmatics website – To use your Speechmatics credits, head to www.speechmatics.com and login to your account. Under “What do you want to do?”, choose “Transcription” and select the language of your file.
2. Upload your media file to the Speechmatics website – Speechmatics will give you the option to drag and drop or select your media from a folder on your computer. Choose whichever option works best for you and then click on “Upload”. After the file is uploaded, the transcription will start automatically and you can check the status of the transcription on your “Jobs” list.

3. Download a .JSON file – After the transcription is finished (refresh the page if the status doesn’t change automatically!), click on the Actions icon to access the transcript. You will then have the option to export the transcript as a .JSON file.
4. Import the .JSON file into any version of Transcriptive – Open your Transcriptive panel in Premiere. If you are using Transcriptive 2.0, be sure Clip Mode is turned on. Select the clip you have just transcribed on Speechmatics and click on “Import”. If you are using an older version of Transcriptive, drop the clip into a sequence before choosing “Import”.
You will then have the option to “Choose an Importer”. Select the JSON option and import the Speechmatics file saved on your computer. The transcript will be synced with the clip automatically at no additional charge.
One important thing to know: although Transcriptive v1.x still has Speechmatics as an option and it still works, we would recommend following the steps above to transcribe with Speechmatics credits. The option available in those versions of the panel uses an older version of their API and is less accurate than the new one. So transcribe on the Speechmatics website if you want to use your Speechmatics credits now and not wait for them to be transferred.
However, we should have the transfer sorted out very soon, so keep an eye open for an email about it if you have Speechmatics credits. If the email address you use for Speechmatics is different than the one you use for Transcriptive.com, please email firstname.lastname@example.org. We want to make sure we get things synced up so the credits go to the right place!
A lot of you have a ton of footage that you want to transcribe. One of our goals with Transcriptive has been to enable you to transcribe everything that goes into your Premiere project. To search it, to create captions, to easily see what talent is saying, etc. But if you’ve got 100 hours of footage, even at $0.12/min the costs can add up. So…
Transcriptive has a new feature that will help you cut your transcription costs by 50%. The latest version of our Premiere Pro transcription plugin already cut the cost of transcribing from $0.12/min to $0.08/min. However, our new prepaid minute packages go even further… allowing users to purchase transcription credits in bulk! You can save 50% per minute, transcribing for $2.40/hr or $0.04/min. This applies to both Transcriptive AI and Speechmatics.
The pre-paid minutes option reduces transcription costs to $0.04/min, with credits purchased in volume for $150 or $500. For small companies and independent editors, the $150 package makes it possible to secure 62.5 hours of transcription without breaking the bank. If you and your team are transcribing large amounts of footage, going for the $500 package will allow you to save even more.
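To sanity-check the math, here's a quick back-of-the-envelope calculation using the rates quoted above:

```python
# Rates quoted in this post, in dollars per minute.
PAYG_RATE = 0.08     # Pay-As-You-Go
PREPAID_RATE = 0.04  # prepaid bulk rate

# The $150 package at $0.04/min buys 3750 minutes, i.e. 62.5 hours.
package_hours = (150 / PREPAID_RATE) / 60
print(package_hours)  # 62.5

# A 10-hour (600-minute) batch: $48 pay-as-you-go vs $24 prepaid.
print(600 * PAYG_RATE, 600 * PREPAID_RATE)  # 48.0 24.0
```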
The credits are good for 24 months, so you don’t need to worry about them expiring.
You don’t HAVE to pre-pay. You can still Pay-As-You-Go for $0.08/min. That’s still really inexpensive for transcription and if you’re happy with that, we’re happy with it too.
However, if you’re transcribing a lot of footage, pre-paying is a great way to get costs down. It has other benefits too: you don’t need to share your credit card with co-workers and other team members. For bigger companies, production managers, directors, or even the accounting department can be in charge of purchasing the minutes and feeding credits into the Premiere Pro Transcriptive panel, so editors no longer have to worry about charges showing up on the account holder’s credit card.
Buying the minutes in advance is simple! Go to your Premiere Pro panel, click on your profile icon, choose “Pre-Pay Minutes” and select the option that best suits your needs. You can also pre-pay credits from your web app account by logging into app.transcriptive.com, opening your “Dashboard” and clicking on “Buy Minutes”. A pop-up window will ask you to choose a prepaid minutes package and enter your credit card information. Confirm the purchase and your prepaid minutes will show under “Balance” on your homepage. The prepaid minutes balance will also be visible in your Premiere Pro panel, right next to the cost of the transcription.
Applying purchased credits to your transcription jobs is also a quick and easy process. While submitting a clip or sequence for transcription, Transcriptive will automatically deduct the amount required to transcribe the job from your balance. If the available credit is not enough to transcribe your job, the remaining minutes will be charged to the credit card on file.
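The deduction described above boils down to logic like the following sketch. This is our illustration, not Transcriptive's actual billing code, and the assumption that any shortfall is billed at the $0.08/min Pay-As-You-Go rate is ours; the post only says the remaining minutes go on the credit card on file.

```python
def charge_job(job_minutes, balance_minutes, payg_rate=0.08):
    """Cover a transcription job from the prepaid minute balance first;
    any shortfall is billed to the card (rate assumed, see lead-in).

    Returns (new_balance_minutes, card_charge_dollars).
    """
    covered = min(job_minutes, balance_minutes)
    shortfall = job_minutes - covered
    return balance_minutes - covered, shortfall * payg_rate
```

For example, a 120-minute job against a 100-minute balance would empty the balance and put the remaining 20 minutes on the card.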
The 50% discount on prepaid minutes applies only to transcription, but minutes can also be used to align existing transcripts at the regular rate. English transcripts can be imported into Transcriptive and aligned to your clips or sequences for free, while text in other languages will align for $0.02/min with Transcriptive AI and $0.04/min with Speechmatics.
When A.I. works, it can be amazing. BUT you can waste a lot of time and money when it doesn’t work. Garbage in, garbage out, as they say. But what is ‘garbage’ and how do you know it’s garbage? That’s one of the things, hopefully, I’ll help answer.
Why Even Bother?
It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of time in the long run. Cleaning up inaccurate transcripts, metadata, or keywords is far more tedious and problematic than doing a little testing up front. So it really is time well spent.
One caveat… There are a lot of potential ways to use A.I., and this is only going to cover Speech-to-Text because that’s what I’m most familiar with due to Transcriptive and getting A.I. transcripts in Premiere. But if you understand how to evaluate one use, you should, more or less, be able to apply your evaluation method to others. (i.e. for testing audio, you want varying audio quality among your samples; for testing images, you want varying quality, such as low light or blurriness, among your samples)
At Digital Anarchy, we’re constantly evaluating a basket of A.I. services to determine what to use on the backend of Transcriptive. So we’ve had to come up with a methodology to fairly test how accurate they are. Most of the people reading this are in a somewhat different situation… testing solutions from various vendors that use A.I. instead of testing the A.I. directly. However, since different vendors use different A.I. services, this methodology will still be useful for comparing the accuracy of the A.I. at the core of those solutions. There may, of course, be other features of a given solution that affect your decision to go with one or the other, but at least you’ll be able to compare accuracy objectively.
Here’s an outline of our method:
Always use new files that haven’t been processed before by any of the A.I. services.
Keep them short. (1-2min)
Choose files of varying quality.
Use a human transcription service to create the ‘test master’ transcript.
Have someone do a second pass to correct any human errors.
Create a set of rules for word/punctuation mistakes covering what counts as one error, a 1/2 error or two errors.
If you change them halfway through the test, you need to re-test everything.
Apply them consistently. If something is ambiguous, create a rule for how it will be handled and always apply it that way.
Compare the results and may the best bot win.
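Once you have the test master and an A.I. transcript, the word-accuracy scoring itself is mechanical. Here's a minimal sketch of the standard word error rate calculation. It assumes you've already normalized case and stripped punctuation according to your rules, and that the reference transcript is non-empty.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions + insertions + deletions)
    divided by the reference length -- the standard WER formula."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance, computed over words
    # instead of characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

So a transcript that turns “the cat sat” into “the bat sat” scores one substitution out of three words, a WER of about 33%.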
May The Best Bot Win: Visualizing
The main chart compares each engine on a specific file (i.e. File #1, File #2, etc), using both word and punctuation accuracy. This is really what we use to determine which is best, as punctuation matters. It also shows where each A.I. has strengths and weaknesses. The second, smaller chart shows each service from best result to worst result, using only word accuracy. Every A.I. will eventually fall off a cliff in terms of accuracy. This chart shows you the ‘profile’ for each service and can be a slightly clearer way of seeing which is best overall, ignoring specific files.
First it’s important to understand how A.I. works. Machine Learning is used to ‘train’ an algorithm. Usually millions of bits of data that have been labeled by humans are used to train it. In the case of Speech-to-Text, these bits are audio files with human transcripts. This allows the A.I. to identify which audio waveforms, the word sounds, go with which bits of text. Once the algorithm has been trained, we can send audio files to it and it makes its best guess as to which word each waveform corresponds to.
A.I. algorithms are very sensitive to what they’ve been trained on. The further you get away from what they’ve been trained on, the more inaccurate they are. For example, you can’t use an English A.I. to transcribe Spanish. Likewise, if an A.I. has been trained on perfectly recorded audio with no background noise, as soon as you add in background noise it goes off the rails. In fact, the accuracy of every A.I. eventually falls off a cliff. At that point it’s more work to clean it up than to just transcribe it manually.
Always Use New Files
Any time you submit a file to an A.I. it’s possible that the A.I. learns from that file. So you really don’t want to use the same file over and over and over again. To ensure you’re getting unbiased results it’s best to use new files every time you test.
Keep The Test Files Short
First off, comparing transcripts is tedious. Short transcripts are better than long ones. Secondly, if the two minutes you select are representative of an hour-long clip, that’s all you need. Transcribing and comparing the entire hour won’t tell you anything more about the accuracy. The accuracy of two minutes is usually the same as the accuracy of the hour.
Of course, if you’re interviewing many different people over that hour in different locations, with different audio quality (lots of background noise, no background noise, some with accents, etc)… two minutes won’t be representative of the entire hour.
Choose Files of Varying Quality
This is critical! You have to choose files that are representative of the files you’ll be transcribing. Test files with different levels of background noise, different speakers, different accents, different jargon… whatever issues typically occur in the dialog in your videos. ** This is how you’ll determine what ‘garbage’ means to the A.I. **
Use Human Transcripts for The ‘Test Master’
Send out the files to get transcribed by a person. And then have someone within your org (or you) go over them for errors. There usually are some, especially when it comes to jargon or names (turns out humans aren’t perfect either! I know… shocker.). These transcripts will be what you compare the A.I. transcripts against, so they need to be close to perfect. If you change something after you start testing, you need to re-test the transcripts you’ve already tested.
Create A Set of Rules And Apply Them Consistently
You need to figure out what you consider one error, a 1/2 error or two errors. In most cases it doesn’t matter exactly what you decide to do, only that you do it consistently. If a missing comma is 1/2 an error, great! But it ALWAYS has to be a 1/2 error. You can’t suddenly make it a full error just because you think it’s particularly egregious. You want to take judgement out of the equation as much as possible. If you’re making judgement calls, it’s likely you’ll choose the A.I. that most resembles how you see the world. That may not be the best A.I. for your customers. (OMG… they used an Oxford Comma! I hate Oxford commas! That’s at least TWO errors!).
And NOW… The Moment You’ve ALL Been Waiting For…
Add up the errors, divide that by the number of words, put everything into a spreadsheet… and you’ve got your winner!
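That final tally amounts to a weighted sum. A sketch with illustrative weights; the actual weights are whatever your rule set says, not a recommendation:

```python
# Example rule set: the weights here are illustrative only.
ERROR_WEIGHTS = {
    "missed_word": 1.0,
    "extra_word": 1.0,
    "missing_comma": 0.5,
    "wrong_punctuation": 0.5,
}

def accuracy(error_counts, total_words):
    """Weighted error total divided by transcript length, as a percentage."""
    weighted = sum(ERROR_WEIGHTS[kind] * n for kind, n in error_counts.items())
    return 100 * (1 - weighted / total_words)

# e.g. 20 missed words and 10 missing commas in a 1000-word transcript:
print(accuracy({"missed_word": 20, "missing_comma": 10}, 1000))  # 97.5
```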
It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of cleanup time in the long run. So it really is time well spent.
Hopefully this post has given you some insights into how to test whatever type of A.I. services you’re looking into using. And, of course, if you haven’t checked out Transcriptive, our A.I. transcription plugin for Premiere Pro, you need to! Thanks for reading and please feel free to ask questions in the comment section below!
Have you ever considered using Transcriptive to build an effective Search Engine Optimization (SEO) strategy and increase the reach of your Social Media videos? Having your footage transcribed right after the shooting can help you quickly scan everything for soundbites that will work for instant social media posts. You can find the terms your audience searches for the most, identify high ranked keywords in your footage, and shape the content of your video based on your audience’s behavior.
According to vlogger and Social Media influencer Jack Blake, being aware of what your audience is doing online is a powerful tool to choose when and where to post your content, but also to decide what exactly to include in your Social Media Videos, which tend to be short and soundbite-like. The content of your media, titles, video descriptions and thumbnails, tags and post mentions should all be part of a strategy built based on what your audience is searching for. And this is why Blake is using Transcriptive not only to save time on editing but also to carefully curate his video content and attract new viewers.
Right after shooting his videos, the vlogger transcribes everything and exports the transcripts as rich text so he can quickly share the content with his team. After that, a Copywriter scans through the transcribed audio and identifies content that will bring traffic to the client’s website and increase ROI. “It’s amazing. I transcribe the audio in minutes, edit some small mistakes without having to leave Premiere Pro, and share the content with my team. After that, we can compare the content with our targeted keywords and choose what I should cut. The editing goes quickly and smoothly because the words are already time-stamped and my captions take no time to create. I just export the transcripts as an SRT and it is pretty much done,” explains Blake.
Of course, it all starts with targeting the right keywords and that can be tricky, but there are many analytics and measurement applications offering this service nowadays. If you are just getting started in the whole keyword targeting game, the easiest and most accessible way is connecting your in-site search queries with Google Analytics. This will allow you to get information on how users are interacting with your website, including how much your audience searches, who is performing searches and who is not, and where they begin searching, as well as where they head to afterward. Google Analytics will also allow you to find out what exactly people are typing into Google when searching for content on the web.
For Blake, using competitors’ hashtags from YouTube has been very helpful to increase video views. “One of the differentials in my work is that I research my client’s competitors on YouTube and identify the VidIQs (YouTube keyword tags) they have been using on their videos so we can use competitive tagging in our content description and video title. This allows the content I produce for the client to show up when people search for this specific hashtag on YouTube,” he explains. Blake’s team is also using Google Trends, a website that analyzes the popularity of top search queries in Google Search across various regions and languages. It’s a great tool to find out how often a search term is entered in Google’s search engine, compare it to the total search volume, and learn how search trends vary within a certain interval of time.
When asked what would be the last thing he would recommend to video makers wanting to boost their video views on Social Media, Blake had no hesitation in choosing captions. “Social media feeds are often very crowded, fast-moving, and competitive. Nobody has time to open the video as full screen, turn the sound on and watch the whole thing, they often watch the videos without sound, and if the captions are not there then your message will not get through. And Transcriptive makes captioning a very easy process,” he says.
The struggle of making documentary films nowadays is real. Competition is high, and budget limitations can stretch a 6-year deadline to a 10 year-long production. To make a movie you need money. To get the money you need decent, and sometimes edited, footage material to show to funding organizations and production companies. And decent footage, well-recorded audio, as well as edited pieces cost money to produce. I’ve been facing this problem myself and discovered through my work at Digital Anarchy that finding an automated tool to transcribe footage can be instrumental in making small and low budget documentary films happen.
In this interview, I talked to filmmaker Chuck Barbee to learn how Transcriptive is helping him to edit faster and discussed some tips on how to get started with the plugin. Barbee has been in the Film and TV business for over 50 years. In 2005, after an impressive career in the commercial side of the Film and TV business, he moved to California’s Southern Sierras and began producing a series of personal “passion” documentary films. His projects are very heavy on interviews, and the transcribing process he used all throughout his career was no longer effective to manage his productions.
Barbee has been using Transcriptive for a month, but already considers the plugin a game-changer. Read on to learn how he is using the plugin to make a long-form documentary about the people who created what is known as “The Bakersfield Sound” in country music.
DA: You have worked in a wide variety of productions throughout your career. Besides co-producing, directing, and editing prime-time network specials and series for Lee Mendelson Productions, you also worked as Director of Photography for several independent feature films. In your opinion, how important is the use of transcripts in the editing process?
CB: Transcripts are essential to edit long-form productions because they allow producers, editors, and directors to go through the footage, get familiarized with the content, and choose the best bits of footage as a team. Although interview oriented pieces are more dependent on transcribed content, I truly believe transcripts are helpful no matter what type of motion picture productions you are making.
On most of my projects, we always made cassette tape copies of the interviews, then had someone manually transcribe them and print hard copies. With film projects, there was never any way to have a time reference in the transcripts, unless you wanted to do that manually. Then in the video, it was easier to make time-coded transcripts, but both of these methods were time-consuming and relatively expensive labor wise. This is the method I’ve used since the late ’60s, but the sheer volume of interviews on my current projects and the awareness that something better probably exists with today’s technology prompted me to start looking for automated transcription solutions. That’s when I found Transcriptive.
DA: And what changed now that you are using Artificial Intelligence to transcribe your filmed interviews in Premiere Pro?
CB: I think Transcriptive is a wonderful piece of software. Of course, it is only as good as the diction of the speaker and the clarity of the recording, but the way the whole system works is perfect. I place an interview on the editing timeline, click transcribe and in about 1/3 of the time of the interview I have a digital file of the transcription, with time code references. We can then go through it, highlighting sections we want, or print a hard copy and do the same thing. Then we can open the digital version of the file in Premiere, scroll to the sections that have been highlighted, either in the digital file or the hard copy, click on a word or phrase and then immediately be at that place in the interview. It is a huge time saver and a game-changer.
The workflow has been simplified quite a bit, the transcription costs are down, and the editing process has sped up because we can search and highlight content inside of Premiere or use the transcripts to make paper copies. Our producers prefer to work from a paper copy of the interviews, so we use that TXT or RTF file to make a hard copy. However, Transcriptive can also help to reduce the number of printed materials if a team wants to do all the work digitally, which can be very effective.
DA: What makes you choose between highlighting content in the panel and using printed transcripts? Are there situations where one option works better than the other?
CB: It really depends on producer/editor choices. Some producers might want to have a hard copy because they would prefer that to work on a computer. It really doesn’t matter much from an editor’s point of view because it is no problem to scroll through the text in Transcriptive to find the spots that have been highlighted on the hard copy. All you have to do is look at the timecode next to the highlighted parts of a hard copy and then scroll to that spot in Transcriptive. Highlighting in Transcriptive means you are tying up a workstation, with Premiere, to do that. If you only have one editing workstation running Premiere, then it makes more sense to have someone do the highlighting with a printed hard copy or on a laptop or any other computer which isn’t running Premiere.
DA: You mentioned the AI transcription is not perfect, but you would still prefer that than paying for human transcripts or transcribing the interviews yourself. Why do you think the automated transcripts are a better solution for your projects?
CB: Transcriptive is amazingly accurate, but it is also quite “literal” and will transcribe what it hears. For example, if someone named “Artie” pronounces his name “RD”, that’s what you’ll get. Also, many of our subjects have moderate to heavy accents and that does affect accuracy. Another thing I have noticed is that, when there is a clear difference between the sound of the subject and the interviewer, Transcriptive separates them quite nicely. However, when they sound alike, it can confuse them. When multiple voices speak simultaneously, Transcriptive also has trouble, but so would a human.
My team needs very accurate transcripts because we want to be able to search through 70 or more transcripts, looking for keywords that are important. Still, we don’t find the transcription mistakes to be a problem. Even if you have to go through the interview when it comes back to make corrections, it is far simpler and faster than the manual method and cheaper than the human option. Here’s what we do: right after the transcripts are processed, we go through each transcript with the interviews playing along in sync, making corrections to spelling or phrasing or whatever, especially with keywords such as names of people, places, themes, etc. It doesn’t take too much time and my tip is that you do it right after the transcripts are back, while you are watching the footage to become familiar with the content.
DA: Many companies are afraid of incorporating Transcriptive into an on-going project workflow. How was the process of using our transcription plugin in a long-form documentary film right away?
CB: We have about 70 interviews of anywhere from 30 minutes to one hour each. It is a low budget project, being done by a non-profit called “Citizens Preserving History”. The producers were originally going to try to use time-code-window DVD copies of the interviews to make notes about which parts of the interviews to use because of budget limitations. They thought the cost of doing manually typed transcriptions was too much. But as they got into the process they began to see that typed transcripts were going to be the only way to go. Once we learned about Transcriptive and installed it, it only took a couple of days to do all 70 interviews and the cost, at 12 cents per minute, is small compared to manual methods.
Transcriptive is very easy to use and it honestly took almost no time for me to figure out the workflow. The downloading and installation process was simple and direct, and the tech support at Digital Anarchy is awesome. I’ve had several technical questions and my phone calls and emails have been answered promptly, by cheerful, knowledgeable people who speak my language clearly and really know what they are doing. They can certainly help quickly if people feel lost or something goes wrong, so I would say do yourself a favor and use Transcriptive in your project!
Here’s a short version of the opening tease for “The Town That Wouldn’t Die”, Episode III of Barbee’s documentary series:
There are plenty of horrible things A.I. might be able to do in the future. And this MIT article lists six potential problem areas in the very near future, which are legit to varying degrees. (Although, this is more a list of humans behaving badly than A.I. per se)
However, most people don’t realize exactly how rudimentary (i.e. dumb) A.I. is in its current state. This is part of the problem with the MIT list. The technology is prone to biases, false positives, difficulty with even simple situations, and so on. The problem is more humans trying to make use of, and/or make critical decisions based on, immature technology.
For those of us that work with it regularly, we see all the limitations on a daily basis, so the idea of A.I. taking over the world is a bit laughable. In fact, you can see it daily yourself on your phone.
Take the auto-suggest feature on the iPhone. You would think the Natural Language Processing could take a phrase like ‘Glad you’re feeling b…’ and suggest things like better, beautiful or whatever. Not so hard, right?
How often does ‘glad’, ‘feeling’ and ‘bad’ appear in the same sentence? And you want to let A.I. drive your car?
We’ve got a ways to go.
Unless, of course, it’s a human problem again and there are a bunch of asshats out there that are glad you’re feeling bad. Oh, wait… it’s the internet. Right.
So you’ve uploaded your video to Facebook or YouTube and you’d like to import the captions they automatically generate with Artificial Intelligence into Transcriptive. This can be a good, FREE way of getting a transcript.
Transcriptive imports SRT files, so… all you need is an SRT file from those services. That’s easy peasy with YouTube, you just go to the Captions section and download>SRT.
Download the SRT and you’re done. Import the SRT into Transcriptive with ‘Combine Lines into Paragraphs’ turned on… Easy, free transcription.
With Facebook it’s more difficult as they don’t let you just download an SRT file. Or any file for that matter. So you need to get tricky.
Open Facebook in Firefox and go to Web Developer>Network. This will open the inspector at the bottom of your browser window.
Which will give you something that looks like this:
Go to the Facebook video you want to get the caption file for.
Once the video starts playing, type SRT into the Filter field (as shown above)
This _should_ show an XHR file. (we’ve seen instances where it doesn’t, not sure why. So this might not work for every video)
Right Click on it, select Copy>Copy URL (as shown above)
Open a new Tab and paste in the URL.
You should now be asked to download a file. Save this as an SRT file (e.g. MyVideo.srt).
Import the SRT into Transcriptive with ‘Combine Lines into Paragraphs’ turned on… Easy, free transcription.
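If you'd rather script those last two steps, the URL copied from the Network inspector can be fetched directly. This is a sketch; the URL itself still has to come from the inspector, and Facebook may require you to be logged in for the request to succeed.

```python
import urllib.request

def save_srt(caption_url, out_path="MyVideo.srt"):
    """Download the caption payload at the URL copied from the
    Network inspector and save it as an .srt file."""
    with urllib.request.urlopen(caption_url) as resp:
        data = resp.read()
    with open(out_path, "wb") as f:
        f.write(data)
```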
So that’s it. This worked as of this writing. It’s entirely possible Facebook will make a change at some point preventing this, but for now, it’s a good way of getting free transcriptions.
You can also do this in other browsers, I’m just using Firefox as an example.
We’ve released PowerSearch 1.0 for Premiere Pro! It’s a new part of the Transcriptive suite of tools that’s essentially a search engine for Premiere letting you search clips, sequences, markers, metadata and captions all in one place.
It streamlines your editing by allowing you to quickly search hours of video for words or phrases. While it works best when used in conjunction with Transcriptive, it plays well with any service that can get transcripts or SRTs (captions) into Premiere Pro. It’s all about helping you find data, we don’t care where the data comes from.
Like any search engine, it displays a list of results. In most cases, clicking on a result takes you to the exact moment the words were spoken in either the Source panel (clips) or the Timeline panel (sequences). If you’ve ever been asked to find a 15 second quote and had to dig through 50 hours of footage to find it, you know how valuable a time-saving tool this is.
I decided to try Transcriptive way before I became part of the Digital Anarchy family. Just like any other aspiring documentary filmmaker, I knew relying on a crew to get my editing started was not an option. Without funding you can’t pay a crew; without a crew you can’t get funding. I had no money, an idea in my head, some footage shot with the help of friends, and a lot of work to do. Especially when working on your very first feature film.
Besides being an independent Filmmaker and Social Media strategist for DA, I am also an Assistive Technology Trainer for a private company called Adaptive Technology Services. I teach blind and low vision individuals how to take advantage of technology to use their phones and computers to rejoin the workforce after their vision loss. Since the beginning of my journey as an AT Trainer – I started as a volunteer 6 years ago – I have been using my work to research the subject and prepare for this film.
My movie is about the relationship between the sighted and non-sighted communities. It seeks to establish a dialog between people with and without visual disabilities so we can come together to demystify disabilities to those without them. I know it is an important subject, but right from the beginning of this project I learned how hard it is to gather funds for any disability-related initiative. I had to carefully budget the shoots and define priorities. Paying a post-production crew was not (and still is not) possible. I have to write and cut samples on my own for now. Transcriptive was a way for me to get things moving by myself so I can apply for grants in the near future and start paying producers, editors, camera operators, sound designers, and get the project going for real. The journey started with transcribing the interviews. Transcriptive did a pretty good job transcribing the audio from the camera, as you can see below. Accuracy got even better when transcribing audio from the mic.
The idea of getting accurate automated transcripts brought a smile to my face. But could Artificial Intelligence really get the job done for me? I never believed so, and I was right. The accuracy for English interviews was pretty impressive. I barely had to do any editing on those. The situation changed as soon as I tried transcribing audio in my native language, Brazilian Portuguese. The AI transcription didn’t just get a bit flaky; it was completely unusable, so I decided not to waste any more time and started doing my own manual transcriptions.
I have been using Speechmatics for most of my projects because the accuracy is considerably higher than Watson with English. However, after trying to transcribe in Portuguese for the first time, it occurred to me Speechmatics actually offers Portuguese from Portugal while Watson transcribes Portuguese from Brazil. I decided to give Watson a try, but the transcription was not much better than the one I got from Speechmatics.
It is true the Brazilian Portuguese footage I was transcribing was b-roll clips recorded with a Rode mic placed on top of my DSLR. They were not well-mic’d sit-down interviews. The clips do have decent audio, but also involve some background noise that does not help foreign language speech-to-text conversion. At the time I had a deadline to meet and was not able to record better audio and compare Speechmatics and Watson Portuguese transcripts. It will be interesting to give it another try, with more time to further compare and evaluate whether there are advantages to using Watson for my next batch of footage.
Days after my failed attempt to transcribe Brazilian Portuguese with Speechmatics, I went back to the Transcriptive panel for Premiere, found an option to import my human transcripts, gave it a try, and realized I could still use Transcriptive to speed up my video production workflow. I could still save time by letting Transcriptive assign timecode to the words I transcribed, which would be nearly impossible for me to do on my own. The plugin allowed me to quickly find where things were said in 8 hours of interviews. Having the timecode assigned to each word allowed me to easily search the transcript and jump to that point in my video where I wanted to have a cut, marker, b-roll or transition effect applied.
My movie is still in pre-production and my Premiere project is honestly not that organized yet, so the search capability was also a huge advantage. I have been working on samples to apply for grants, which means I have tons of different sequences, multicam sequences, and markers that now live in folders inside of folders. Before I started working for DA I was looking for a solution to minimize the mess without having to fully organize it or spend too much money, and PowerSearch came to the rescue. Also, being able to edit my transcripts inside of Premiere made my life a lot easier.
Last month, talking to a few film clients and friends, I found out most filmmakers still clean up human transcripts. In my case, I go through the transcripts to add punctuation marks and other cues that remind me how eloquent speakers were in a given phrase. Ellipses, question marks and exclamation points remind me of the tone they spoke in, allowing me to get paper cuts done faster. I am not sure whether ASR technology will start inserting punctuation like this in the future, but it would be very handy for me. Until that is possible, I am grateful Transcriptive now offers a text edit interface, so I can edit my transcripts without leaving Premiere.
For the movie I am making now, I was lucky enough to have a friend willing to help me get this tedious and time-consuming part of the work done, so I am now exporting all my transcripts to Transcriptive.com. The app will allow us to collaborate on the transcripts. She will be helping me all the way from LA, editing all the transcripts without having to download a whole Premiere project to get the work done.
For the last 14 years I’ve created the Audio Art Tour for Burning Man. It’s kind of a docent led audio guide to the major art installations out there, similar to an audio guide you might get at a museum.
Burning Man always has a different ‘theme’ and this year it was ‘I, Robot’. I generally try and find background music related to the theme. EDM is big at Burning Man, land of 10,000 DJs, so I could’ve just grabbed some electronic tracks that sounded robotic. Easy enough to do. However I decided to let Artificial Intelligence algorithms create the music! (You can listen to the tour and hear the different tracks)
This turned out to be not so easy, so I’ll break down what I had to do to get seven unique sounding, usable tracks. I had a bit more success with AmperMusic, which is also currently free (unlike Jukedeck), so I’ll discuss that first.
Getting the Tracks
The problem with both services was getting unique-sounding tracks. The A.I. has a tendency to create very similar-sounding music. Even if you select different styles and instruments, you often end up with oddly similar music. This problem is compounded by Amper’s inability to render more than about 30 seconds of music.
What I found I had to do was let it generate 30 seconds, either randomly or with me selecting the instruments, repeatedly until I got a 30-second sample I liked. At that point I extended it out to about 3 or 4 minutes and turned off all the instruments but two or three. Amper was usually able to render that out. Then I’d turn off those instruments, turn on another three, and render that. Rinse and repeat until all the instruments were rendered.
Now you’ve got a bunch of individual tracks that you can combine to get your final music track. Combine them in Audition or even Premiere Pro (or FCP or whatever NLE) and you’re good to go. I used that technique to get five of the tracks.
Jukedeck didn’t have the rendering problem but it REALLY suffered from the ‘sameness’ problem. It was tough getting something that really sounded unique. However, I did get a couple good tracks out of it.
Problems Using Artificial Intelligence
This is another example of A.I. and Machine Learning that works… sort of. I could have found seven stock music tracks that I like much faster (this is what I usually do for the Audio Art Tour). The amount of time it took me messing around with these services was significant. Also, if Jukedeck is any indication, a music track from one of these services will cost as much as a stock music track. Just go to Pond5 to see what you can get for the same price. With a much, much wider variety. I don’t think living, breathing musicians have much to worry about. At least for now.
That said, I did manage to get seven unique, cool sounding tracks out of them. It took some work, but it did happen.
As with most A.I./ML, it’s difficult to see what the future looks like. There have certainly been a ton of advances, but I think in a lot of cases, it’s some of the low-hanging fruit. We’re seeing that with speech-to-text algorithms in Transcriptive, where they’re starting to plateau and cluster around the same accuracy levels. The fruit (accuracy) is now pretty high up and improvements are tough. It’ll be interesting to see what it takes to break through that. More data? Faster servers? A new approach?
I think music may be similar. It seems like it’s a natural thing for A.I. but it’s deceptively difficult to do in a way that mimics the range and diversity of styles and sounds that many human musicians have. Particularly a human armed with a synth that can reproduce an entire orchestra. We’ll see what it takes to get A.I. music out of the Valley of Sameness.
Premiere Pro CS6 has the ability to turn speech into text and store it in the Speech Analysis metadata, and you can still use that metadata in any version of Premiere Pro.
In Premiere CS6 you can right+click on a piece of footage and select ‘Analyze Content’. This would turn all the speech into text. Adobe removed it in later versions of Creative Cloud but all that infrastructure is still in Premiere Pro CC 2018 (and other versions) and this post will tell you how to make use of it with, and without, Transcriptive, our plugin for transcribing video.
First off, if you have Creative Cloud, you still have access to CS6 (or CC). You can download it and use that to turn all your speech to text. This will get saved with your file and when you import it into Premiere 2018, all the text will be in the Speech Analysis field of the Metadata panel. This is very handy as you can use the text with the Source panel to set in and out points and edit with text.
To get older versions of Premiere, go to the Creative Cloud app and find Premiere Pro. Click the menu button (or down arrow) and select ‘Other Versions’. You can install all the way back to CS6.
Once CS6 is installed, you can import the footage, right+click and select ‘Analyze Content’. It takes some time to do this, but once it’s done, you’ll have all the speech turned into text in Speech Analysis. Import the clips into the version of Premiere you’re using and all that text will show up in the Metadata panel. Voila! It’s not an awesome interface for editing the text (and the text needs a lot of editing as it’s not very accurate, which is why Adobe removed the feature) but it’s there.
Transcriptive can also use that data. If you’re using Transcriptive and drop the footage into a sequence, it’ll pull the text from the Speech Analysis field.
As mentioned, the CS6 speech-to-text isn’t very accurate, which you can see below. So it’s usually worth it to pay a few cents a minute to get a good A.I. transcript or $1.25/min to get human transcripts (which Transcriptive can import).
However, if you want free, then the CS6 trick is one way of doing it. Or you could use YouTube and import their captions into Transcriptive. It’s free, easy and we have a great tutorial that shows you how to get YouTube captions into Premiere!
1) Practically every company exhibiting was talking about A.I.-something.
2) VR seemed to have disappeared from vendor booths.
The last couple years at NAB, VR was everywhere. The Dell booth had a VR simulator, Intel had a VR simulator, booths had Oculuses galore and you could walk away with an armful of cardboard glasses… this year, not so much. Was it there? Sure, but it was hardly to be seen in booths. It felt like the year 3D died. There was a pavilion, there were sessions, but nobody on the show floor was making a big deal about it.
In contrast, it seemed like every vendor was trying to attach A.I. to their name, whether they had an A.I. product or not. Not to mention, Google, Amazon, Microsoft, IBM, Speechmatics and every other big vendor of A.I. cloud services having large booths touting how their A.I. was going to change video production forever.
I’ve talked before about the limitations of A.I. and I think a lot of what was discussed at NAB really over-promised what A.I. can do. We spent most of the six months after releasing Transcriptive 1.0 developing non-A.I. features to help make the A.I. portion of the product more useful. The release we’re announcing today and the next release coming later this month will focus on getting around A.I. transcripts completely by importing human transcripts.
There’s a lot of value in A.I. It’s an important part of Transcriptive and for a lot of use cases it’s awesome. There are just also a lot of limitations. It’s pretty common to run into the A.I. equivalent of the Uncanny Valley (a CG character that looks *almost* human but ends up looking unnatural and creepy): A.I. gets you 95% of the way there, but it’s more work than it’s worth to get the final 5%. It’s better to just not use it.
You just have to understand when that 95% makes your life dramatically easier and when it’s like running into a brick wall. Part of my goal, both as a product designer and just talking about it, is to help folks understand where that line in the A.I. sand is.
I also don’t buy into this idea that A.I. is on an exponential curve and it’s just going to get endlessly better, obeying Moore’s law like the speed of processors.
When we first launched Transcriptive, we felt it would replace transcriptionists. We’ve been disabused of that notion. ;-) The reality is that A.I. is making transcriptionists more efficient. Just as we’ve found Transcriptive to be making video editors more efficient. We had a lot of folks coming up to us at NAB this year telling us exactly that. (It was really nice to hear. :-)
However, much of the effectiveness of Transcriptive comes more from the tools that we’ve built around the A.I. portion of the product. Those tools can work with transcripts and metadata regardless of whether they’re A.I. or human generated. So while we’re going to continue to improve what you can do with A.I., we’re also supporting other workflows.
Over the next couple months you’re going to see a lot of announcements about Transcriptive. Our goal is to leverage the parts of A.I. that really work for video production by building tools and features that amplify those strengths, like PowerSearch our new panel for searching all the metadata in your Premiere project, and build bridges to other technology that works better in other areas, such as importing human created transcripts.
Should be a fun couple months, stay tuned! btw… if you’re interested in joining the PowerSearch beta, just email us at email@example.com.
Addendum: Just to be clear, in one way A.I. is definitely NOT VR. It’s actually useful. A.I. has a lot of potential to really change video production, it’s just a bit over-hyped right now. We, like some other companies, are trying to find the best way to incorporate it into our products because once that is figured out, it’s likely to make editors much more efficient and eliminate some tasks that are total drudgery. OTOH, VR is a parlor trick that, other than some very niche uses, is going to go the way of 3D TV and won’t change anything.
Chief Executive Anarchist
For all the developments in artificial intelligence, one of the consistently worst uses of it is with chatbots. Those little ‘Chat With Us’ side bars on many websites. Since we’re doing a lot with artificial intelligence (A.I.) in Transcriptive and in other areas, I’ve gotten very familiar with how it works and what the limitations are. It starts to be easy to spot where it’s being used, especially when it’s used badly.
So A.I. chatbots, which really don’t work well, have become a bit of a pet peeve of mine. If you’re thinking about using them for your website, you owe it to yourself to click around the web and see how often ‘chatting’ gets you a usable answer. It’s usually just frustrating. You go a few rounds with a cheery chatbot before getting to what you were going to do in the first place… send a message that will be replied to by a human. A total waste of time that doesn’t answer your questions.
Do you trust cheery, know-nothing chatbots with your customers?
The main problem is that chatbots don’t know when to quit. I get that some businesses receive the same questions over and over… where are you located? what are your hours? Ok, fine, have a chatbot act as an FAQ. But the chatbot needs to quickly hand off the conversation to a real person if the questions go beyond what you could have in an FAQ. And frankly, an FAQ would be better than trying to fake out people with your A.I. chatbot. (Honesty and authenticity matter, even on the web.)
A.I. is just not great at reading comprehension. It can usually get the gist of things, which I think is useful for analytics and business intelligence. But that doesn’t allow it to respond with any degree of accuracy or intelligence. For responding to customer queries, it produces answers that are sort of close… but mostly unusable. The result is frustrated customers.
Take a recent experience with Audi. I’m looking at buying a new car and am interested in one of their SUVs. I went onto an Audi dealer site to inquire about a used one they had. I wanted to know 1) was it actually in stock and 2) how much of the original warranty was left since it was a 2017? There was a button to send a message which I was originally going to use but decided to try the chat button that was bouncing up and down getting my attention.
So, I asked those questions in the chat. If it had been a real person, they definitely could have answered #1 and probably #2, even if they were just an assistant. But no, I ended in the same place I would’ve been if I’d just clicked ‘send a message’ in the first place. But first, I had to get through a bunch of generic answers that didn’t answer any of my questions and just dragged me around in circles. This is not a good way to deal with customers if you’re trying to sell them a $40,000 car.
And don’t get me started on Amazon’s chatbots. (and emailbots for that matter)
It’s also funny to notice how the chatbots try and make you think it’s human, with misspelled words and faux emotions. I’ve had a chatbot admonish me with ‘I’m a real person…’ when I called it a chatbot. It then followed that with another generic answer that didn’t address my question. The Pinocchio chatbot… You’re not a real boy, not a real person and you don’t get to pass Go and collect $200. (The real salesperson I eventually talked to confirmed it was a chatbot.)
I also had one threaten to end the chat if I didn’t watch my language, which wasn’t even aimed at the chatbot. I just said, “I just want this to f’ing work”. A little generic frustration. However, after it told me to watch my language, I went from frustrated to kind of pissed. So much for artificial intelligence having emotional intelligence. Getting faux-insulted over something almost any real human would recognize as low-grade frustration is not going to make customers happier.
I think A.I. has some amazing uses, Transcriptive makes great use of A.I. but it also has a LOT of shortcomings. All of those shortcomings are glaringly apparent when you look at chatbots. There are, of course, many companies trying to create conversational A.I. but so far the results have been pretty poor.
Based on what I’ve seen developing products with A.I., I think it’s likely it’ll be quite a while before conversational A.I. is a good experience on a regular basis. You should think very hard about entrusting your customers to it. A web form or FAQ is going to be better than a frustrating experience with a ‘sales person’.
Not sure what this has to do with video editing. Perhaps just another example of why A.I. is going to have a hard time editing anything that requires comprehending the content. Furthering my belief that A.I. isn’t going to replace most video editors any time soon.
A.I. is definitely changing how editors get transcripts and search video for content. Transcriptive demonstrates that pretty clearly with text. Searching via object recognition is something that also is already happening. But what about actual video editing?
One of the problems A.I. has is finishing. Going the last 10% if you will. For example, speech-to-text engines, at best, have an accuracy rate of about 95% or so. This is about on par with the average human transcriptionist. For general purpose recordings, human transcriptionists SHOULD be worried.
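Those accuracy figures come from comparing the A.I. output against a corrected transcript and counting the words it missed. As a rough illustration (a simplified sketch using Python’s standard library `difflib`, not how any particular engine scores itself), you can approximate the missed-word count by aligning the two word lists:

```python
from difflib import SequenceMatcher

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Simplified word error rate: the number of words in the
    corrected (reference) transcript that the A.I. output failed
    to match, divided by the total words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Sum the sizes of all matching word runs between the two lists
    matched = sum(block.size for block in
                  SequenceMatcher(None, ref, hyp).get_matching_blocks())
    missed = len(ref) - matched
    return missed / len(ref)
```

Note that under this scheme, an engine turning one word into several (‘Everything’ into ‘Even ifrits sing’) still counts as just one missed reference word, which matches how we score our tests.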
But for video editing, there are some differences, which are good news. First, and most importantly, errors tend to be cumulative. If a computer is going to edit a video, at the very least it needs to do the transcription and it needs to recognize the imagery (we’ll ignore other considerations like style, emotion and story for the moment). Speech recognition is at best 95% accurate, and object recognition is worse. The more layers of A.I. you stack, the more those errors tend to multiply (though in some cases one layer might correct another). While it’s possible automation will be able to produce a decent rough cut, these errors make it difficult to see automation replacing humans for most of the types of videos that pro editors are typically employed for.
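That compounding argument is just arithmetic. Here’s a back-of-envelope sketch (the 85% object-recognition figure is a hypothetical, and it assumes each stage’s errors are independent, which real pipelines won’t exactly satisfy):

```python
def compound_accuracy(stage_accuracies):
    """Estimate end-to-end accuracy of a multi-stage A.I. pipeline
    by multiplying the per-stage accuracies, under the simplifying
    assumption that each stage's errors are independent."""
    result = 1.0
    for acc in stage_accuracies:
        result *= acc
    return result

# Hypothetical pipeline: speech-to-text at 95% feeding
# object recognition at 85%
print(round(compound_accuracy([0.95, 0.85]), 4))  # 0.8075
```

So two stages that each sound respectable on their own leave you with roughly 81% end-to-end, nearly quadrupling the error rate of the best single stage.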
Secondly, if the videos are being done for humans, frequently the humans don’t know what they want. Or at least they’re not going to be able to communicate it in such a way that a computer will understand and be able to make changes. If you’ve used Alexa or Echo, you can see how well A.I. understands humans. Lots of situations, especially literal ones (find me the best restaurant), it works fine, lots of other situations, not so much.
Many times as an editor, the direction you get from clients is subtle or you have to read between the lines and figure out what they want. It’s going to be difficult to get A.I.s to take the way humans usually describe what they want, figure out what they actually want and make those changes.
Third… then you get into the whole issue of emotion and storytelling, which I don’t think A.I. will do well anytime soon. The Economist recently had an amusing article where it let an A.I. write the article. The result is here. Very good at mimicking the style of the Economist but when it comes to putting together a coherent narrative… ouch.
It’s Not All Good News
There are already phone apps that do basic automatic editing. These are more for consumers that want something quick and dirty. For most of the type of stuff professional editors get paid for, it’s unlikely what I’ve seen from the apps will replace humans any time soon. Although, I can see how the tech could be used to create rough cuts and the like.
Also, for some types of videos, wedding or music videos perhaps, you can make a pretty solid case that A.I. will be able to put something together soon that looks reasonably professional.
You need training material for neural networks to learn how to edit videos. Thanks to YouTube, Vimeo and the like, there is an abundance of training material. Do a search for ‘wedding video’ on YouTube. You get 52,000,000 results. 2.3 million people get married in the US every year. Most of the videos from those weddings are online. I don’t think finding a few hundred thousand of those that were done by a professional will be difficult. It’s probably trivial actually.
Same with music videos. There IS enough training material for the A.I.s to learn how to do generic editing for many types of videos.
For people that want to pay $49.95 to get their wedding video edited, that option will be there. Probably within a couple years. Have your guests shoot video, upload it and you’re off and running. You’ll get what you pay for, but for some people it’ll be acceptable. Remember, A.I. is very good at mimicking. So the end result will be a very cookie cutter wedding video. However, since many wedding videos are pretty cookie cutter anyways… at the low end of the market, an A.I. edited video may be all ‘Bridezilla on A Budget’ needs. And besides, who watches these things anyways?
Let The A.I. Do The Grunt Work, Not The Editing
The losers in the short term may be assistant editors. Many of the tasks A.I. is good at… transcribing, searching for footage, etc… are now typically given to assistants. However, it may simply change the types of tasks assistant editors are given. There’s a LOT of metadata that needs to be entered and wrangled.
While A.I. is already showing up in many aspects of video production, it feels like having it actually do the editing is quite a ways off. I can see creating A.I. tools that help with editing: Rough cut creation, recommending color corrections or B roll selection, suggesting changes to timing, etc. But there’ll still need to be a person doing the edit.
Wherein Jim Tierney rants and opines about After Effects, Premiere Pro, Final Cut Pro, and other nonsense