This past two weeks Social Media channels have been flooded with video production crews sharing their remote editing stations and workflows. As everybody struggles to adapt and stay productive we’re hoping the Transcriptive web app, which has a new beta version, can help you with some of your challenges.
New Transcriptive.com Beta Version
We just updated https://app.transcriptive.com with a new version. It’s still in beta, so it’s still free to use. It’s a pretty huge upgrade from the previous beta. With a new text editor and sharing capabilities. Users can also upload a media file, transcribe, manage speakers, edit, search and export transcripts without having to access Premiere Pro.
But the real strength is the ability to collaborate and share transcripts with Premiere users and other web users.
How’s Transcriptive going to help keep everyone in sync when they’re working remotely?
The web app was designed from the beginning to help editors work remotely with clients or producers. Transcripts can be easily edited and shared between Premiere Pro and a web browser or between two web users.
This means producers, clients, assistant editors, and interns can easily review and clean up transcripts on the web and send them to the Premiere editor. They can also identify the timecode of video segments that are important or have problems. All this can be done in a web browser and then shared.
If you are a video editor and have been transcribing in Premiere Pro, sending the transcripts and media to Transcriptive.com is quick and makes it easy for team members to access the footage and the transcribed text.
Premiere To A Web Browser
Click on the [ t ] menu in Premiere Pro, link to a web project, and then you can upload the transcript, video, or both. Team members can then log into the Transcriptive.com account and view it all!
Web users are able to edit the transcripts, watch the uploaded clips, see the timecode on the transcript, export the transcript as a Word Document, plain text, captions, and subtitle files, etc. Other features like adding comments or highlighting text are coming soon.
From The Web To Premiere
Once web user is done editing or reviewing the transcript, the editor can pull it back into Premiere. Again, go to the [ t ] menu, and select ‘Download From Your Web Project’. This will download the transcript from Transcriptive.com and load it for the linked video.
Web users can also transcribe videos they upload and share them with other web users. The transcripts can then be downloaded by an editor working in Premiere. Usually it’s a bit easier to start the video upload process from Premiere, but it is possible to do everything from Transcriptive.com.
It’s a powerful way of collaborating with remote users, letting you share videos, transcripts and timecode. Round tripping from Premiere to the web and back again, quickly and easily. Exactly what you need for keeping projects going right now.
Curious to try our BETA web App but still have questions on how it works? Send an email to email@example.com. And if you have tried the App we would love to hear your feedback!
If you’ve been using Speechmatics credits to transcribe in Transcriptive, our transcription plugin for Premiere Pro, then you noticed that accessing your credits in Transcriptive 2.0.2 and later is not an option anymore. Speechmatics is discontinuing the API that we used to support their service in Transcriptive, which means your Speechmatics credentials can no longer be validated inside of the Transcriptive panel.
We know a lot of users still have Speechmatics credits and have been working closely with Speechmatics so those credits can be available in your Transcriptive account as soon as possible. Hopefully in the next week or two.
In the meantime, there are a couple ways users can still transcribe with Speechmatics credits. 1) Use an older version of Transcriptive like v1.5.2 or v2.0.1. Those should still work for a bit longer but uses the older, less accurate API or 2) Upload directly on their website and export the transcript as a JSON file to be imported into Transcriptive. It is a fairly simple process and a great temporary solution for this. Here’s a step-by-step guide:
1. Head to the Speechmatics website – To use your Speechmatics credits, head to www.speechmatics.com and login to your account. Under “What do you want to do?”, choose “Transcription” and select the language of your file.
2. Upload your media file to the Speechmatics website – Speechmatics will give you the option to drag and drop or select your media from a folder on your computer. Choose whatever option works best for you and then click on “Upload”. After the file is uploaded, the transcription will start automatically and you can check the status of the transcription on your “Jobs” list. 3. Download a .JSON file – After the transcription is finished (refresh the page if the status doesn’t change automatically!), click on the Actions icon to access the transcript. You will then have the option to export the transcript as a .JSON file
4. Import the .JSON file into any version of Transcriptive – Open your Transcriptive panel in Premiere. If you are usingTranscriptive 2.0, be sure Clip Mode is turned on. Select the clip you have just transcribed on Speechmatics and click on “Import”. If you are using an older version of Transcriptive, drop the clip into a sequence before choosing “Import”.
You will then have the option to “Choose an Importer”. Select the JSON option and import the Speechmatics file saved on your computer. The transcript will be synced with the clip automatically at no additional charge.
One important thing to know is that, although Transcriptive v1.x still have Speechmatics as an option and it still works, we would still recommend following the steps above to transcribe with Speechmatics credits. The option available in these versions of the panel is an older version of their API and less accurate than the new version. So we recommend you transcribe on the Speechmatics website if you want to use your Speechmatics credits now and not wait for them to be transferred.
However, we should have the transfer sorted out very soon, so keep an eye open for an email about it if you have Speechmatics credits. If the email address you use for Speechmatics is different than the one you use for Transcriptive.com, please email firstname.lastname@example.org. We want to make sure we get things synced up so the credits go to the right place!
A lot of you have a ton of footage that you want to transcribe. One of our goals with Transcriptive has been to enable you to transcribe everything that goes into your Premiere project. To search it, to create captions, to easily see what talent is saying, etc. But if you’ve got 100 hours of footage, even at $0.12/min the costs can add up. So…
Transcriptive has a new feature that will help you cut your transcribing costs by 50%. The latest version of our Premiere Pro transcription plugin has already cut the costs of transcribing from $0.012 to $0.08. However, our new prepaid minutes’ packages goes even further… allowing users to purchase transcribing credits in bulk! You can save 50% per minute, transcribing for $2.40/hr or .04/min. This applies to both Transcriptive AI or Speechmatics.
The pre-paid minutes option will reduce transcription costs to $0.04/min which can be purchased in volume for $150 or $500. For small companies and independent editors, the $150 package will make it possible to secure 62.5 hours of transcription without breaking the bank. If you and your team are transcribing large amounts of footage, going for the $500 will allow you to save even more.
The credits are good for 24 months, so you don’t need to worry about them expiring.
You don’t HAVE to pre-pay. You can still Pay-As-You-Go for $0.08/min. That’s still really inexpensive for transcription and if you’re happy with that, we’re happy with it too.
However, if you’re transcribing a lot of footage, pre-paying is a great way of getting costs down. It also has other benefits, you don’t need to share your credit card with co-workers and other team members. For bigger companies, production managers, directors or even an account department can be in charge of purchasing the minutes and feeding credits into the Premiere Pro Transcriptive panel so editors no longer have to worry about the charges submitted to the account holder’s credit card.
Buying the minutes in advance is simple! Go to your Premiere Pro panel, click on your profile icon, choose “Pre-Pay Minutes” and select the option that better suits your needs. You can also pre-pay credits from your web app account by logging into app.transcriptive.com, opening your “Dashboard” and clicking on “Buy Minutes”. A pop-up window will ask you to choose the pre-paid minutes package and ask for the credit card information. Confirm the purchase and your prepaid minutes will show under “Balance” on your homepage. The prepaid minutes’ balance will also be visible in your Premiere Pro panel, right next to the cost of the transcription.
Applying purchased credits to your transcription jobs is also a quick and easy process. While submitting a clip or sequence for transcription, Transcriptive will automatically deduct the amount required to transcribe the job from your balance. If the available credit is not enough to transcribe your job, the remaining minutes will be charged to the credit card on file.
The 50% discount on prepaid minutes will only apply to transcribing, but minutes can be used to Align existing transcripts at regular cost. English transcripts can be imported into Transcriptive and aligned to your clips or sequences for free, while text in other languages will align for $0.02/min with Transcriptive AI and $0.04/min with Transcriptive Speechmatics.
When A.I. works, it can be amazing. BUT you can waste a lot of time and money when it doesn’t work. Garbage in, garbage out, as they say. But what is ‘garbage’ and how do you know it’s garbage? That’s one of the things, hopefully, I’ll help answer.
Why Even Bother?
It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of time in the long run. Cleaning up inaccurate transcripts, metadata, or keywords is far more tedious and problematic than doing a little testing up front. So it really is time well spent.
One caveat… There’s a lot of potential ways to use A.I., and this is only going to cover Speech-to-Text because that’s what I’m most familiar with due to Transcriptive and getting A.I. transcripts in Premiere. But if you understand how to evaluate one use, you should, more or less, be able to apply your evaluation method to others. (i.e. for testing audio, you want varying audio quality among your samples. If testing images you want varying quality (low light, blurriness, etc) among your samples)
At Digital Anarchy, we’re constantly evaluating a basket of A.I. services to determine what to use on the backend of Transcriptive. So we’ve had to come up with a methodology to fairly test how accurate they are. Most of the people reading this are in a bit different situation… testing solutions from various vendors that use A.I. instead of testing the A.I. directly. However, since different vendors use different A.I. services, this methodology will still be useful for you in comparing the accuracy of the A.I. at the core of the solutions. There may be, of course, other features of a given solution that may affect your decision to go with one or the other, but at least you’ll be able to compare accuracy objectively.
Here’s an outline of our method:
Always use new files that haven’t been processed before by any of the A.I. services.
Keep them short. (1-2min)
Choose files of varying quality.
Use a human transcription service to create the ‘test master’ transcript.
Have someone do a second pass to correct any human errors.
Create a set of rules on word/punctuation errors for what counts as an error (or 1/2 or two).
If you change them halfway through the test, you need to re-test everything.
Apply them consistently. If something is ambiguous, create a rule for how it will be handled and alway apply it that way.
Compare the results and may the best bot win.
May The Best Bot Win : Visualizing
The main chart compares each engine on a specific file (i.e. File #1, File # 2, etc), using both word and punctuation accuracy. This is really what we use to determine which is best, as punctuation matters. It also shows where each A.I. has strengths and weaknesses. The second, smaller chart shows each service from best result to worst result, using only word accuracy. Every A.I. will eventually fall off a cliff in terms of accuracy. This chart shows you the ‘profile’ for each service and can be a little bit clearer way of seeing which is best overall, ignoring specific files.
First it’s important to understand how A.I. works. Machine Learning is used to ‘train’ an algorithm. Usually millions of bits of data that have been labeled by humans are used to train it. In the case of Speech-to-Text, these bits are audio files with a human transcripts. This allows the A.I. to identify which audio waveforms, the word sounds, go with which bits of text. Once the algorithm has been trained, we can then send audio files to the algorithms and it makes it’s best guess as to which word every waveform corresponds to.
A.I. algorithms are very sensitive to what they’ve been trained on. The further you get away from what they’ve been trained on, the more inaccurate they are. For example, you can’t use an English A.I. to transcribe Spanish. Likewise, if an A.I. has been trained on perfectly recorded audio with no background noise, as soon as you add in background noise it goes off the rails. In fact, the accuracy of every A.I. eventually falls off a cliff. At that point it’s more work to clean it up than to just transcribe it manually.
Always Use New Files
Any time you submit a file to an A.I. it’s possible that the A.I. learns from that file. So you really don’t want to use the same file over and over and over again. To ensure you’re getting unbiased results it’s best to use new files every time you test.
Keep The Test Files Short
First off, comparing transcripts is tedious. Short transcripts are better than long ones. Secondly, if the two minutes you select is representative of an hour long clip, that’s all you need. Transcribing and comparing the entire hour won’t tell you anything more about the accuracy. The accuracy of two minutes is usually the same as the accuracy of the hour.
Of course, if you’re interviewing many different people over that hour in different locations, with different audio quality (lots of background noise, no background noise, some with accents, etc)… two minutes won’t be representative of the entire hour.
Chose Files of Varying Quality
This is critical! You have to choose files that are representative of the files you’ll be transcribing. Test files with different levels of background noise, different speakers, different accents, different jargon… whatever issues usually occur in the dialog typically in your videos. ** This is how you’ll determine what ‘garbage’ means to the A.I. **
Use Human Transcripts for The ‘Test Master’
Send out the files to get transcribed by a person. And then have someone within your org (or you) go over them for errors. There usually are some, especially when it comes to jargon or names (turns out humans aren’t perfect either! I know… shocker.). These transcripts will be the what you compare the A.I. transcripts against, so they need to be close to perfect.If you change something after you start testing, you need to re-test the transcripts you’ve already tested.
Create A Set of Rules And Apply Them Consistently
You need to figure out what you consider one error, a 1/2 error or two errors. In most cases it doesn’t matter exactly what you decide to do, only that you do it consistently. If a missing comma is 1/2 an error, great! But it ALWAYS has to be a 1/2 error. You can’t suddenly make it a full error just because you think it’s particularly egregious. You want to remove judgement out of the equation as much as possible. If you’re making judgement calls, it’s likely you’ll choose the A.I. that most resembles how you see the world. That may not be the best A.I. for your customers. (OMG… they used an Oxford Comma! I hate Oxford commas! That’s at least TWO errors!).
And NOW… The Moment You’ve ALL Been Waiting For…
Add up the errors, divide that by the number of words, put everything into a spreadsheet… and you’ve got your winner!
It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of cleanup time in the long run. So it really is time well spent.
Hopefully this post has given you some insights into how to test whatever type of A.I. services you’re looking into using. And, of course, if you haven’t checked out Transcriptive, our A.I. transcript plugin for Premiere Pro, you need to!Thanks for reading and please feel free to ask questions in the comment section below!
Have you ever considered using Transcriptive to build an effective Search Engine Optimization (SEO) strategy and increase the reach of your Social Media videos? Having your footage transcribed right after the shooting can help you quickly scan everything for soundbites that will work for instant social media posts. You can find the terms your audience searches for the most, identify high ranked keywords in your footage, and shape the content of your video based on your audience’s behavior.
According to vlogger and Social Media influencer Jack Blake, being aware of what your audience is doing online is a powerful tool to choose when and where to post your content, but also to decide what exactly to include in your Social Media Videos, which tend to be short and soundbite-like. The content of your media, titles, video descriptions and thumbnails, tags and post mentions should all be part of a strategy built based on what your audience is searching for. And this is why Blake is using Transcriptive not only to save time on editing but also to carefully curate his video content and attract new viewers.
Right after shooting his videos, the vlogger transcribes everything and exports the transcripts as rich text so he can quickly share the content with his team. After that, a Copywriter scans through the transcribed audio and identifies content that will bring traffic to the client’s website and increase ROI. “It’s amazing. I transcribe the audio in minutes, edit some small mistakes without having to leave Premiere Pro, and share the content with my team. After that, we can compare the content with our targeted keywords and choose what I should cut. The editing goes quickly and smoothly because the words are already time-stamped and my captions take no time to create. I just export the transcripts as an SRT and it is pretty much done, explains Blake.
Of course, it all starts with targeting the right keywords and that can be tricky, but there are many analytics and measurement applications offering this service nowadays. If you are just getting started in the whole keyword targeting game, the easiest and most accessible way is connecting your In-site Search queries with Google Analytics. This will allow you to get information on how users are interacting with your website, including how much your audience searches, who is performing searches and who is not, and where they begin searching, as well where they head to afterward. Google Analytics will also allow you to find out what exactly people are typing into Google when searching for content on the web.
For Blake, using competitors’ hashtags from Youtube has been very helpful to increase video views. “One of the differentials in my work is that I research my client’s competitors on Youtube and identify the VidIQs (Youtube keyword tags) they have been using on their videos so we can use competitive tagging in our content description and video title. This allows the content I produced for the client to show when people search for this specific hashtag on Youtube,” he explains. Blake’s team is also using Google Trends, a website that analyzes the popularity of top search queries in Google Search across various regions and languages. It’s a great tool to find out how often a search term is entered in Google’s search engine, compare it to their total search volume, and learn how search trends varied within a certain interval of time.
When asked what would be the last thing he would recommend to video makers wanting to boost their video views on Social Media, Blake had no hesitation in choosing captions. “Social media feeds are often very crowded, fast-moving, and competitive. Nobody has time to open the video as full screen, turn the sound on and watch the whole thing, they often watch the videos without sound, and if the captions are not there then your message will not get through. And Transcriptive makes captioning a very easy process,” he says.
The struggle of making documentary films nowadays is real. Competition is high, and budget limitations can stretch a 6-year deadline to a 10 year-long production. To make a movie you need money. To get the money you need decent, and sometimes edited, footage material to show to funding organizations and production companies. And decent footage, well-recorded audio, as well as edited pieces cost money to produce. I’ve been facing this problem myself and discovered through my work at Digital Anarchy that finding an automated tool to transcribe footage can be instrumental in making small and low budget documentary films happen.
In this interview, I talked to filmmaker Chuck Barbee to learn how Transcriptive is helping him to edit faster and discussed some tips on how to get started with the plugin. Barbee has been in the Film and TV business for over 50 years. In 2005, after an impressive career in the commercial side of the Film and TV business, he moved to California’s Southern Sierras and began producing a series of personal “passion” documentary films. His projects are very heavy on interviews, and the transcribing process he used all throughout his career was no longer effective to manage his productions.
Barbee has been using Transcriptive for a month, but already consider the plugin a game-changer. Read on to learn how he is using the plugin to makea long-form documentary about the people who created what is known as “The Bakersfield Sound” in country music.
DA: You have worked in a wide variety of productions throughout your career. Besides co-producing, directing, and editing prime-time network specials and series for the Lee Mendelson Productions, you also worked as Director of Photography for several independent feature films. In your opinion. How important is the use of transcripts in the editing process?
CB: Transcripts are essential to edit long-form productions because they allow producers, editors, and directors to go through the footage, get familiarized with the content, and choose the best bits of footage as a team. Although interview oriented pieces are more dependent on transcribed content, I truly believe transcripts are helpful no matter what type of motion picture productions you are making.
On most of my projects, we always made cassette tape copies of the interviews, then had someone manually transcribe them and print hard copies. With film projects, there was never any way to have a time reference in the transcripts, unless you wanted to do that manually. Then in the video, it was easier to make time-coded transcripts, but both of these methods were time-consuming and relatively expensive labor wise. This is the method I’ve used since the late ’60s, but the sheer volume of interviews on my current projects and the awareness that something better probably exists with today’s technology prompted me to start looking for automated transcription solutions. That’s when I found Transcriptive.
DA: And what changed now that you are using Artificial Intelligence to transcribe your filmed interviews in Premiere Pro?
CB: I think Transcriptive is a wonderful piece of software. Of course, it is only as good as the diction of the speaker and the clarity of the recording, but the way the whole system works is perfect. I place an interview on the editing timeline, click transcribe and in about 1/3 of the time of the interview I have a digital file of the transcription, with time code references. We can then go through it, highlighting sections we want, or print a hard copy and do the same thing. Then we can open the digital version of the file in Premiere, scroll to the sections that have been highlighted, either in the digital file or the hard copy, click on a word or phrase and then immediately be at that place in the interview. It is a huge time saver and a game-changer.
The workflow has been simplified quite a bit, the transcription costs are down, and the editing process has sped up because we can search and highlight content inside of Premiere or use the transcripts to make paper copies. Our producers prefer to work from a paper copy of the interviews, so we use that TXT or RTF file to make a hard copy. However, Transcriptive can also help to reduce the number of printed materials if a team wants to do all the work digitally, which can be very effective.
DA: What makes you choose between highlighting content in the panel and using printed transcripts? Are there situations where one option works better than the other?
CB: It really depends on producer/editor choices. Some producers might want to have a hard copy because they would prefer that to work on a computer. It really doesn’t matter much from an editor’s point of view because it is no problem to scroll through the text in Transcriptive to find the spots that have been highlighted on the hard copy. All you have to do is look at the timecode next to the highlighted parts of a hard copy and then scroll to that spot in Transcriptive. Highlighting in Transcriptive means you are tying up a workstation, with Premiere, to do that. If you only have one editing workstation running Premiere, then it makes more sense to have someone do the highlighting with a printed hard copy or on a laptop or any other computer which isn’t running Premiere.
DA: You mentioned the AI transcription is not perfect, but you would still prefer that than paying for human transcripts or transcribing the interviews yourself. Why do you think the automated transcripts are a better solution for your projects?
CB:Transcriptive is amazing accurate, but it is also quite “literal” and will transcribe what it hears. For example, if someone named “Artie” pronounces his name “RD”, that’s what you’ll get. Also, many of our subjects have moderate to heavy accents and that does affect accuracy. Another thing I have noticed is that, when there is a clear difference between the sound of the subject and the interviewer, Transcriptive separates them quite nicely. However, when they sound alike, it can confuse them. When multiple voices speak simultaneously, Transcriptive also has trouble, but so would a human.
My team needs very accurate transcripts because we want to be able to search through 70 or more transcripts, looking for keywords that are important. Still, we don’t find the transcription mistakes to be a problem. Even if you have to go through the interview when it comes back to make corrections, It is far simpler and faster than the manual method and cheaper than the human option. Here’s what we do: right after the transcripts are processed, we go through each transcript with the interviews playing along in sync, making corrections to spelling or phrasing or whatever, especially with keywords such as names of people, places, themes, etc. It doesn’t take too much time and my tip is that you do it right after the transcripts are back, while you are watching the footage to become familiar with the content.
DA: Many companies are afraid of incorporating Transcriptive into an on-going project workflow. How was the process of using our transcription plugin in a long-form documentary film right away?
CB: We have about 70 interviews of anywhere from 30 minutes to one hour each. It is a low budget project, being done by a non-profit called “Citizens Preserving History“.The producers were originally going to try to use time-code-window DVD copies of the interviews to make notes about which parts of the interviews to use because of budget limitations. They thought the cost of doing manually typed transcriptions was too much. But as they got into the process they began to see that typed transcripts were going to be the only way to go. Once we learned about Transcriptive and installed it, it only took a couple of days to do all 70 interviews and the cost, at 12 cents per minute is small, compared to manual methods.
Transcriptive is very easy to use and It honestly took almost no time for me to figure out the workflow. The downloading and installation process was simple and direct and the tech support at Digital Anarchy is awesome. I’ve had several technical questions and my phone calls and emails have been answered promptly, by cheerful, knowledgeable people who speak my language clearly and really know what they are doing. They can certainly help quickly if people feel lost or something goes wrong so I would say do yourself a favor and use Transcriptive in your project!
Here’s a short version of the opening tease for “The Town That Wouldn’t Die”, Episode III of Barbee’s documentary series:
There are plenty of horrible things A.I. might be able to do in the future. And this MIT article lists six potential problem areas in the very near future, which are legit to varying degrees. (Although, this is more a list of humans behaving badly than A.I. per se)
However, most people don’t realize exactly how rudimentary (i.e. dumb) A.I. is in it’s current state. This is part of the problem with the MIT list. The technology is prone to biases, many false positives, difficulty with simple situations, etc., etc. The problem is more humans trying to make use of and/or make critical decisions based on immature technology.
For those of us that work with it regularly, we see all the limitations on a daily basis, so the idea of A.I. taking over the world is a bit laughable. In fact, you can see it daily yourself on your phone.
Take the auto-suggest feature on the iPhone. You would think the Natural Language Processing could take a phrase like ‘Glad you’re feeling b…’ and suggest things like better, beautiful or whatever. Not so hard, right?
How often does ‘glad’, ‘feeling’ and ‘bad’ appear in the same sentence? And you want to let A.I. drive your car?
We’ve got a ways to go.
Unless, of course, it’s a human problem again and there are a bunch of asshats out there that are glad you’re feeling bad. Oh, wait… it’s the internet. Right.
So you’ve uploaded your video to Facebook or YouTube and you’d like to import the captions they automatically generate with Artificial Intelligence into Transcriptive. This can be a good, FREE way of getting a transcript.
Transcriptive imports SRT files, so… all you need is an SRT file from those services. That’s easy peasy with YouTube, you just go to the Captions section and download>SRT.
Download the SRT and you’re done. Import the SRT into Transcriptive with ‘Combine Lines into Paragraphs’ turned on… Easy, free transcription.
With Facebook it’s more difficult as they don’t let you just download an SRT file. Or any file for that matter. So you need to get tricky.
Open Facebook in Firefox and go to the Web Developer>Network. This will open the inspector at the bottom of you browser window.
Which will give you something that looks like this:
Go to the Facebook video you want to get the caption file for.
Once the video starts playing, type SRT into the Filter field (as shown above)
This _should_ show an XHR file. (we’ve seen instances where it doesn’t, not sure why. So this might not work for every video)
Right Click on it, select Copy>Copy URL (as shown above)
Open a new Tab and paste in the URL.
You should now be asked to download a file. Save this as an SRT file (e.g. MyVideo.srt).
Import the SRT into Transcriptive with ‘Combine Lines into Paragraphs’ turned on… Easy, free transcription.
So that’s it. This worked as of this writing. It’s entirely possible Facebook will make a change at some point preventing this, but for now, it’s a good way of getting free transcriptions.
You can also do this in other browsers, I’m just using Firefox as an example.
We’ve released PowerSearch 1.0 for Premiere Pro! It’s a new part of the Transcriptive suite of tools that’s essentially a search engine for Premiere letting you search clips, sequences, markers, metadata and captions all in one place.
It streamlines your editing by allowing you to quickly search hours of video for words or phrases. While it works best when used in conjunction with Transcriptive, it plays well with any service that can get transcripts or SRTs (captions) into Premiere Pro. It’s all about helping you find data, we don’t care where the data comes from.
Like any search engine, it displays a list of results . In most cases, clicking on the result takes you to the exact moment the words were spoken in either the Source panel (clips) or the Timeline panel (sequences). If you’ve ever been asked to find a 15 second quote and had to dig through 50 hours of footage to find it, you know how valuable of a time saving tool this is.
I decided to try Transcriptive way before I became part of the Digital Anarchy family. Just like any other aspiring documentary filmmaker, I knew relying on a crew to get my editing started was not an option. Without funding you can’t pay a crew; without a crew you can’t get funding. I had no money, an idea in my head, some footage shot with the help of friends, and a lot of work to do. Especially when working on your very first feature film.
Besides being an independent Filmmaker and Social Media strategist for DA, I am also an Assistive Technology Trainer for a private company called Adaptive Technology Services. I teach blind and low vision individuals how to take advantage of technology to use their phones and computers to rejoin the workforce after their vision loss. Since the beginning of my journey as an AT Trainer – I started as a volunteer 6 years ago – I have been using my work to research the subject and prepare for this film.
My movie is about the relationship between the sighted and non-sighted communities. It seeks to establish a dialog between people with and without visual disabilities so we can come together to demystify disabilities to those without them. I know it is an important subject, but right from the beginning of this project I learned how hard it is to gather funds for any disability-related initiative. I had to carefully budget the shoots and define priorities. Paying a post-production crew was not (and still is not) possible. I have to write and cut samples on my own for now. Transcriptive was a way for me to get things moving by myself so I can apply for grants in the near future and start paying producers, editors, camera operators, sound designers, and get the project going for real. The journey started with transcribing the interviews. Transcriptive did a pretty good job with transcribe the audio from the camera as you can see below. Accuracy got even better when transcribing audio from the mic.
The idea of getting accurate automated transcripts brought a smile to my face. But could Artificial Intelligence really get the job done for me? I never believed so, and I was right. The accuracy for English interviews was pretty impressive. I barely had to do any editing on those. The situation changed as soon as I tried transcribing audio in my native language, Brazilian Portuguese. The AI transcription didn’t just get a bit flaky; it was completely unusable so I decided to do not waste more time and start doing my manual transcriptions.
I have been using Speechmatics for most of my projects because the accuracy is considerably higher than Watson with English. However, after trying to transcribe in Portuguese for the first time, it occurred to me Speechmatics actually offers Portuguese from Portugal while Watson transcribes Portuguese from Brazil. I decided to give Watson a try, but the transcription was not much better than the one I got from Speechmatics.
It is true the Brazilian Portuguese footage I was transcribing was b-roll clips recorded with a Rhode mic; placed on top of my DSLR. They were not well mic’d sit down interviews. The clips do have decent audio, but also involve some background noise that does not help foreign language speech-to-text conversion. At the time I had a deadline to match and was not able to record better audio and compare Speechmatics and Watson Portuguese transcripts. Will be interesting to give it another try, with more time to further compare and evaluate if there are advantages on using Watson for my next batch of footage.
Days after my failed attempt to transcribe Brazilian Portuguese with Speechmatics, I went back to the Transcriptive panel for Premiere, found an option to import my human transcripts, gave it a try, and realized I could still use Transcriptive to speed up my video production workflow. I could still save time by letting Transcriptive assign timecode to the words I transcribed, which would be nearly impossible for me to do on my own. The plugin allowed me to quickly find where things were said in 8 hours of interviews. Having the timecode assigned to each word allowed me to easily search the transcript and jump to that point in my video where I wanted to have a cut, marker, b-roll or transition effect applied.
My movie is still in pre-production and my Premiere project is honestly not that organized yet so the search capability was also a huge advantage. I have been working on samples to apply for grants, which means I have tons of different sequences, multicam sequences, markers that now live in folders inside of folders. Before I started working for DA I was looking for a solution to minimize the mess without having to fully organize it or spend too much money and Power Search came to the rescue. Also, being able to edit my transcripts inside of Premiere made my life a lot easier.
Last month, talking to a few film clients and friends, I found out most filmmakers still clean up human transcripts. In my case, I go through the transcripts to add punctuation marks and other things that will remind me how eloquent speakers were in that phrase. Ellipses, question marks and exclamation points remind me of the tone they spoke allowing me to get paper cuts done faster. I am not sure ASR technology will start entering punctuation in the future, but it would be very handy to me. While this is not a possibility, I am grateful Transcriptive now offers a text edit interface, so I can edit my transcripts without leaving Premiere.
For the movie I am making now I was lucky enough to have a friend willing to help me getting this tedious and time-consuming part of the work done so I am now exporting all my transcripts to Transcriptive.com. The app will allow us to collaborate on the transcript. She will be helping me all the way from LA, editing all the Transcripts without having to download a whole Premiere project to get the work done.
For the last 14 years I’ve created the Audio Art Tour for Burning Man. It’s kind of a docent led audio guide to the major art installations out there, similar to an audio guide you might get at a museum.
Burning Man always has a different ‘theme’ and this year it was ‘I, Robot’. I generally try and find background music related to the theme. EDM is big at Burning Man, land of 10,000 DJs, so I could’ve just grabbed some electronic tracks that sounded robotic. Easy enough to do. However I decided to let Artificial Intelligence algorithms create the music! (You can listen to the tour and hear the different tracks)
This turned out to be not so easy, so I’ll break down what I had to do to get seven unique sounding, usable tracks. I had a bit more success with AmperMusic, which is also currently free (unlike Jukedeck), so I’ll discuss that first.
Getting the Tracks
The problem with both services was getting unique sound tracks. The A.I. has a tendency of creating very similar sounding music. Even if you select different styles and instruments you often end up with oddly similar music. This problem is compounded by Amper’s inability to render more than about 30 seconds of music.
What I found I had to do was let it generate 30 seconds randomly or with me selecting the instruments. I did this repeatedly until I got a 30 second sample I liked. At which point I extended it out to about 3 or 4 minutes and turned off all the instruments but two or three. Amper was usually able to render that out. Then I’d turn off those instruments and turn back on another three. Then render that. Rinse, repeat until you’ve rendered all the instruments.
Now you’ve got a bunch of individual tracks that you can combine to get your final music track. Combine them in Audition or even Premiere Pro (or FCP or whatever NLE) and you’re good to go. I used that technique to get five of the tracks.
Jukedeck didn’t have the rendering problem but it REALLY suffered from the ‘sameness’ problem. It was tough getting something that really sounded unique. However, I did get a couple good tracks out of it.
Problems Using Artificial Intelligence
This is another example of A.I. and Machine Learning that works… sort of. I could have found seven stock music tracks that I like much faster (this is what I usually do for the Audio Art Tour). The amount of time it took me messing around with these services was significant. Also, if Jukedeck is any indication, a music track from one of these services will cost as much as a stock music track. Just go to Pond5 to see what you can get for the same price. With a much, much wider variety. I don’t think living, breathing musicians have much to worry about. At least for now.
That said, I did manage to get seven unique, cool sounding tracks out of them. It took some work, but it did happen.
As with most A.I./ML, it’s difficult to see what the future looks like. There has certainly been a ton of advances, but I think in a lot of cases, it’s some of the low hanging fruit. We’re seeing that with Speech-to-text algorithms in Transcriptive where they’re starting to plateau and cluster around the same accuracy levels. The fruit (accuracy) is now pretty high up and improvement are tough. It’ll be interesting to see what it takes to break through that. More data? Faster servers? A new approach?
I think music may be similar. It seems like it’s a natural thing for A.I. but it’s deceptively difficult to do in a way that mimics the range and diversity of styles and sounds that many human musicians have. Particularly a human armed with a synth that can reproduce an entire orchestra. We’ll see what it takes to get A.I. music out of the Valley of Sameness.
Premiere Pro CS6 has the ability to turn speech into text and put it into the Speech Analysis metadata. You can still use it in any version of Premiere Pro.
In Premiere CS6 you can right+click on a piece of footage and select ‘Analyze Content’. This would turn all the speech into text. Adobe removed it in later versions of Creative Cloud but all that infrastructure is still in Premiere Pro CC 2018 (and other versions) and this post will tell you how to make use of it with, and without, Transcriptive, our plugin for transcribing video.
First off, if you have Creative Cloud, you still have access to CS6 (or CC). You can download it and use that to turn all your speech to text. This will get saved with your file and when you import it into Premiere 2018, all the text will be in the Speech Analysis field of the Metadata panel. This is very handy as you can use the text with the Source panel to set in and out points and edit with text.
To get older versions of Premiere, go to the Creative Cloud app and find Premiere Pro. Click the menu button (or down arrow) and select ‘Other Versions’. You can install all the way back to CS6.
Once CS6 is installed, you can import the footage, right+click and select ‘Anaylze Content’. It takes some time to do this, but once it’s done, you’ll have all the speech turned into text in Speech Analysis. Import the clips into the version of Premiere you’re using and all that text will show up in the Metadata panel. Voila! It’s not an awesome interface for editing the text (and it needs a lot of editing as it’s not very accurate, which is why Adobe removed it) but it’s there.
Transcriptive can also use that data. If you’re using Transcriptive and drop the footage into a sequence, it’ll pull the text from the Speech Analysis field.
As mentioned, the CS6 speech-to-text isn’t very accurate, which you can see below. So it’s usually worth it to pay a few cents a minute to get a good A.I. transcript or $1.25/min to get human transcripts (which Transcriptive can import).
However, if you want free, then the CS6 trick is one way of doing it. Or you could use YouTube and import their captions into Transcriptive. It’s free, easy and we have a great tutorial that shows you how to get YouTube captions into Premiere!
1) Practically every company exhibiting was talking about A.I.-something.
2) VR seemed to have disappeared from vendor booths.
The last couple years at NAB, VR was everywhere. The Dell booth had a VR simulator, Intel had a VR simulator, booths had Oculuses galore and you could walk away with an armful of cardboard glasses… this year, not so much. Was it there? Sure, but it was hardly to be seen in booths. It felt like the year 3D died. There was a pavilion, there were sessions, but nobody on the show floor was making a big deal about it.
In contrast, it seemed like every vendor was trying to attach A.I. to their name, whether they had an A.I. product or not. Not to mention, Google, Amazon, Microsoft, IBM, Speechmatics and every other big vendor of A.I. cloud services having large booths touting how their A.I. was going to change video production forever.
I’ve talked before about the limitations of A.I. and I think a lot of what was talked about at NAB was really over promising what A.I. can do. We spent most of the six months after releasing Transcriptive 1.0 developing non-A.I. features to help make the A.I. portion of the product more useful. The release were announcing today and the next release coming later this month will focus on getting around A.I. transcripts completely by importing human transcripts.
There’s a lot of value in A.I. It’s an important part of Transcriptive and for a lot use cases it’s awesome. There are just also a lot of limitations. It’s pretty common that you run into the A.I. equivalent of the Uncanny Valley (a CG character that looks *almost* human but ends up looking unnatural and creepy), where A.I. gets you 95% of the way there but it’s more work than it’s worth to get the final 5%. It’s better to just not use it.
You just have to understand when that 95% makes your life dramatically easier and when it’s like running into a brick wall. Part of my goal, both as a product designer and just talking about it, is to help folks understand where that line in the A.I. sand is.
I also don’t buy into this idea that A.I. is on an exponential curve and it’s just going to get endlessly better, obeying Moore’s law like the speed of processors.
When we first launched Transcriptive, we felt it would replace transcriptionists. We’ve been disabused of that notion. ;-) The reality is that A.I. is making transcriptionists more efficient. Just as we’ve found Transcriptive to be making video editors more efficient. We had a lot of folks coming up to us at NAB this year telling us exactly that. (It was really nice to hear. :-)
However, much of the effectiveness of Transcriptive comes more from the tools that we’ve built around the A.I. portion of the product. Those tools can work with transcripts and metadata regardless of whether they’re A.I. or human generated. So while we’re going to continue to improve what you can do with A.I., we’re also supporting other workflows.
Over the next couple months you’re going to see a lot of announcements about Transcriptive. Our goal is to leverage the parts of A.I. that really work for video production by building tools and features that amplify those strengths, like PowerSearch our new panel for searching all the metadata in your Premiere project, and build bridges to other technology that works better in other areas, such as importing human created transcripts.
Should be a fun couple months, stay tuned! btw… if you’re interested in joining the PowerSearch beta, just email us at email@example.com.
Addendum: Just to be clear, in one way A.I. is definitely NOT VR. It’s actually useful. A.I. has a lot of potential to really change video production, it’s just a bit over-hyped right now. We, like some other companies, are trying to find the best way to incorporate it into our products because once that is figured out, it’s likely to make editors much more efficient and eliminate some tasks that are total drudgery. OTOH, VR is a parlor trick that, other than some very niche uses, is going to go the way of 3D TV and won’t change anything.
Chief Executive Anarchist
For all the developments in artificial intelligence, one of the consistently worst uses of it is with chatbots. Those little ‘Chat With Us’ side bars on many websites. Since we’re doing a lot with artificial intelligence (A.I.) in Transcriptive and in other areas, I’ve gotten very familiar with how it works and what the limitations are. It starts to be easy to spot where it’s being used, especially when it’s used badly.
So A.I. chatbots, which really doesn’t work well, have become a bit of a pet peeve of mine. If you’re thinking about using them for your website, you owe it to yourself to click around the web and see how often ‘chatting’ gets you a usable answer. It’s usually just frustrating. You go a few rounds with a cheery chatbot before getting to what you were going to do in the first place… send a message that will be replied to by a human. Total waste of time and doesn’t answer the questions.
Do you trust cheery, know-nothing chatbots with your customers?
The main problem is that chatbots don’t know when to quit. I get it that some business receive the same question over and over… where are you located? what are your hours? Ok, fine, have a chatbot act as a FAQ. But the chatbot needs to quickly hand off the conversation to a real person if the questions go beyond what you could have in an FAQ. And frankly, an FAQ would be better than trying to fake-out people with your A.I. chatbot. (honesty and authenticity matter, even on the web)
A.I. is just not great at reading comprehension. It can get the jist of things usually, which I think is useful for analytics and business intelligence. But this doesn’t allow it to respond with any degree of accuracy or intelligence. For responding to customer queries it produces answers that are sort of close… but mostly unusable. So, the result is frustrated customers.
Take a recent experience with Audi. I’m looking at buying a new car and am interested in one of their SUVs. I went onto an Audi dealer site to inquire about a used one they had. I wanted to know 1) was it actually in stock and 2) how much of the original warranty was left since it was a 2017? There was a button to send a message which I was originally going to use but decided to try the chat button that was bouncing up and down getting my attention.
So, I asked those questions in the chat. If it had been a real person, they definitely could have answered #1 and probably #2, even if they were just an assistant. But no, I ended in the same place I would’ve been if I’d just clicked ‘send a message’ in the first place. But first, I had to get through a bunch of generic answers that didn’t answer any of my questions and just dragged me around in circles. This is not a good way to deal with customers if you’re trying to sell them a $40,000 car.
And don’t get me started on Amazon’s chatbots. (and emailbots for that matter)
It’s also funny to notice how the chatbots try and make you think it’s human, with misspelled words and faux emotions. I’ve had a chatbot admonish me with ‘I’m a real person…’ when I called it a chatbot. It then followed that with another generic answer that didn’t address my question. The Pinocchio chatbot… You’re not a real boy, not a real person and you don’t get to pass Go and collect $200. (The real salesperson I eventually talked to confirmed it was a chatbot.)
I also had one threaten to end the chat if I didn’t watch my language, which was not aimed at the chatbot. I just said, “I just want this to f’ing work”. A little generic frustration. However, after it told me to watch my language, I went from frustrated to kind of pissed. So much for artificial intelligence having emotional intelligence. Getting faux-insulted over something almost any real human would recognize as low grade frustration, is not going to make customers happier.
I think A.I. has some amazing uses, Transcriptive makes great use of A.I. but it also has a LOT of shortcomings. All of those shortcomings are glaringly apparent when you look at chatbots. There are, of course, many companies trying to create conversational A.I. but so far the results have been pretty poor.
Based on what I’ve seen developing products with A.I., I think it’s likely it’ll be quite a while before conversational A.I. is a good experience on a regular basis. You should think very hard about entrusting your customers to it. A web form or FAQ is going to be better than a frustrating experience with a ‘sales person’.
Not sure what this has to do with video editing. Perhaps just another example of why A.I. is going to have a hard time editing anything that requires comprehending the content. Furthering my belief that A.I. isn’t going to replace most video editors any time soon.
A.I. is definitely changing how editors get transcripts and search video for content. Transcriptive demonstrates that pretty clearly with text. Searching via object recognition is something that also is already happening. But what about actual video editing?
One of the problems A.I. has is finishing. Going the last 10% if you will. For example, speech-to-text engines, at best, have an accuracy rate of about 95% or so. This is about on par with the average human transcriptionist. For general purpose recordings, human transcriptionists SHOULD be worried.
But for video editing, there are some differences, which are good news. First, and most importantly, errors tend to be cumulative. So if a computer is going to edit a video, at the very least, it needs to do the transcription and it needs to recognize the imagery. (we’ll ignore other considerations like style, emotion, story for the moment) Speech recognition is at best 95%, object recognition is worse. The more layers of AI you have, usually those errors will multiply (in some cases there might be improvement though) . While it’s possible automation will be able to produce a decent rough cut, these errors make it difficult to see automation replacing most of the types of videos that pro editors are typically employed for.
Secondly, if the videos are being done for humans, frequently the humans don’t know what they want. Or at least they’re not going to be able to communicate it in such a way that a computer will understand and be able to make changes. If you’ve used Alexa or Echo, you can see how well A.I. understands humans. Lots of situations, especially literal ones (find me the best restaurant), it works fine, lots of other situations, not so much.
Many times as an editor, the direction you get from clients is subtle or you have to read between the lines and figure out what they want. It’s going to be difficult to get A.I.s to take the way humans usually describe what they want, figure out what they actually want and make those changes.
Third… then you get into the whole issue of emotion and storytelling, which I don’t think A.I. will do well anytime soon. The Economist recently had an amusing article where it let an A.I. write the article. The result is here. Very good at mimicking the style of the Economist but when it comes to putting together a coherent narrative… ouch.
It’s Not All Good News
There are already phone apps that do basic automatic editing. These are more for consumers that want something quick and dirty. For most of the type of stuff professional editors get paid for, it’s unlikely what I’ve seen from the apps will replace humans any time soon. Although, I can see how the tech could be used to create rough cuts and the like.
Also, for some types of videos, wedding or music videos perhaps, you can make a pretty solid case that A.I. will be able to put something together soon that looks reasonably professional.
You need training material for neural networks to learn how to edit videos. Thanks to YouTube, Vimeo and the like, there is an abundance of training material. Do a search for ‘wedding video’ on YouTube. You get 52,000,000 results. 2.3 million people get married in the US every year. Most of the videos from those weddings are online. I don’t think finding a few hundred thousand of those that were done by a professional will be difficult. It’s probably trivial actually.
Same with music videos. There IS enough training material for the A.I.s to learn how to do generic editing for many types of videos.
For people that want to pay $49.95 to get their wedding video edited, that option will be there. Probably within a couple years. Have your guests shoot video, upload it and you’re off and running. You’ll get what you pay for, but for some people it’ll be acceptable. Remember, A.I. is very good at mimicking. So the end result will be a very cookie cutter wedding video. However, since many wedding videos are pretty cookie cutter anyways… at the low end of the market, an A.I. edited video may be all ‘Bridezilla on A Budget’ needs. And besides, who watches these things anyways?
Let The A.I Do The Grunt Work, Not The Editing
The losers in the short term may be assistant editors. Many of the tasks A.I. is good for… transcribing, searching for footage, etc.. is now typically given to assistants. However, it may simply change the types of tasks assistant editors are given. There’s a LOT of metadata that needs to be entered and wrangled.
While A.I. is already showing up in many aspects of video production, it feels like having it actually do the editing is quite a ways off. I can see creating A.I. tools that help with editing: Rough cut creation, recommending color corrections or B roll selection, suggesting changes to timing, etc. But there’ll still need to be a person doing the edit.
Wherein Jim Tierney rants and opines about After Effects, Premiere Pro, Final Cut Pro, and other nonsense