Tag Archives: artificial intelligence

Testing The Accuracy of Artificial Intelligence (A.I.) Services

When A.I. works, it can be amazing. BUT you can waste a lot of time and money when it doesn’t. Garbage in, garbage out, as they say. But what is ‘garbage’ and how do you know it’s garbage? That’s one of the questions this post will hopefully help answer.

Why Even Bother?

It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of time in the long run. Cleaning up inaccurate transcripts, metadata, or keywords is far more tedious and problematic than doing a little testing up front. So it really is time well spent.

One caveat… There are a lot of potential ways to use A.I., and this is only going to cover Speech-to-Text, because that’s what I’m most familiar with due to Transcriptive and getting A.I. transcripts in Premiere. But if you understand how to evaluate one use, you should, more or less, be able to apply your evaluation method to others. (e.g. for testing audio, you want varying audio quality among your samples; for testing images, you want varying quality: low light, blurriness, etc.)

At Digital Anarchy, we’re constantly evaluating a basket of A.I. services to determine what to use on the backend of Transcriptive. So we’ve had to come up with a methodology to fairly test how accurate they are. Most of the people reading this are in a bit of a different situation… testing solutions from various vendors that use A.I., instead of testing the A.I. directly. However, since different vendors use different A.I. services, this methodology will still be useful for comparing the accuracy of the A.I. at the core of those solutions. There may, of course, be other features of a given solution that affect your decision to go with one or another, but at least you’ll be able to compare accuracy objectively.

Here’s an outline of our method:

  1. Always use new files that haven’t been processed before by any of the A.I. services.
  2. Keep them short. (1-2min)
  3. Choose files of varying quality.
  4. Use a human transcription service to create the ‘test master’ transcript.
    • Have someone do a second pass to correct any human errors.
  5. Create a set of rules for what counts as one error, a half error, or two errors, covering both words and punctuation.
    • If you change them halfway through the test, you need to re-test everything.
  6. Apply them consistently. If something is ambiguous, create a rule for how it will be handled and always apply it that way.
  7. Compare the results and may the best bot win. (A comparison sketch follows below.)
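
A script can take some of the tedium out of the word-level comparison in step 7 (punctuation still needs a human eye). Here’s a minimal sketch using Python’s difflib to align an A.I. transcript against the test master; the file names are hypothetical:

    import difflib
    import re

    def words(path):
        # Lowercase the text and strip punctuation so only word errors count here.
        with open(path, encoding="utf-8") as f:
            return re.findall(r"[a-z0-9']+", f.read().lower())

    master = words("test_master.txt")      # the corrected human transcript
    candidate = words("ai_service_1.txt")  # transcript from the A.I. being tested

    # Align the two word lists and count insertions, deletions, and substitutions.
    errors = 0
    matcher = difflib.SequenceMatcher(None, master, candidate)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            errors += max(i2 - i1, j2 - j1)  # approximate: one substitution counts once

    print(f"{errors} word errors out of {len(master)} words")
    print(f"Word accuracy: {1 - errors / len(master):.1%}")

A script like this only flags the differences; you’ll still want to eyeball them and apply your error rules (more on those below) before tallying the score.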

May The Best Bot Win: Visualizing

Accuracy rates for different A.I. services

The main chart compares each engine on a specific file (i.e. File #1, File #2, etc.), using both word and punctuation accuracy. This is really what we use to determine which is best, as punctuation matters. It also shows where each A.I. has strengths and weaknesses. The second, smaller chart shows each service from best result to worst, using only word accuracy. Every A.I. will eventually fall off a cliff in terms of accuracy. This chart shows you the ‘profile’ for each service and can be a slightly clearer way of seeing which is best overall, ignoring specific files.

First it’s important to understand how A.I. works. Machine Learning is used to ‘train’ an algorithm, usually on millions of bits of data that have been labeled by humans. In the case of Speech-to-Text, these bits are audio files with human transcripts. This allows the A.I. to learn which audio waveforms, the word sounds, go with which bits of text. Once the algorithm has been trained, we can send it audio files and it makes its best guess as to which word each waveform corresponds to.

A.I. algorithms are very sensitive to what they’ve been trained on. The further you get away from what they’ve been trained on, the more inaccurate they are. For example, you can’t use an English A.I. to transcribe Spanish. Likewise, if an A.I. has been trained on perfectly recorded audio with no background noise, as soon as you add in background noise it goes off the rails. In fact, the accuracy of every A.I. eventually falls off a cliff. At that point it’s more work to clean it up than to just transcribe it manually.

Always Use New Files

Any time you submit a file to an A.I. it’s possible that the A.I. learns from that file. So you really don’t want to use the same file over and over and over again. To ensure you’re getting unbiased results it’s best to use new files every time you test.

Keep The Test Files Short

First off, comparing transcripts is tedious, so short transcripts are better than long ones. Secondly, if the two minutes you select is representative of an hour-long clip, that’s all you need. Transcribing and comparing the entire hour won’t tell you anything more about the accuracy; the accuracy of two minutes is usually the same as the accuracy of the hour.

Of course, if you’re interviewing many different people over that hour in different locations, with different audio quality (lots of background noise, no background noise, some with accents, etc)… two minutes won’t be representative of the entire hour.

Choose Files of Varying Quality

This is critical! You have to choose files that are representative of the files you’ll be transcribing. Test files with different levels of background noise, different speakers, different accents, different jargon… whatever issues typically occur in the dialog in your videos. **This is how you’ll determine what ‘garbage’ means to the A.I.**

Use Human Transcripts for The ‘Test Master’

Send out the files to get transcribed by a person, and then have someone within your org (or you) go over them for errors. There usually are some, especially when it comes to jargon or names (turns out humans aren’t perfect either! I know… shocker.). These transcripts will be what you compare the A.I. transcripts against, so they need to be close to perfect. If you change something after you start testing, you need to re-test the transcripts you’ve already tested.

Create A Set of Rules And Apply Them Consistently

You need to figure out what you consider one error, a 1/2 error or two errors. In most cases it doesn’t matter exactly what you decide, only that you apply it consistently. If a missing comma is 1/2 an error, great! But it ALWAYS has to be a 1/2 error. You can’t suddenly make it a full error just because you think it’s particularly egregious. You want to take judgement out of the equation as much as possible. If you’re making judgement calls, it’s likely you’ll choose the A.I. that most resembles how you see the world, and that may not be the best A.I. for your customers. (OMG… they used an Oxford Comma! I hate Oxford commas! That’s at least TWO errors!)

And NOW… The Moment You’ve ALL Been Waiting For…

Add up the errors, divide that by the number of words, put everything into a spreadsheet… and you’ve got your winner!
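
In code, that last step is just a weighted sum divided by the word count. A minimal sketch, where the tallies, weights, and word count are made up for illustration; plug in whatever rules you settled on:

    # Hypothetical error tallies from comparing each A.I. transcript to the test master.
    tallies = {
        "Service A": {"wrong_word": 12, "missing_word": 5, "punctuation": 14},
        "Service B": {"wrong_word": 9, "missing_word": 8, "punctuation": 22},
    }

    # The rules, encoded once so they're applied the same way every time.
    weights = {"wrong_word": 1.0, "missing_word": 1.0, "punctuation": 0.5}

    word_count = 350  # number of words in the test master for this file

    for service, counts in tallies.items():
        errors = sum(weights[kind] * n for kind, n in counts.items())
        print(f"{service}: {errors} weighted errors, {1 - errors / word_count:.1%} accurate")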


Hopefully this post has given you some insights into how to test whatever type of A.I. service you’re looking into using. And, of course, if you haven’t checked out Transcriptive, our A.I. transcript plugin for Premiere Pro, you need to! Thanks for reading and please feel free to ask questions in the comment section below!

Artificial Intelligence Gone Bad

There are plenty of horrible things A.I. might be able to do in the future. And this MIT article lists six potential problem areas in the very near future, which are legit to varying degrees. (Although this is more a list of humans behaving badly than A.I. per se.)

However, most people don’t realize exactly how rudimentary (i.e. dumb) A.I. is in its current state. This is part of the problem with the MIT list. The technology is prone to biases, false positives, difficulty with simple situations, etc. The problem is more that humans are trying to make use of, and make critical decisions based on, immature technology.

For those of us who work with it regularly, we see the limitations on a daily basis, so the idea of A.I. taking over the world is a bit laughable. In fact, you can see this yourself daily on your phone.

Take the auto-suggest feature on the iPhone. You would think the Natural Language Processing could take a phrase like ‘Glad you’re feeling b…’ and suggest things like better, beautiful or whatever. Not so hard, right?

Er, no.

When artificial intelligence can't handle basic things

How often does ‘glad’, ‘feeling’ and ‘bad’ appear in the same sentence? And you want to let A.I. drive your car?

We’ve got a ways to go.

Unless, of course, it’s a human problem again and there are a bunch of asshats out there that are glad you’re feeling bad. Oh, wait… it’s the internet. Right.

Downloading The Captions Facebook or YouTube Creates

So you’ve uploaded your video to Facebook or YouTube and you’d like to import the captions they automatically generate with Artificial Intelligence into Transcriptive. This can be a good, FREE way of getting a transcript.

Transcriptive imports SRT files, so… all you need is an SRT file from those services. That’s easy peasy with YouTube: just go to the Captions section and choose Download > SRT.

Screenshot of where to download an SRT file of YouTube captions

Download the SRT and you’re done. Import the SRT into Transcriptive with ‘Combine Lines into Paragraphs’ turned on… Easy, free transcription.

With Facebook it’s more difficult as they don’t let you just download an SRT file. Or any file for that matter. So you need to get tricky.

Open Facebook in Firefox and go to Web Developer > Network. This will open the inspector at the bottom of your browser window.

Firefox's web developer tool, the Network tab

That will give you something that looks like this:

Using the Network tab to get a Facebook caption file

Go to the Facebook video you want to get the caption file for.

Once the video starts playing, type SRT into the Filter field (as shown above).

This _should_ show an XHR file. (We’ve seen instances where it doesn’t, not sure why, so this might not work for every video.)

Right-click on it and select Copy > Copy URL (as shown above).

Open a new tab and paste in the URL.

You should now be asked to download a file. Save this as an SRT file (e.g. MyVideo.srt).
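
If you’d rather script that download step, here’s a minimal sketch using Python’s requests library. The URL below is a placeholder for the one you copied from the Network tab, and note that Facebook may require your login cookies, in which case the paste-into-a-new-tab approach is more reliable:

    import requests

    # Placeholder: paste the URL you copied from Firefox's Network tab here.
    caption_url = "https://video.xx.fbcdn.net/your-copied-url"

    response = requests.get(caption_url)
    response.raise_for_status()  # stop if Facebook refuses the request

    # Save the response body as an SRT file.
    with open("MyVideo.srt", "wb") as f:
        f.write(response.content)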

Import the SRT into Transcriptive with ‘Combine Lines into Paragraphs’ turned on… Easy, free transcription.

So that’s it. This worked as of this writing. It’s entirely possible Facebook will make a change at some point preventing this, but for now, it’s a good way of getting free transcriptions.

You can also do this in other browsers, I’m just using Firefox as an example.

Artificial Intelligence is The New VR

A couple of things stood out to me at NAB.

1) Practically every company exhibiting was talking about A.I.-something.

2) VR seemed to have disappeared from vendor booths.

The last couple years at NAB, VR was everywhere. The Dell booth had a VR simulator, Intel had a VR simulator, booths had Oculuses galore and you could walk away with an armful of cardboard glasses… this year, not so much. Was it there? Sure, but it was hardly to be seen in booths. It felt like the year 3D died. There was a pavilion, there were sessions, but nobody on the show floor was making a big deal about it.

In contrast, it seemed like every vendor was trying to attach A.I. to their name, whether they had an A.I. product or not. Not to mention that Google, Amazon, Microsoft, IBM, Speechmatics and every other big vendor of A.I. cloud services had large booths touting how their A.I. was going to change video production forever.

I’ve talked before about the limitations of A.I., and I think a lot of what was talked about at NAB really over-promised what A.I. can do. We spent most of the six months after releasing Transcriptive 1.0 developing non-A.I. features to help make the A.I. portion of the product more useful. The release we’re announcing today and the next release coming later this month will focus on getting around A.I. transcripts completely by importing human transcripts.

There’s a lot of value in A.I. It’s an important part of Transcriptive and for a lot of use cases it’s awesome. There are just also a lot of limitations. It’s pretty common to run into the A.I. equivalent of the Uncanny Valley (a CG character that looks *almost* human but ends up looking unnatural and creepy), where A.I. gets you 95% of the way there but it’s more work than it’s worth to get the final 5%. At that point it’s better to just not use it.

You just have to understand when that 95% makes your life dramatically easier and when it’s like running into a brick wall. Part of my goal, both as a product designer and just talking about it, is to help folks understand where that line in the A.I. sand is.

I also don’t buy into this idea that A.I. is on an exponential curve and it’s just going to get endlessly better, obeying Moore’s law like the speed of processors.

When we first launched Transcriptive, we felt it would replace transcriptionists. We’ve been disabused of that notion. ;-) The reality is that A.I. is making transcriptionists more efficient. Just as we’ve found Transcriptive to be making video editors more efficient. We had a lot of folks coming up to us at NAB this year telling us exactly that. (It was really nice to hear. :-)

However, much of the effectiveness of Transcriptive comes more from the tools that we’ve built around the A.I. portion of the product. Those tools can work with transcripts and metadata regardless of whether they’re A.I. or human generated. So while we’re going to continue to improve what you can do with A.I., we’re also supporting other workflows.

Over the next couple months you’re going to see a lot of announcements about Transcriptive. Our goal is to leverage the parts of A.I. that really work for video production by building tools and features that amplify those strengths, like PowerSearch, our new panel for searching all the metadata in your Premiere project, and by building bridges to other technology that works better in other areas, such as importing human-created transcripts.

Should be a fun couple of months, stay tuned! btw… if you’re interested in joining the PowerSearch beta, just email us at cs@digitalanarchy.com.

Addendum: Just to be clear, in one way A.I. is definitely NOT VR. It’s actually useful. A.I. has a lot of potential to really change video production, it’s just a bit over-hyped right now. We, like some other companies, are trying to find the best way to incorporate it into our products because once that is figured out, it’s likely to make editors much more efficient and eliminate some tasks that are total drudgery. OTOH, VR is a parlor trick that, other than some very niche uses, is going to go the way of 3D TV and won’t change anything.

Jim Tierney
Chief Executive Anarchist
Digital Anarchy

Just Say No to A.I. Chatbots

For all the developments in artificial intelligence, one of the consistently worst uses of it is chatbots, those little ‘Chat With Us’ sidebars on many websites. Since we’re doing a lot with artificial intelligence (A.I.) in Transcriptive and in other areas, I’ve gotten very familiar with how it works and what its limitations are. It starts to be easy to spot where it’s being used, especially when it’s used badly.

So A.I. chatbots, which really don’t work well, have become a bit of a pet peeve of mine. If you’re thinking about using them for your website, you owe it to yourself to click around the web and see how often ‘chatting’ gets you a usable answer. It’s usually just frustrating. You go a few rounds with a cheery chatbot before getting to what you were going to do in the first place… send a message that will be replied to by a human. It’s a total waste of time that doesn’t answer your questions.

Artificial intelligence isn't great for chatbots

Do you trust cheery, know-nothing chatbots with your customers?

The main problem is that chatbots don’t know when to quit. I get that some businesses receive the same questions over and over… Where are you located? What are your hours? OK, fine, have a chatbot act as an FAQ. But the chatbot needs to quickly hand off the conversation to a real person if the questions go beyond what you could have in an FAQ. And frankly, an FAQ would be better than trying to fake out people with your A.I. chatbot. (Honesty and authenticity matter, even on the web.)

A.I. is just not great at reading comprehension. It can usually get the gist of things, which I think is useful for analytics and business intelligence. But that doesn’t allow it to respond with any degree of accuracy or intelligence. For responding to customer queries, it produces answers that are sort of close… but mostly unusable. The result is frustrated customers.

Take a recent experience with Audi. I’m looking at buying a new car and am interested in one of their SUVs. I went onto an Audi dealer site to inquire about a used one they had. I wanted to know: 1) was it actually in stock? And 2) since it was a 2017, how much of the original warranty was left? There was a button to send a message, which I was originally going to use, but I decided to try the chat button that was bouncing up and down getting my attention.

So I asked those questions in the chat. If it had been a real person, they definitely could have answered #1 and probably #2, even if they were just an assistant. But no, I ended up in the same place I would’ve been if I’d just clicked ‘send a message’ in the first place. But first I had to get through a bunch of generic answers that didn’t address any of my questions and just dragged me around in circles. This is not a good way to deal with customers if you’re trying to sell them a $40,000 car.

And don’t get me started on Amazon’s chatbots. (and emailbots for that matter)

It’s also funny to notice how chatbots try to make you think they’re human, with misspelled words and faux emotions. I’ve had a chatbot admonish me with ‘I’m a real person…’ when I called it a chatbot. It then followed that with another generic answer that didn’t address my question. The Pinocchio chatbot… You’re not a real boy, not a real person, and you don’t get to pass Go and collect $200. (The real salesperson I eventually talked to confirmed it was a chatbot.)

I also had one threaten to end the chat if I didn’t watch my language, which wasn’t even aimed at the chatbot. I just said, “I just want this to f’ing work”. A little generic frustration. However, after it told me to watch my language, I went from frustrated to kind of pissed. So much for artificial intelligence having emotional intelligence. Getting faux-insulted over something almost any real human would recognize as low-grade frustration is not going to make customers happier.

I think A.I. has some amazing uses, and Transcriptive makes great use of it, but it also has a LOT of shortcomings. All of those shortcomings are glaringly apparent when you look at chatbots. There are, of course, many companies trying to create conversational A.I., but so far the results have been pretty poor.

Based on what I’ve seen developing products with A.I., I think it’s likely it’ll be quite a while before conversational A.I. is a good experience on a regular basis. You should think very hard about entrusting your customers to it. A web form or FAQ is going to be better than a frustrating experience with a ‘sales person’.

Not sure what this has to do with video editing. Perhaps just another example of why A.I. is going to have a hard time editing anything that requires comprehending the content. Furthering my belief that A.I. isn’t going to replace most video editors any time soon.

How Doc Filmmakers Are Using A.I. to Create Captions and Search Footage in Premiere Pro

Artificial Intelligence (A.I.) and machine learning are changing how video editors deal with two common problems: 1) how do you get accurate transcriptions for captions or subtitles? And 2) how do you find something in hours of footage if you don’t know exactly where it is?

Getting out of the Transcription Dungeon

Kelley Slagle, director, producer and editor for Cavegirl Productions, has been working on Eye of the Beholder, a documentary on the artists who created the illustrations for the Dungeons & Dragons game. With over 40 hours of interview footage to comb through, searching it all has been made much easier by Transcriptive, a new A.I. plugin for Adobe Premiere Pro.



Why Transcribe?

Imagine having Google for your video project. Turning all the dialog into text makes everything easily searchable (and it supports 28 languages). Not to mention making it easy to create captions and subtitles.

The Dragon of Time And Money

Using a traditional transcription service for 40 hours of footage, you’re looking at a minimum of $2400 and a few days to turn it all around. Not exactly cost or time effective, especially if you’re on a doc budget. However, it’s a problem for all of us.

Transcriptive helps solve the transcription problem, and the problems of searching video and creating captions/subtitles. It uses A.I. and machine learning to automatically generate transcripts with up to 95% accuracy and bring them into Premiere Pro. And the cost? About $4/hour (or much less depending on the options you choose). So 40 hours is $160 vs. $2400. And you’ll get all of it back in a few hours.

Yeah, it’s hard to believe.

Read what these three filmmakers have to say and try the Transcriptive demo out on your own footage. It’ll make it much easier to believe.


“We are using Transcriptive to transcribe all of our interviews for EYE OF THE BEHOLDER. The idea of paying a premium for that much manual transcription was daunting. I am in the editing phase now and we are collaborating with a co-producer in New York. We need to share our ideas for edits and content with him, so he is reviewing transcripts generated by Transcriptive and sending us his feedback and vice versa. The ability to get a mostly accurate transcription is fine for us, as we did not expect the engine to know proper names of characters and places in Dungeons & Dragons.” – Kelley Slagle, Cavegirl Productions

Google Your Video Clips and Premiere Project?


Since everything lives right within Premiere, all the dialog is fully searchable. It’s basically a word processor designed for transcripts, where every word has time code. Yep, every word of dialog has time code. Click on the word and jump to that point on the timeline. This means you don’t have to scrub through footage to find something. Search and jump right to it. It’s an amazing way for an editor to find any quote or quip.
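
Conceptually, the transcript is just a list of words where each word carries its own timecode, which is why search is instant. This isn’t Transcriptive’s actual data format, just a rough sketch of the idea:

    # Each word of dialog carries a timecode (seconds from the start of the clip).
    transcript = [
        {"word": "imagine", "start": 12.4},
        {"word": "having", "start": 12.7},
        {"word": "google", "start": 12.9},
        {"word": "for", "start": 13.2},
        {"word": "your", "start": 13.3},
        {"word": "video", "start": 13.5},
    ]

    def find(words, query):
        # Return the timecode of every occurrence of a word.
        return [w["start"] for w in words if w["word"] == query.lower()]

    print(find(transcript, "Google"))  # [12.9] -> jump to 12.9s on the timeline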

As Kelley says, “We are able to find what we need by searching the text or searching the metadata thanks to the feature of saving the markers in our timelines. As an editor, I am now able to find an exact quote that one of my co-producers refers to, or find something by subject matter, and this speeds up the editing process greatly.”

Joy E. Reed of Oh My! Productions, who’s directing the documentary ‘Ren and Luca’, adds, “We use sequence markers to mark up our interviews, so when we’re searching for specific words/phrases, we can find them and access them nearly instantly. Our workflow is much smoother once we’ve incorporated the Transcriptive markers into our project. We now keep the Markers window open and can hop to our desired areas without having to flip back and forth between our transcript in a text document and Premiere.”

Workflow, Captions, and Subtitles


Captions and subtitles are one of the key uses of Transcriptive. You can use it with Premiere’s captioning tool or export many different file formats (SRT, SMPTE, SCC, MCC, VTT, etc.) for use in any captioning application.

“We’re using Transcriptive to transcribe both sit-down and on-the-fly interviews with our subjects. We also use it to get transcripts of finished projects to create closed captions/subtitles,” says Joy. “We can’t even begin to say how useful it has been on Ren and Luca and how much time it saves us. The turnaround time to receive the transcripts is SO much faster than when we sent it out to a service. We’ve had the best luck with Speechmatics. The transcripts are only as accurate as our speakers – we have a teenage boy who tends to mumble, and his stuff has needed more tweaking than some of our other subjects, but it has been great for very clearly recorded material. The time it saves vs the time you need to tweak for errors is significant.”


Transcriptive is fully integrated into Premiere Pro, so you never have to leave the application or pass metadata and files around. This makes creating captions much easier, allowing you to edit each line while playing back the footage. There are also tools and keyboard shortcuts to make the editing much faster than a normal text editor. You then export everything to Premiere’s caption tool and use that to put on the finishing touches and deliver the captions with your media.

Another company doing documentary work is Windy Films. They are focused on telling stories of social impact and innovation, and like most doc makers they’re usually on tight budgets and deadlines. Transcriptive has been critical in helping them tell real stories with real people (with lots of real dialog that needs transcribing).

They recently completed a project for Planned Parenthood. The deadline was incredibly tight. Harvey Burrell, filmmaker at Windy, says, “We were trying to beat the Senate vote on the healthcare repeal bill. We were editing while driving back from Iowa to Boston. The fact that we could get transcripts back in a matter of hours instead of a matter of days allowed us to get it done on time. We use Transcriptive for everything. The integration into Premiere has been incredible. We’ve been getting transcripts done for a long time. The workflow was always a bit clunky, particularly having transcripts in a Word document off to one side. Having the ability to click on a word and just have Transcriptive take you there in the timeline is one of our favorite features.”

Getting Accurate Transcripts using A.I.


Audio quality matters. The better the recording and the more clearly the talent enunciates, the better the transcript. You can get excellent results, around 95% accuracy, with very well recorded audio. That means your talent is well mic’d, there’s not a lot of background noise, and they speak clearly. Even if you don’t have all that, you’ll still usually get very good results as long as the talent is mic’d. Even accents are OK as long as the speaker is clear. Talent that’s off mic, or crosstalk, will make it less accurate.


Transcriptive lets you sign up with the speech services directly, allowing you to get the best pricing. Most transcription products hide the service they’re using (they’re all using one of the big A.I. services), marking up the cost per minute to as much as $0.50/min. When you sign up directly, you get Speechmatics for $0.07/min. And Watson gives you the first 1000 minutes free. (Speechmatics is much more accurate, but Watson can be useful.)

Transcriptive itself costs $299 when you check out of the Digital Anarchy store. A web version is coming soon as well. To try transcribing with Transcriptive you can download the trial version here. (Remember, Speechmatics is the more accurate service and the only service available in the demo.) Reach out to sales@digitalanarchy.com if you have questions or want an extended trial.

Transcriptive is a plugin that many didn’t know they were waiting for. It is changing the workflow of many editors in the industry. See for yourself how we’re transforming the art of transcription.