Tag Archives: A.I.

Dumb A.I., Dumb Anarchist: Using the Transcriptive Glossary

We’ve been working on Transcriptive for like 3 years now. In that time, the A.I. has heard my voice saying ‘Digital Anarchy’ umpteen million times. So, you would think it would easily get that right by now. As the below transcript from our SRT Importing tutorial shows… not so much. (Dugal Accusatorial? Seriously?)

ALSO, you would think that by now I would have a list of terms that I would copy/paste into Transcriptive’s Glossary field every time I get a transcript for a tutorial. The glossary helps the A.I. determine what ‘vocal sounds’ should be when it translates those sounds into words. Uh, yeah… not so much.

So… don’t be like AnarchyJim. If you have words you know the A.I. probably won’t get: company names, industry jargon, difficult proper names (cool blog post on applying player names to an MLB video here), etc., then use Transcriptive’s glossary (in the Transcribe dialog). It does work. (and somebody should mention that to the guy that designed the product. Oy.)

Use the Glossary field in the Transcribe dialog!

Overall the A.I. is really accurate and does usually get ‘Digital Anarchy’ correct. So I get lazy about using the glossary. It is a really useful thing…

A.I. Glossary in Transcriptive

Testing The Accuracy of Artificial Intelligence (A.I.) Services

When A.I. works, it can be amazing. BUT you can waste a lot of time and money when it doesn’t work. Garbage in, garbage out, as they say. But what is ‘garbage’ and how do you know it’s garbage? That’s one of the things, hopefully, I’ll help answer.

Why Even Bother?

It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of time in the long run. Cleaning up inaccurate transcripts, metadata, or keywords is far more tedious and problematic than doing a little testing up front. So it really is time well spent.

One caveat… There are a lot of potential ways to use A.I., and this is only going to cover Speech-to-Text because that’s what I’m most familiar with due to Transcriptive and getting A.I. transcripts in Premiere. But if you understand how to evaluate one use, you should, more or less, be able to apply your evaluation method to others. (e.g. for testing audio, you want varying audio quality among your samples. If testing images, you want varying quality (low light, blurriness, etc.) among your samples.)

At Digital Anarchy, we’re constantly evaluating a basket of A.I. services to determine what to use on the backend of Transcriptive. So we’ve had to come up with a methodology to fairly test how accurate they are. Most of the people reading this are in a slightly different situation… testing solutions from various vendors that use A.I. instead of testing the A.I. directly. However, since different vendors use different A.I. services, this methodology will still be useful for you in comparing the accuracy of the A.I. at the core of the solutions. There may, of course, be other features of a given solution that affect your decision to go with one or the other, but at least you’ll be able to compare accuracy objectively.

Here’s an outline of our method:

  1. Always use new files that haven’t been processed before by any of the A.I. services.
  2. Keep them short. (1-2min)
  3. Choose files of varying quality.
  4. Use a human transcription service to create the ‘test master’ transcript.
    • Have someone do a second pass to correct any human errors.
  5. Create a set of rules for word/punctuation errors that define what counts as one error (or 1/2 an error, or two).
    • If you change them halfway through the test, you need to re-test everything.
  6. Apply them consistently. If something is ambiguous, create a rule for how it will be handled and always apply it that way.
  7. Compare the results and may the best bot win.

May The Best Bot Win: Visualizing

Accuracy rates for different A.I. services

The main chart compares each engine on a specific file (e.g. File #1, File #2, etc.), using both word and punctuation accuracy. This is really what we use to determine which is best, as punctuation matters. It also shows where each A.I. has strengths and weaknesses. The second, smaller chart shows each service from best result to worst result, using only word accuracy. Every A.I. will eventually fall off a cliff in terms of accuracy. This chart shows you the ‘profile’ for each service and can be a somewhat clearer way of seeing which is best overall, ignoring specific files.
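If you want to build that second ‘profile’ view yourself, here’s a minimal sketch of the idea. It assumes you already have a word accuracy percentage for each service on each test file; the service names and numbers below are made up purely for illustration and aren’t our actual results.

```python
# Hypothetical word-accuracy results (percent), one value per test file.
# Service names and numbers are invented for illustration only.
word_accuracy = {
    "service_a": [91.5, 62.0, 96.0, 88.0],
    "service_b": [93.0, 94.0, 71.0, 90.5],
}

def accuracy_profile(scores):
    """Order one service's scores from best to worst, ignoring which file each came from."""
    return sorted(scores, reverse=True)

profiles = {name: accuracy_profile(scores) for name, scores in word_accuracy.items()}

for name, profile in profiles.items():
    print(name, profile)
```

Plot each list as a line and you can see where each service starts to fall off, without getting distracted by which file caused it.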

First it’s important to understand how A.I. works. Machine Learning is used to ‘train’ an algorithm. Usually millions of bits of data that have been labeled by humans are used to train it. In the case of Speech-to-Text, these bits are audio files with human transcripts. This allows the A.I. to identify which audio waveforms, the word sounds, go with which bits of text. Once the algorithm has been trained, we can then send audio files to the algorithm and it makes its best guess as to which word every waveform corresponds to.

A.I. algorithms are very sensitive to what they’ve been trained on. The further you get away from what they’ve been trained on, the more inaccurate they are. For example, you can’t use an English A.I. to transcribe Spanish. Likewise, if an A.I. has been trained on perfectly recorded audio with no background noise, as soon as you add in background noise it goes off the rails. In fact, the accuracy of every A.I. eventually falls off a cliff. At that point it’s more work to clean it up than to just transcribe it manually.

Always Use New Files

Any time you submit a file to an A.I. it’s possible that the A.I. learns from that file. So you really don’t want to use the same file over and over and over again. To ensure you’re getting unbiased results it’s best to use new files every time you test.

Keep The Test Files Short

First off, comparing transcripts is tedious. Short transcripts are better than long ones. Secondly, if the two minutes you select are representative of an hour-long clip, that’s all you need. Transcribing and comparing the entire hour won’t tell you anything more about the accuracy. The accuracy of two minutes is usually the same as the accuracy of the hour.

Of course, if you’re interviewing many different people over that hour in different locations, with different audio quality (lots of background noise, no background noise, some with accents, etc)… two minutes won’t be representative of the entire hour.

Choose Files of Varying Quality

This is critical! You have to choose files that are representative of the files you’ll be transcribing. Test files with different levels of background noise, different speakers, different accents, different jargon… whatever issues typically occur in the dialog in your videos. ** This is how you’ll determine what ‘garbage’ means to the A.I. **

Use Human Transcripts for The ‘Test Master’

Send out the files to get transcribed by a person. And then have someone within your org (or you) go over them for errors. There usually are some, especially when it comes to jargon or names (turns out humans aren’t perfect either! I know… shocker.). These transcripts will be what you compare the A.I. transcripts against, so they need to be close to perfect. If you change something after you start testing, you need to re-test the transcripts you’ve already tested.

Create A Set of Rules And Apply Them Consistently

You need to figure out what you consider one error, 1/2 an error or two errors. In most cases it doesn’t matter exactly what you decide to do, only that you do it consistently. If a missing comma is 1/2 an error, great! But it ALWAYS has to be 1/2 an error. You can’t suddenly make it a full error just because you think it’s particularly egregious. You want to take judgement out of the equation as much as possible. If you’re making judgement calls, it’s likely you’ll choose the A.I. that most resembles how you see the world. That may not be the best A.I. for your customers. (OMG… they used an Oxford Comma! I hate Oxford commas! That’s at least TWO errors!).
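One way to keep yourself honest is to write the rules down as a fixed table and let the table do the scoring. This is just a sketch: the error categories and weights below are hypothetical examples, not the rules we use. The point is only that each kind of mistake always costs the same amount.

```python
# Hypothetical scoring rules: every kind of mistake has one fixed cost.
# Categories and weights are examples only; choose your own, then don't change them mid-test.
ERROR_WEIGHTS = {
    "wrong_word": 1.0,
    "missing_word": 1.0,
    "missing_comma": 0.5,
    "wrong_proper_name": 2.0,
}

def score_errors(error_types):
    """Add up the fixed weight for each logged mistake.

    An error type that has no rule raises KeyError, which forces you to
    define a rule for it instead of making a judgement call on the spot.
    """
    return sum(ERROR_WEIGHTS[kind] for kind in error_types)

# Mistakes logged while comparing one A.I. transcript against the test master.
logged = ["wrong_word", "missing_comma", "missing_comma", "wrong_proper_name"]
total = score_errors(logged)
print(total)
```

If you do end up adding or changing a rule, that’s your cue to re-score every transcript you’ve already tested.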

And NOW… The Moment You’ve ALL Been Waiting For…

Add up the errors, divide that by the number of words, put everything into a spreadsheet… and you’ve got your winner!
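If you’d rather script it than use a spreadsheet, the math is simple. Here’s a rough sketch, assuming accuracy is 100 × (1 − errors ÷ words) and that you pool all the test files for an overall number. Both are my assumptions for this example, and the service names and figures are invented.

```python
def accuracy(weighted_errors, word_count):
    """Accuracy as a percentage: 100 * (1 - errors / words)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * (1.0 - weighted_errors / word_count)

def overall_accuracy(file_results):
    """Pool errors and words across all test files for one service."""
    total_errors = sum(errors for errors, _ in file_results)
    total_words = sum(words for _, words in file_results)
    return accuracy(total_errors, total_words)

# Hypothetical results per service: (weighted errors, words in the test master) for each file.
results = {
    "service_a": [(12.0, 300), (30.5, 250)],
    "service_b": [(9.0, 300), (41.0, 250)],
}

scores = {name: overall_accuracy(files) for name, files in results.items()}
winner = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.1f}%")
print("Winner:", winner)
```

Keep the per-file numbers around too. The overall winner isn’t always the best choice if one service is much better on the kind of audio you deal with most.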

It’s a bit tedious to do the testing, but being able to identify the most accurate service will save you a lot of cleanup time in the long run. So it really is time well spent.

Hopefully this post has given you some insights into how to test whatever type of A.I. services you’re looking into using. And, of course, if you haven’t checked out Transcriptive, our A.I. transcript plugin for Premiere Pro, you need to!

Thanks for reading and please feel free to ask questions in the comment section below!

Artificial Intelligence Gone Bad

There are plenty of horrible things A.I. might be able to do in the future. And this MIT article lists six potential problem areas in the very near future, which are legit to varying degrees. (Although, this is more a list of humans behaving badly than A.I. per se)

However, most people don’t realize exactly how rudimentary (i.e. dumb) A.I. is in its current state. This is part of the problem with the MIT list. The technology is prone to biases, many false positives, difficulty with simple situations, etc., etc. The problem is more humans trying to make use of and/or make critical decisions based on immature technology.

For those of us that work with it regularly, we see all the limitations on a daily basis, so the idea of A.I. taking over the world is a bit laughable. In fact, you can see it daily yourself on your phone.

Take the auto-suggest feature on the iPhone. You would think the Natural Language Processing could take a phrase like ‘Glad you’re feeling b…’ and suggest things like better, beautiful or whatever. Not so hard, right?

Er, no.

When artificial intelligence can't handle basic things

How often do ‘glad’, ‘feeling’ and ‘bad’ appear in the same sentence? And you want to let A.I. drive your car?

We’ve got a ways to go.

Unless, of course, it’s a human problem again and there are a bunch of asshats out there that are glad you’re feeling bad. Oh, wait… it’s the internet. Right.

Using A.I. to Create Music with AmperMusic and Jukedeck

For the last 14 years I’ve created the Audio Art Tour for Burning Man. It’s kind of a docent-led audio guide to the major art installations out there, similar to an audio guide you might get at a museum.

Burning Man always has a different ‘theme’ and this year it was ‘I, Robot’. I generally try and find background music related to the theme. EDM is big at Burning Man, land of 10,000 DJs, so I could’ve just grabbed some electronic tracks that sounded robotic. Easy enough to do. However, I decided to let Artificial Intelligence algorithms create the music! (You can listen to the tour and hear the different tracks)

This turned out to be not so easy, so I’ll break down what I had to do to get seven unique-sounding, usable tracks. I had a bit more success with AmperMusic, which is also currently free (unlike Jukedeck), so I’ll discuss that first.

Getting the Tracks

The problem with both services was getting unique-sounding tracks. The A.I. has a tendency to create very similar sounding music. Even if you select different styles and instruments you often end up with oddly similar music. This problem is compounded by Amper’s inability to render more than about 30 seconds of music.

Using Artificial Intelligence and machine learning to create music

What I found I had to do was let it generate 30 seconds randomly or with me selecting the instruments. I did this repeatedly until I got a 30 second sample I liked. At which point I extended it out to about 3 or 4 minutes and turned off all the instruments but two or three. Amper was usually able to render that out. Then I’d turn off those instruments and turn back on another three. Then render that. Rinse, repeat until you’ve rendered all the instruments.

Now you’ve got a bunch of individual tracks that you can combine to get your final music track. Combine them in Audition or even Premiere Pro (or FCP or whatever NLE) and you’re good to go. I used that technique to get five of the tracks.

Jukedeck didn’t have the rendering problem but it REALLY suffered from the ‘sameness’ problem. It was tough getting something that really sounded unique. However, I did get a couple good tracks out of it.

Problems Using Artificial Intelligence

This is another example of A.I. and Machine Learning that works… sort of. I could have found seven stock music tracks that I like much faster (this is what I usually do for the Audio Art Tour).  The amount of time it took me messing around with these services was significant. Also, if Jukedeck is any indication, a music track from one of these services will cost as much as a stock music track. Just go to Pond5 to see what you can get for the same price. With a much, much wider variety. I don’t think living, breathing musicians have much to worry about. At least for now.

That said, I did manage to get seven unique, cool sounding tracks out of them. It took some work, but it did happen.

As with most A.I./ML, it’s difficult to see what the future looks like. There have certainly been a ton of advances, but I think in a lot of cases, it’s some of the low hanging fruit. We’re seeing that with Speech-to-Text algorithms in Transcriptive where they’re starting to plateau and cluster around the same accuracy levels. The fruit (accuracy) is now pretty high up and improvements are tough. It’ll be interesting to see what it takes to break through that. More data? Faster servers? A new approach?

I think music may be similar. It seems like it’s a natural thing for A.I. but it’s deceptively difficult to do in a way that mimics the range and diversity of styles and sounds that many human musicians have. Particularly a human armed with a synth that can reproduce an entire orchestra. We’ll see what it takes to get A.I. music out of the Valley of Sameness.


Artificial Intelligence is The New VR

Couple things stood out to me at NAB.

1) Practically every company exhibiting was talking about A.I.-something.

2) VR seemed to have disappeared from vendor booths.

The last couple years at NAB, VR was everywhere. The Dell booth had a VR simulator, Intel had a VR simulator, booths had Oculuses galore and you could walk away with an armful of cardboard glasses… this year, not so much. Was it there? Sure, but it was hardly to be seen in booths. It felt like the year 3D died. There was a pavilion, there were sessions, but nobody on the show floor was making a big deal about it.

In contrast, it seemed like every vendor was trying to attach A.I. to their name, whether they had an A.I. product or not. Not to mention that Google, Amazon, Microsoft, IBM, Speechmatics and every other big vendor of A.I. cloud services had large booths touting how their A.I. was going to change video production forever.

I’ve talked before about the limitations of A.I. and I think a lot of what was talked about at NAB was really overpromising what A.I. can do. We spent most of the six months after releasing Transcriptive 1.0 developing non-A.I. features to help make the A.I. portion of the product more useful. The release we’re announcing today and the next release coming later this month will focus on getting around A.I. transcripts completely by importing human transcripts.

There’s a lot of value in A.I. It’s an important part of Transcriptive and for a lot of use cases it’s awesome. There are just also a lot of limitations. It’s pretty common that you run into the A.I. equivalent of the Uncanny Valley (a CG character that looks *almost* human but ends up looking unnatural and creepy), where A.I. gets you 95% of the way there but it’s more work than it’s worth to get the final 5%. It’s better to just not use it.

You just have to understand when that 95% makes your life dramatically easier and when it’s like running into a brick wall. Part of my goal, both as a product designer and just talking about it, is to help folks understand where that line in the A.I. sand is.

I also don’t buy into this idea that A.I. is on an exponential curve and it’s just going to get endlessly better, obeying Moore’s law like the speed of processors.

When we first launched Transcriptive, we felt it would replace transcriptionists. We’ve been disabused of that notion. ;-) The reality is that A.I. is making transcriptionists more efficient. Just as we’ve found Transcriptive to be making video editors more efficient. We had a lot of folks coming up to us at NAB this year telling us exactly that. (It was really nice to hear. :-)

However, much of the effectiveness of Transcriptive comes more from the tools that we’ve built around the A.I. portion of the product. Those tools can work with transcripts and metadata regardless of whether they’re A.I. or human generated. So while we’re going to continue to improve what you can do with A.I., we’re also supporting other workflows.

Over the next couple months you’re going to see a lot of announcements about Transcriptive. Our goal is to leverage the parts of A.I. that really work for video production by building tools and features that amplify those strengths, like PowerSearch, our new panel for searching all the metadata in your Premiere project, and to build bridges to other technology that works better in other areas, such as importing human created transcripts.

Should be a fun couple months, stay tuned! btw… if you’re interested in joining the PowerSearch beta, just email us at cs@nulldigitalanarchy.com.

Addendum: Just to be clear, in one way A.I. is definitely NOT VR. It’s actually useful. A.I. has a lot of potential to really change video production, it’s just a bit over-hyped right now. We, like some other companies, are trying to find the best way to incorporate it into our products because once that is figured out, it’s likely to make editors much more efficient and eliminate some tasks that are total drudgery. OTOH, VR is a parlor trick that, other than some very niche uses, is going to go the way of 3D TV and won’t change anything.

Jim Tierney
Chief Executive Anarchist
Digital Anarchy

Just Say No to A.I. Chatbots

For all the developments in artificial intelligence, one of the consistently worst uses of it is with chatbots. Those little ‘Chat With Us’ side bars on many websites. Since we’re doing a lot with artificial intelligence (A.I.) in Transcriptive and in other areas, I’ve gotten very familiar with how it works and what the limitations are. It starts to be easy to spot where it’s being used, especially when it’s used badly.

So A.I. chatbots, which really don’t work well, have become a bit of a pet peeve of mine. If you’re thinking about using them for your website, you owe it to yourself to click around the web and see how often ‘chatting’ gets you a usable answer. It’s usually just frustrating. You go a few rounds with a cheery chatbot before getting to what you were going to do in the first place… send a message that will be replied to by a human. Total waste of time and doesn’t answer the questions.

Artificial intelligence isn't great for chatbots. Do you trust cheery, know-nothing chatbots with your customers?

The main problem is that chatbots don’t know when to quit. I get it that some businesses receive the same question over and over… where are you located? what are your hours? Ok, fine, have a chatbot act as a FAQ. But the chatbot needs to quickly hand off the conversation to a real person if the questions go beyond what you could have in an FAQ. And frankly, an FAQ would be better than trying to fake-out people with your A.I. chatbot. (honesty and authenticity matter, even on the web)

A.I. is just not great at reading comprehension. It can get the gist of things usually, which I think is useful for analytics and business intelligence. But this doesn’t allow it to respond with any degree of accuracy or intelligence. For responding to customer queries it produces answers that are sort of close… but mostly unusable. So, the result is frustrated customers.

Take a recent experience with Audi. I’m looking at buying a new car and am interested in one of their SUVs. I went onto an Audi dealer site to inquire about a used one they had. I wanted to know 1) was it actually in stock and 2) how much of the original warranty was left since it was a 2017? There was a button to send a message which I was originally going to use but decided to try the chat button that was bouncing up and down getting my attention.

So, I asked those questions in the chat. If it had been a real person, they definitely could have answered #1 and probably #2, even if they were just an assistant. But no, I ended in the same place I would’ve been if I’d just clicked ‘send a message’ in the first place. But first, I had to get through a bunch of generic answers that didn’t answer any of my questions and just dragged me around in circles. This is not a good way to deal with customers if you’re trying to sell them a $40,000 car.

And don’t get me started on Amazon’s chatbots. (and emailbots for that matter)

It’s also funny to notice how the chatbots try and make you think they’re human, with misspelled words and faux emotions. I’ve had a chatbot admonish me with ‘I’m a real person…’ when I called it a chatbot. It then followed that with another generic answer that didn’t address my question. The Pinocchio chatbot… You’re not a real boy, not a real person and you don’t get to pass Go and collect $200. (The real salesperson I eventually talked to confirmed it was a chatbot.)

I also had one threaten to end the chat if I didn’t watch my language, which was not aimed at the chatbot. I just said, “I just want this to f’ing work”. A little generic frustration. However, after it told me to watch my language, I went from frustrated to kind of pissed. So much for artificial intelligence having emotional intelligence. Getting faux-insulted over something almost any real human would recognize as low grade frustration is not going to make customers happier.

I think A.I. has some amazing uses (Transcriptive makes great use of A.I.), but it also has a LOT of shortcomings. All of those shortcomings are glaringly apparent when you look at chatbots. There are, of course, many companies trying to create conversational A.I. but so far the results have been pretty poor.

Based on what I’ve seen developing products with A.I., I think it’s likely it’ll be quite a while before conversational A.I. is a good experience on a regular basis. You should think very hard about entrusting your customers to it. A web form or FAQ is going to be better than a frustrating experience with a ‘sales person’.

Not sure what this has to do with video editing. Perhaps just another example of why A.I. is going to have a hard time editing anything that requires comprehending the content. Furthering my belief that A.I. isn’t going to replace most video editors any time soon.

Getting transcripts for Premiere Multicam Sequences

Using Transcriptive with multicam sequences is not a smooth process and doesn’t really work. We’re working on a solution, but it’s tricky due to Premiere’s limitations.

However, while we sort that out, here’s a workaround that is pretty easy to implement. Here are the steps:

1- Take the clip with the best audio and drop it into its own sequence.
2- Transcribe that sequence with Transcriptive.
3- Now replace that clip with the multicam clip.
4- Voila! You have a multicam sequence with a transcript. Edit the transcript and clip as you normally would.

This is not a permanent solution and we hope to make it much more automatic to deal with Premiere’s multicam clips. In the meantime, this technique will let you get transcripts for multicam clips.

Thanks to Todd Drezner at Cohn Creative for suggesting this workaround.