
Transcriptive: Beyond automated video transcriptions

I decided to try Transcriptive well before I became part of the Digital Anarchy family. Like any other aspiring documentary filmmaker, I knew relying on a crew to get my editing started was not an option. Without funding you can’t pay a crew; without a crew you can’t get funding. I had no money, an idea in my head, some footage shot with the help of friends, and a lot of work to do, especially since this was my very first feature film.

Besides being an independent filmmaker and social media strategist for DA, I am also an Assistive Technology Trainer for a private company called Adaptive Technology Services. I teach blind and low-vision individuals how to take advantage of technology on their phones and computers to rejoin the workforce after vision loss. Since the beginning of my journey as an AT Trainer (I started as a volunteer 6 years ago) I have been using my work to research the subject and prepare for this film.


My movie is about the relationship between the sighted and non-sighted communities. It seeks to establish a dialog between people with and without visual disabilities so we can come together and demystify disabilities for those who don’t have them. I know it is an important subject, but right from the beginning of this project I learned how hard it is to gather funds for any disability-related initiative. I had to carefully budget the shoots and define priorities. Paying a post-production crew was not (and still is not) possible. I have to write and cut samples on my own for now. Transcriptive was a way for me to get things moving by myself so I can apply for grants in the near future, start paying producers, editors, camera operators, and sound designers, and get the project going for real. The journey started with transcribing the interviews. Transcriptive did a pretty good job transcribing the audio from the camera, as you can see below. Accuracy got even better when transcribing audio from the mic.

The idea of getting accurate automated transcripts brought a smile to my face. But could Artificial Intelligence really get the whole job done for me? I never believed so, and I was right. The accuracy for English interviews was pretty impressive; I barely had to do any editing on those. The situation changed as soon as I tried transcribing audio in my native language, Brazilian Portuguese. The AI transcription didn’t just get a bit flaky; it was completely unusable, so I decided not to waste any more time and started doing my own manual transcriptions.

I have been using Speechmatics for most of my projects because its accuracy with English is considerably higher than Watson’s. However, after trying to transcribe Portuguese for the first time, it occurred to me that Speechmatics actually offers Portuguese from Portugal, while Watson transcribes Portuguese from Brazil. I decided to give Watson a try, but the transcription was not much better than the one I got from Speechmatics.

It is true that the Brazilian Portuguese footage I was transcribing consisted of B-roll clips recorded with a Rode mic mounted on top of my DSLR, not well-mic’d sit-down interviews. The clips do have decent audio, but they also include some background noise that does not help foreign-language speech-to-text conversion. At the time I had a deadline to meet and was not able to record better audio and compare the Speechmatics and Watson Portuguese transcripts. It will be interesting to give it another try with more time, to compare them further and evaluate whether there are advantages to using Watson for my next batch of footage.


Days after my failed attempt to transcribe Brazilian Portuguese with Speechmatics, I went back to the Transcriptive panel for Premiere, found an option to import my human transcripts, gave it a try, and realized I could still use Transcriptive to speed up my video production workflow. I could save time by letting Transcriptive assign timecode to the words I transcribed, which would be nearly impossible for me to do on my own. The plugin allowed me to quickly find where things were said across 8 hours of interviews. Having a timecode assigned to each word let me search the transcript and jump straight to the point in my video where I wanted a cut, marker, B-roll clip or transition applied.
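To give a sense of why per-word timecode matters so much here, below is a minimal sketch of searching a word-timestamped transcript. It assumes a transcript exported as JSON with hypothetical "word" and "start" fields; it illustrates the idea, not Transcriptive’s actual export format.

```python
# Minimal sketch: search a word-timestamped transcript and report timecodes.
# Assumes a hypothetical JSON export like [{"word": "vision", "start": 12.4}, ...].
import json

def find_word(transcript_path, query):
    """Return the start time (in seconds) of every occurrence of `query`."""
    with open(transcript_path) as f:
        words = json.load(f)
    return [w["start"] for w in words if w["word"].lower() == query.lower()]

def to_timecode(seconds, fps=24):
    """Convert seconds to an HH:MM:SS:FF timecode string."""
    frames = int((seconds % 1) * fps)
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}:{frames:02d}"

# Example: list every point where "vision" is spoken in an interview.
for t in find_word("interview01.json", "vision"):
    print(to_timecode(t))
```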

My movie is still in pre-production and my Premiere project is honestly not that organized yet, so the search capability was also a huge advantage. I have been working on samples to apply for grants, which means I have tons of different sequences, multicam sequences, and markers that now live in folders inside of folders. Before I started working for DA I was looking for a solution that would minimize the mess without having to fully organize it or spend too much money, and PowerSearch came to the rescue. Also, being able to edit my transcripts inside of Premiere made my life a lot easier.

Last month, talking to a few film clients and friends, I found out most filmmakers still clean up human transcripts. In my case, I go through the transcripts to add punctuation marks and other cues that will remind me how expressive speakers were in a given phrase. Ellipses, question marks and exclamation points remind me of the tone in which they spoke, allowing me to get paper cuts done faster. I am not sure whether ASR technology will start inserting punctuation in the future, but it would be very handy for me. Until that is possible, I am grateful Transcriptive now offers a text edit interface, so I can edit my transcripts without leaving Premiere.

For the movie I am making now, I was lucky enough to have a friend willing to help me get this tedious and time-consuming part of the work done, so I am now exporting all my transcripts to Transcriptive.com. The app will allow us to collaborate on the transcripts. She will be helping me all the way from LA, editing the transcripts without having to download a whole Premiere project to get the work done.

Curious to see if Transcriptive can speed up your video production workflow? Get a free trial of Transcriptive and PowerSearch for Windows or Mac and test it yourself!

Using A.I. to Create Music with AmperMusic and Jukedeck

For the last 14 years I’ve created the Audio Art Tour for Burning Man. It’s kind of a docent-led audio guide to the major art installations out there, similar to an audio guide you might get at a museum.

Burning Man always has a different ‘theme’ and this year it was ‘I, Robot’. I generally try to find background music related to the theme. EDM is big at Burning Man, land of 10,000 DJs, so I could’ve just grabbed some electronic tracks that sounded robotic. Easy enough to do. However, I decided to let Artificial Intelligence algorithms create the music! (You can listen to the tour and hear the different tracks.)

This turned out to be not so easy, so I’ll break down what I had to do to get seven unique-sounding, usable tracks. I had a bit more success with AmperMusic, which is also currently free (unlike Jukedeck), so I’ll discuss that first.

Getting the Tracks

The problem with both services was getting unique-sounding tracks. The A.I. has a tendency to create very similar-sounding music. Even if you select different styles and instruments, you often end up with oddly similar music. This problem is compounded by Amper’s inability to render more than about 30 seconds of music.


What I found I had to do was let it generate 30 seconds, either randomly or with me selecting the instruments. I did this repeatedly until I got a 30-second sample I liked, at which point I extended it out to about 3 or 4 minutes and turned off all the instruments but two or three. Amper was usually able to render that out. Then I’d turn off those instruments, turn another three back on, and render that. Rinse and repeat until you’ve rendered all the instruments.

Now you’ve got a bunch of individual tracks that you can combine to get your final music track. Combine them in Audition or even Premiere Pro (or FCP or whatever NLE you use) and you’re good to go. I used that technique to get five of the tracks.
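If you’d rather do the final combine outside an NLE, here’s a minimal sketch of layering the separately rendered stems with pydub (the file names are made up, and pydub needs ffmpeg installed). It’s just one way to do it, not the only one.

```python
# Sketch: layer separately rendered instrument stems into one music track.
# File names are hypothetical; pydub requires ffmpeg to be installed.
from pydub import AudioSegment

stems = ["drums_bass.wav", "synth_pads.wav", "lead_fx.wav"]

# Start from the first stem, then overlay each remaining stem at time zero.
mix = AudioSegment.from_wav(stems[0])
for path in stems[1:]:
    mix = mix.overlay(AudioSegment.from_wav(path))

mix.export("final_track.wav", format="wav")
```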

Jukedeck didn’t have the rendering problem, but it REALLY suffered from the ‘sameness’ problem. It was tough getting something that truly sounded unique. However, I did get a couple of good tracks out of it.

Problems Using Artificial Intelligence

This is another example of A.I. and Machine Learning that works… sort of. I could have found seven stock music tracks that I liked much faster (this is what I usually do for the Audio Art Tour). The amount of time I spent messing around with these services was significant. Also, if Jukedeck is any indication, a music track from one of these services will cost as much as a stock music track. Just go to Pond5 to see what you can get for the same price, with a much, much wider variety. I don’t think living, breathing musicians have much to worry about. At least for now.

That said, I did manage to get seven unique, cool sounding tracks out of them. It took some work, but it did happen.

As with most A.I./ML, it’s difficult to see what the future looks like. There have certainly been a ton of advances, but I think in a lot of cases it’s some of the low-hanging fruit. We’re seeing that with speech-to-text algorithms in Transcriptive, where they’re starting to plateau and cluster around the same accuracy levels. The fruit (accuracy) is now pretty high up and improvements are tough. It’ll be interesting to see what it takes to break through that. More data? Faster servers? A new approach?

I think music may be similar. It seems like it’s a natural thing for A.I. but it’s deceptively difficult to do in a way that mimics the range and diversity of styles and sounds that many human musicians have. Particularly a human armed with a synth that can reproduce an entire orchestra. We’ll see what it takes to get A.I. music out of the Valley of Sameness.


Artificial Intelligence is The New VR

A couple of things stood out to me at NAB.

1) Practically every company exhibiting was talking about A.I.-something.

2) VR seemed to have disappeared from vendor booths.

The last couple years at NAB, VR was everywhere. The Dell booth had a VR simulator, Intel had a VR simulator, booths had Oculuses galore and you could walk away with an armful of cardboard glasses… this year, not so much. Was it there? Sure, but it was hardly to be seen in booths. It felt like the year 3D died. There was a pavilion, there were sessions, but nobody on the show floor was making a big deal about it.

In contrast, it seemed like every vendor was trying to attach A.I. to their name, whether they had an A.I. product or not. Not to mention Google, Amazon, Microsoft, IBM, Speechmatics and every other big vendor of A.I. cloud services had large booths touting how their A.I. was going to change video production forever.

I’ve talked before about the limitations of A.I., and I think a lot of what was talked about at NAB was really over-promising what A.I. can do. We spent most of the six months after releasing Transcriptive 1.0 developing non-A.I. features to help make the A.I. portion of the product more useful. The release we’re announcing today and the next release coming later this month will focus on getting around A.I. transcripts completely by importing human transcripts.

There’s a lot of value in A.I. It’s an important part of Transcriptive, and for a lot of use cases it’s awesome. There are just also a lot of limitations. It’s pretty common that you run into the A.I. equivalent of the Uncanny Valley (a CG character that looks *almost* human but ends up looking unnatural and creepy), where A.I. gets you 95% of the way there but it’s more work than it’s worth to get the final 5%. It’s better to just not use it.

You just have to understand when that 95% makes your life dramatically easier and when it’s like running into a brick wall. Part of my goal, both as a product designer and just talking about it, is to help folks understand where that line in the A.I. sand is.

I also don’t buy into this idea that A.I. is on an exponential curve and it’s just going to get endlessly better, obeying Moore’s law like the speed of processors.

When we first launched Transcriptive, we felt it would replace transcriptionists. We’ve been disabused of that notion. ;-) The reality is that A.I. is making transcriptionists more efficient. Just as we’ve found Transcriptive to be making video editors more efficient. We had a lot of folks coming up to us at NAB this year telling us exactly that. (It was really nice to hear. :-)

However, much of the effectiveness of Transcriptive comes more from the tools that we’ve built around the A.I. portion of the product. Those tools can work with transcripts and metadata regardless of whether they’re A.I. or human generated. So while we’re going to continue to improve what you can do with A.I., we’re also supporting other workflows.

Over the next couple months you’re going to see a lot of announcements about Transcriptive. Our goal is to leverage the parts of A.I. that really work for video production by building tools and features that amplify those strengths, like PowerSearch, our new panel for searching all the metadata in your Premiere project, and to build bridges to other technology that works better in other areas, such as importing human-created transcripts.

Should be a fun couple of months, stay tuned! btw… if you’re interested in joining the PowerSearch beta, just email us at cs@digitalanarchy.com.

Addendum: Just to be clear, in one way A.I. is definitely NOT VR. It’s actually useful. A.I. has a lot of potential to really change video production, it’s just a bit over-hyped right now. We, like some other companies, are trying to find the best way to incorporate it into our products because once that is figured out, it’s likely to make editors much more efficient and eliminate some tasks that are total drudgery. OTOH, VR is a parlor trick that, other than some very niche uses, is going to go the way of 3D TV and won’t change anything.

Jim Tierney
Chief Executive Anarchist
Digital Anarchy

Just Say No to A.I. Chatbots

For all the developments in artificial intelligence, one of the consistently worst uses of it is chatbots: those little ‘Chat With Us’ sidebars on many websites. Since we’re doing a lot with artificial intelligence (A.I.) in Transcriptive and in other areas, I’ve gotten very familiar with how it works and what the limitations are. It starts to be easy to spot where it’s being used, especially when it’s used badly.

So A.I. chatbots, which really don’t work well, have become a bit of a pet peeve of mine. If you’re thinking about using them for your website, you owe it to yourself to click around the web and see how often ‘chatting’ gets you a usable answer. It’s usually just frustrating. You go a few rounds with a cheery chatbot before getting to what you were going to do in the first place: send a message that will be replied to by a human. It’s a total waste of time and doesn’t answer your questions.

Do you trust cheery, know-nothing chatbots with your customers?

The main problem is that chatbots don’t know when to quit. I get that some businesses receive the same questions over and over… where are you located? what are your hours? Ok, fine, have a chatbot act as an FAQ. But the chatbot needs to quickly hand off the conversation to a real person if the questions go beyond what you could have in an FAQ. And frankly, an FAQ would be better than trying to fake out people with your A.I. chatbot. (Honesty and authenticity matter, even on the web.)
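As a sketch of what “knowing when to quit” could look like, here’s a toy keyword-matching FAQ bot (the keywords and answers are invented) that responds only to what’s in its FAQ and hands everything else straight to a human:

```python
# Toy sketch of an FAQ-only chatbot that hands off anything it can't match.
# Keywords and replies are invented for illustration.
FAQ = {
    "hours": "We're open 9am to 6pm, Monday through Saturday.",
    "located": "We're at 123 Main St., downtown.",
}

def reply(message: str) -> str:
    text = message.lower()
    for keyword, answer in FAQ.items():
        if keyword in text:
            return answer
    # Anything beyond the FAQ goes straight to a person, no faux-human runaround.
    return "Good question. Let me connect you with a real person who can answer that."

print(reply("Where are you located?"))
print(reply("How much of the original warranty is left on the 2017 SUV?"))
```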

A.I. is just not great at reading comprehension. It can usually get the gist of things, which I think is useful for analytics and business intelligence. But that doesn’t allow it to respond with any degree of accuracy or intelligence. For responding to customer queries it produces answers that are sort of close… but mostly unusable. So the result is frustrated customers.

Take a recent experience with Audi. I’m looking at buying a new car and am interested in one of their SUVs. I went onto an Audi dealer site to inquire about a used one they had. I wanted to know 1) was it actually in stock, and 2) how much of the original warranty was left, since it was a 2017? There was a button to send a message, which I was originally going to use, but I decided to try the chat button that was bouncing up and down trying to get my attention.

So, I asked those questions in the chat. If it had been a real person, they definitely could have answered #1 and probably #2, even if they were just an assistant. But no, I ended up in the same place I would’ve been if I’d just clicked ‘send a message’ in the first place. But first, I had to get through a bunch of generic answers that didn’t answer any of my questions and just dragged me around in circles. This is not a good way to deal with customers if you’re trying to sell them a $40,000 car.

And don’t get me started on Amazon’s chatbots. (and emailbots for that matter)

It’s also funny to notice how the chatbots try to make you think they’re human, with misspelled words and faux emotions. I’ve had a chatbot admonish me with ‘I’m a real person…’ when I called it a chatbot. It then followed that with another generic answer that didn’t address my question. The Pinocchio chatbot… You’re not a real boy, not a real person, and you don’t get to pass Go and collect $200. (The real salesperson I eventually talked to confirmed it was a chatbot.)

I also had one threaten to end the chat if I didn’t watch my language, even though my language was not aimed at the chatbot. I just said, “I just want this to f’ing work.” A little generic frustration. However, after it told me to watch my language, I went from frustrated to kind of pissed. So much for artificial intelligence having emotional intelligence. Getting faux-insulted over something almost any real human would recognize as low-grade frustration is not going to make customers happier.

I think A.I. has some amazing uses; Transcriptive makes great use of it, but it also has a LOT of shortcomings. All of those shortcomings are glaringly apparent when you look at chatbots. There are, of course, many companies trying to create conversational A.I., but so far the results have been pretty poor.

Based on what I’ve seen developing products with A.I., I think it’s likely it’ll be quite a while before conversational A.I. is a good experience on a regular basis. You should think very hard about entrusting your customers to it. A web form or FAQ is going to be better than a frustrating experience with a ‘sales person’.

Not sure what this has to do with video editing. Perhaps just another example of why A.I. is going to have a hard time editing anything that requires comprehending the content. Furthering my belief that A.I. isn’t going to replace most video editors any time soon.

Artificial Intelligence vs. Video Editors

With Transcriptive, our new tool for doing automated transcriptions, we’ve dived into the world of A.I. headfirst. So I’m pretty familiar with where the state of the industry is right now. We’ve been neck deep in it for the last year.

A.I. is definitely changing how editors get transcripts and search video for content. Transcriptive demonstrates that pretty clearly with text. Searching via object recognition is also something that is already happening. But what about actual video editing?

One of the problems A.I. has is finishing. Going the last 10%, if you will. For example, speech-to-text engines at best have an accuracy rate of about 95% or so. This is about on par with the average human transcriptionist. For general-purpose recordings, human transcriptionists SHOULD be worried.

But for video editing there are some differences, which are good news. First, and most importantly, errors tend to be cumulative. If a computer is going to edit a video, at the very least it needs to do the transcription and it needs to recognize the imagery (we’ll ignore other considerations like style, emotion, and story for the moment). Speech recognition is at best 95% accurate, and object recognition is worse. The more layers of A.I. you stack, the more those errors usually multiply (although in some cases one layer might compensate for another). While it’s possible automation will be able to produce a decent rough cut, these errors make it difficult to see automation replacing editors for most of the types of videos that pros are typically employed for.
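To make the compounding point concrete, here’s a back-of-the-envelope sketch. The 95% speech figure comes from the paragraph above; the object-recognition number is an assumption for illustration, and the math treats the errors as independent, which real systems aren’t.

```python
# Back-of-the-envelope sketch of how per-stage accuracy compounds when A.I.
# stages are chained. The object-recognition figure is assumed, not measured.
speech_accuracy = 0.95   # best-case speech-to-text, per the post
object_accuracy = 0.90   # assumed object recognition (worse than speech)

combined = speech_accuracy * object_accuracy
print(f"Combined accuracy: {combined:.1%}")   # -> Combined accuracy: 85.5%
```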

Secondly, if the videos are being made for humans, frequently the humans don’t know what they want. Or at least they’re not going to be able to communicate it in such a way that a computer will understand and be able to make changes. If you’ve used Alexa on an Echo, you can see how well A.I. understands humans. In lots of situations, especially literal ones (find me the best restaurant), it works fine; in lots of others, not so much.

Many times as an editor, the direction you get from clients is subtle or you have to read between the lines and figure out what they want. It’s going to be difficult to get A.I.s to take the way humans usually describe what they want, figure out what they actually want and make those changes.

Third… then you get into the whole issue of emotion and storytelling, which I don’t think A.I. will do well anytime soon. The Economist recently had an amusing article where it let an A.I. write the article. The result is here. It’s very good at mimicking the style of the Economist, but when it comes to putting together a coherent narrative… ouch.

It’s Not All Good News

There are already phone apps that do basic automatic editing. These are more for consumers who want something quick and dirty. For most of the work professional editors get paid for, it’s unlikely that what I’ve seen from these apps will replace humans any time soon. Although, I can see how the tech could be used to create rough cuts and the like.

Also, for some types of videos, wedding or music videos perhaps, you can make a pretty solid case that A.I. will be able to put something together soon that looks reasonably professional.

You need training material for neural networks to learn how to edit videos. Thanks to YouTube, Vimeo and the like, there is an abundance of training material. Do a search for ‘wedding video’ on YouTube. You get 52,000,000 results. 2.3 million people get married in the US every year. Most of the videos from those weddings are online. I don’t think finding a few hundred thousand of those that were done by a professional will be difficult. It’s probably trivial actually.

Same with music videos. There IS enough training material for the A.I.s to learn how to do generic editing for many types of videos.

For people that want to pay $49.95 to get their wedding video edited, that option will be there. Probably within a couple years. Have your guests shoot video, upload it and you’re off and running. You’ll get what you pay for, but for some people it’ll be acceptable. Remember, A.I. is very good at mimicking. So the end result will be a very cookie cutter wedding video. However, since many wedding videos are pretty cookie cutter anyways… at the low end of the market, an A.I. edited video may be all ‘Bridezilla on A Budget’ needs. And besides, who watches these things anyways?

Let The A.I. Do The Grunt Work, Not The Editing

The losers in the short term may be assistant editors. Many of the tasks A.I. is good at… transcribing, searching for footage, etc.… are now typically given to assistants. However, it may simply change the types of tasks assistant editors are given. There’s a LOT of metadata that needs to be entered and wrangled.

While A.I. is already showing up in many aspects of video production, it feels like having it actually do the editing is quite a ways off. I can see A.I. tools being created that help with editing: rough cut creation, recommended color corrections or B-roll selection, suggested changes to timing, etc. But there’ll still need to be a person doing the edit.