Tag Archives: speech-to-text

Why do we have the lowest transcription costs?

We occasionally get questions from customers asking why we charge $0.04/min ($2.40/hr) for transcription (if you pre-pay), when some competitors charge $0.25/min or even $0.50/min. Is it lower accuracy? Are you selling our data?

No and no. Ok, but why?

Transcriptive and PowerSearch work best when all your media has transcripts attached to it. Our goal is to make Transcriptive as useful as possible. We hope that the less you have to think about the cost of the transcripts, the more media you’ll transcribe… which makes Transcriptive and PowerSearch that much more powerful.

The Transcriptive-AI service is equal to, or better than, what other services are using. We’re not tied to one A.I. and we’re constantly evaluating the different A.I. services. We use whatever we think is currently state-of-the-art. Since we do such a high volume, we get good pricing from all the services, so price-wise it doesn’t really matter which one we use.

Do we make a ton of money on transcribing? No.

The services that charge $0.25/min (or whatever) are probably making a fair amount of money on transcribing. We’re all paying about $0.02/min or less. Give or take, that’s the wholesale/volume price.
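To put those per-minute rates side by side, here is a quick back-of-the-envelope calculation in Python. The rates are the ones quoted in this post; the competitor labels and the 100 hours of footage are made up purely for illustration.

```python
# Per-minute rates quoted in this post (USD). Labels are placeholders.
RATES_PER_MIN = {
    "Transcriptive (pre-paid)": 0.04,
    "Competitor A": 0.25,
    "Competitor B": 0.50,
    "Approx. wholesale": 0.02,
}

FOOTAGE_HOURS = 100  # hypothetical amount of footage


def cost(rate_per_min: float, hours: float) -> float:
    """Cost in dollars to transcribe `hours` of media at `rate_per_min`."""
    return round(rate_per_min * 60 * hours, 2)


for name, rate in RATES_PER_MIN.items():
    per_hour = cost(rate, 1)
    total = cost(rate, FOOTAGE_HOURS)
    print(f"{name}: ${per_hour:.2f}/hr, ${total:,.2f} for {FOOTAGE_HOURS} hrs")
```

The per-hour figure is just the per-minute rate times 60, which is where the $2.40/hr number comes from.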

If you’re getting your transcripts for free… those transcripts are probably being used for training, especially if the service is keeping track of the edits you make (e.g. YouTube, Otter, etc.). Transcriptive is not sending your edits back to the A.I. service, and those edits are the important bit if you’re going to train the A.I. Without the corrected version, the A.I. doesn’t know what it got wrong and can’t learn from it.

So, for us, it all comes down to making Transcriptive.com, the Transcriptive Premiere Pro panel, and PowerSearch as useful as possible. To do so, we want the most accurate transcripts and we want them to be as low cost as possible. We know y’all have a LOT of footage. We’d rather reduce the barriers to you transcribing all of it.

So… if you’re wondering how we can justify charging $0.04/min for transcripts, that’s the reason. It enables all the other cool features of Transcriptive and PowerSearch. Hopefully that’s a win for everyone.

A.I. Speech-to-Text: How to make sure your data isn’t being used for training

We get a fair number of questions from Transcriptive users who are concerned that the A.I. is going to use their data for training.

First off, in the Transcriptive preferences, if you select ‘Delete transcription jobs from server’ your data is deleted immediately. This will delete everything from the A.I. service’s servers and from the Digital Anarchy servers. So that’s an easy way of making sure your data isn’t kept around and used for anything.

However, generally speaking, the A.I. services don’t get more accurate with user submitted data. Partially because they aren’t getting the ‘positive’ or corrected transcript.

When you edit your transcript we aren’t sending the corrections back to the A.I. (some services are doing this… e.g. if you correct YouTube’s captions, you’re training their A.I.)

So the audio by itself isn’t that useful. What the A.I. needs in order to learn is the audio file, the original transcript AND the corrected transcript. So even if you don’t have the preference checked, it’s unlikely your audio file will be used for training.

This is great if you’re concerned about security BUT it’s less great if you really WANT the A.I. to learn. For example, I don’t know how many videos I’ve submitted over the last 3 years saying ‘Digital Anarchy’. And still to this day I get: Dugal Accusatorial (seriously), Digital Ariki, and other weird stuff. A.I. is great when it works, but sometimes… it definitely does not work. And people want to put this into self-driving cars? Crazy talk right there.

If you want to help the A.I. out, you can use the Speech-to-Text Glossary (click the link for a tutorial). This still won’t train the A.I., but if the A.I. is uncertain about a word, it’ll help it select the right one.

How does the glossary work? The A.I. analyzes a word sound and then comes up with possible words for that sound. Each word gets a ‘confidence score’. The one with the highest score is the one you see in your transcript. In the case above, ‘Ariki’ might have had a confidence of .6 (on a scale of 0 to 1, so .6 is pretty low) and ‘Anarchy’ might have been .53. So my transcript showed Ariki. But if I’d put Anarchy into the Glossary, then the A.I. would have seen the low confidence score for Ariki and checked if the alternatives matched any glossary terms.
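Here is a rough sketch of that idea in Python. To be clear, this is not Transcriptive’s or any A.I. service’s actual code: the function name, the data layout and the 0.7 ‘low confidence’ cutoff are all assumptions made up for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical cutoff: below this, the top guess counts as "uncertain".
LOW_CONFIDENCE = 0.7


def pick_word(candidates: List[Tuple[str, float]], glossary: Set[str]) -> str:
    """Choose a word from (word, confidence) candidates for one sound."""
    # Sort so the highest-confidence candidate comes first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_word, best_score = ranked[0]
    if best_score < LOW_CONFIDENCE:
        # The top guess is shaky, so check the alternatives against the glossary.
        terms = {t.lower() for t in glossary}
        for word, _score in ranked[1:]:
            if word.lower() in terms:
                return word
    # Either the top guess is confident or no alternative is a glossary term.
    return best_word


candidates = [("Ariki", 0.6), ("Anarchy", 0.53)]
print(pick_word(candidates, glossary=set()))        # Ariki
print(pick_word(candidates, glossary={"Anarchy"}))  # Anarchy
```

Note that a confident top guess is never overridden in this sketch, which matches the point that the glossary only helps when the A.I. is uncertain.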

So the Glossary can be very useful with proper names and the like.

But, as mentioned, nothing you do in Transcriptive is training the A.I. The only thing we’re doing with your data is storing it and we’re not even doing that if you tell us not to.

It’s possible that we will add the option in the future to submit training data to help train the A.I. But that’ll be a specific feature and you’ll need to intentionally upload that data.

Dumb A.I., Dumb Anarchist: Using the Transcriptive Glossary

We’ve been working on Transcriptive for like 3 years now. In that time, the A.I. has heard my voice saying ‘Digital Anarchy’ umpteen million times. So, you would think it would easily get that right by now. As the below transcript from our SRT Importing tutorial shows… not so much. (Dugal Accusatorial? Seriously?)

ALSO, you would think that by now I would have a list of terms that I would copy/paste into Transcriptive’s Glossary field every time I get a transcript for a tutorial. The glossary helps the A.I. determine what ‘vocal sounds’ should be when it translates those sounds into words. Uh, yeah… not so much.

So… don’t be like AnarchyJim. If you have words you know the A.I. probably won’t get: company names, industry jargon, difficult proper names (cool blog post on applying player names to an MLB video here), etc., then use Transcriptive’s glossary (in the Transcribe dialog). It does work. (and somebody should mention that to the guy that designed the product. Oy.)

Use the Glossary field in the Transcribe dialog!

Overall the A.I. is really accurate and does usually get ‘Digital Anarchy’ correct. So I get lazy about using the glossary. It is a really useful thing…

A.I. Glossary in Transcriptive

Just Say No to A.I. Chatbots

For all the developments in artificial intelligence, one of the consistently worst uses of it is with chatbots. Those little ‘Chat With Us’ side bars on many websites. Since we’re doing a lot with artificial intelligence (A.I.) in Transcriptive and in other areas, I’ve gotten very familiar with how it works and what the limitations are. It starts to be easy to spot where it’s being used, especially when it’s used badly.

So A.I. chatbots, which really don’t work well, have become a bit of a pet peeve of mine. If you’re thinking about using them for your website, you owe it to yourself to click around the web and see how often ‘chatting’ gets you a usable answer. It’s usually just frustrating. You go a few rounds with a cheery chatbot before getting to what you were going to do in the first place… send a message that will be replied to by a human. Total waste of time and doesn’t answer the questions.

Artificial intelligence isn't great for chatbots

Do you trust cheery, know-nothing chatbots with your customers?

The main problem is that chatbots don’t know when to quit. I get it that some businesses receive the same question over and over… where are you located? what are your hours? Ok, fine, have a chatbot act as a FAQ. But the chatbot needs to quickly hand off the conversation to a real person if the questions go beyond what you could have in an FAQ. And frankly, an FAQ would be better than trying to fake out people with your A.I. chatbot. (honesty and authenticity matter, even on the web)
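As a minimal sketch of what ‘act as a FAQ, then hand off’ could look like, here is a tiny Python example. Everything in it is hypothetical: the FAQ entries, the function names and the hand-off message are invented for illustration and don’t describe any real chatbot product.

```python
from typing import Optional

# Hypothetical FAQ entries; a real site would have its own.
FAQ = {
    "where are you located": "Our address is on the Contact page.",
    "what are your hours": "We're open 9am-5pm, Monday through Friday.",
}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so 'What are your hours?' matches."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def answer(question: str) -> Optional[str]:
    """Return a canned FAQ answer, or None to hand off to a human."""
    return FAQ.get(normalize(question))


def reply(question: str) -> str:
    """Answer from the FAQ if possible; otherwise hand off immediately."""
    canned = answer(question)
    if canned is not None:
        return canned
    # No FAQ match: don't guess, pass the question to a real person right away.
    return "I'm an automated FAQ. I'm sending your question to a person on our team."


print(reply("What are your hours?"))
print(reply("Is the 2017 SUV actually in stock?"))
```

The design choice here is that the bot says up front that it is automated and gives up after one miss, instead of going a few rounds with generic answers.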

A.I. is just not great at reading comprehension. It can usually get the gist of things, which I think is useful for analytics and business intelligence. But this doesn’t allow it to respond with any degree of accuracy or intelligence. For responding to customer queries it produces answers that are sort of close… but mostly unusable. So, the result is frustrated customers.

Take a recent experience with Audi. I’m looking at buying a new car and am interested in one of their SUVs. I went onto an Audi dealer site to inquire about a used one they had. I wanted to know 1) was it actually in stock and 2) how much of the original warranty was left since it was a 2017? There was a button to send a message which I was originally going to use but decided to try the chat button that was bouncing up and down getting my attention.

So, I asked those questions in the chat. If it had been a real person, they definitely could have answered #1 and probably #2, even if they were just an assistant. But no, I ended in the same place I would’ve been if I’d just clicked ‘send a message’ in the first place. But first, I had to get through a bunch of generic answers that didn’t answer any of my questions and just dragged me around in circles. This is not a good way to deal with customers if you’re trying to sell them a $40,000 car.

And don’t get me started on Amazon’s chatbots. (and emailbots for that matter)

It’s also funny to notice how the chatbots try to make you think they’re human, with misspelled words and faux emotions. I’ve had a chatbot admonish me with ‘I’m a real person…’ when I called it a chatbot. It then followed that with another generic answer that didn’t address my question. The Pinocchio chatbot… You’re not a real boy, not a real person and you don’t get to pass Go and collect $200. (The real salesperson I eventually talked to confirmed it was a chatbot.)

I also had one threaten to end the chat if I didn’t watch my language, even though the language wasn’t aimed at the chatbot. I just said, “I just want this to f’ing work”. A little generic frustration. However, after it told me to watch my language, I went from frustrated to kind of pissed. So much for artificial intelligence having emotional intelligence. Getting faux-insulted over something almost any real human would recognize as low-grade frustration is not going to make customers happier.

I think A.I. has some amazing uses (Transcriptive makes great use of A.I.), but it also has a LOT of shortcomings. All of those shortcomings are glaringly apparent when you look at chatbots. There are, of course, many companies trying to create conversational A.I., but so far the results have been pretty poor.

Based on what I’ve seen developing products with A.I., I think it’s likely it’ll be quite a while before conversational A.I. is a good experience on a regular basis. You should think very hard about entrusting your customers to it. A web form or FAQ is going to be better than a frustrating experience with a ‘sales person’.

Not sure what this has to do with video editing. Perhaps just another example of why A.I. is going to have a hard time editing anything that requires comprehending the content. Furthering my belief that A.I. isn’t going to replace most video editors any time soon.