Q: What is Speech AI?

Speech AI, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT), is a kind of artificial intelligence that enables AIspace to convert human speech into written text.

Q: Why use Speech AI (Speech Recognition)?

Speech AI offers two key benefits:

  • Convert the speech in media files (video and audio) to text

  • Make media files searchable using AIspace, the world’s first unified search engine

Q: How many languages does Speech AI support?

More than 120 different languages. For the full language list, please download the FAQ.

Q: How accurate is Speech AI?

In some instances, accuracy can be higher than 90%. However, a range of factors (such as accent, speech volume, diction, background noise, and multiple voices speaking at the same time) can affect the accuracy rate.

Q: What media files are suitable for Speech AI?

Ideally, media files should have clear speech without background music, noise, effects, or microphone hiss. Examples of suitable content are news clips, documentaries, recorded meetings, interviews, lectures, and presentations. Typically, any video or audio clip that contains clear and audible speech will yield good results.

Q: What media files are not suitable for Speech AI?

The following might not be suitable: movies, TV shows, anything with mixed audio and sound effects, and poorly recorded content with background noise (hiss). In addition, the following factors can reduce accuracy:

  • localised accent

  • low speech volume

  • bad diction

  • heavy background noise

  • multiple voices speaking over one another

In cases like the above, the accuracy can be as low as 20%.

Q: Is the Speech AI service secure?

Yes. Your privacy is of utmost importance to us. Once Speech AI finishes processing your media files, they are deleted immediately. We do not keep your media files or the data generated from them, nor do we use them for any other purpose. Nobody inside or outside AIspace has access to your files.

Q: How long does it take to complete a Speech AI task?

The time ratio is approximately 1:1, meaning 1 hour of video or audio takes about 1 hour to process. However, if you run multiple Speech AI tasks at the same time, AIspace queues them in order.
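
As a rough illustration of the 1:1 ratio and queueing, the sketch below estimates when each queued task would finish. It assumes tasks are processed strictly one at a time in submission order and that processing time equals media duration; the function name and that one-at-a-time behaviour are assumptions for illustration, not a description of how AIspace actually schedules work.

```python
from typing import List


def estimated_finish_minutes(durations_minutes: List[float]) -> List[float]:
    """Estimate minutes from now until each queued task finishes.

    Assumes (hypothetically) a strict first-in, first-out queue and a
    1:1 ratio between media duration and processing time.
    """
    finish_times = []
    elapsed = 0.0
    for duration in durations_minutes:
        elapsed += duration  # each task starts when the previous one ends
        finish_times.append(elapsed)
    return finish_times


# Three clips of 60, 30 and 45 minutes submitted together
print(estimated_finish_minutes([60, 30, 45]))  # [60.0, 90.0, 135.0]
```

Under these assumptions, the last clip in a batch finishes after roughly the combined duration of everything queued ahead of it plus its own length.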

Q: What is the maximum length of video and audio that Speech AI can process?

180 minutes.

Q: What are the Speech AI packages?

You can buy according to your needs. Packages come in 10, 50 or 100 hours.

Q: How am I charged for Speech AI?

Your account will be charged, with usage deducted in 15-minute blocks. For example, if your video is

  • 42 mins long, 3 blocks (of 15 mins) will be deducted

  • 89 mins long, 6 blocks (of 15 mins) will be deducted
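
The block arithmetic above can be sketched as follows. This is a minimal example that assumes any partial block is rounded up to a full block, which is consistent with both figures above but is not stated explicitly; the function name is hypothetical.

```python
import math

BLOCK_MINUTES = 15  # block size stated in the FAQ


def billed_blocks(duration_minutes: float) -> int:
    """Return the number of 15-minute blocks deducted for one media file.

    Assumption: a partially used block counts as a whole block.
    """
    if duration_minutes <= 0:
        return 0
    return math.ceil(duration_minutes / BLOCK_MINUTES)


print(billed_blocks(42))  # 3
print(billed_blocks(89))  # 6
```

Under this assumption, a 42-minute video uses 3 blocks (45 minutes of package time) and an 89-minute video uses 6 blocks (90 minutes).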

Q: What media formats can Speech AI process?

Speech AI can process the following media formats:

  • Video - .wmv, .mp4, .m4a, .mov, .avi

  • Audio - .wav, .aiff, .flac, .wma, .mp3, .aac
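
Before submitting a file, it can be handy to check it against the format list and the 180-minute limit locally. The sketch below is one way to do that; the function names are hypothetical, the extension sets simply copy the list above (each written with a leading dot), and treating exactly 180 minutes as allowed is an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the FAQ list, normalised to a leading dot
VIDEO_EXTENSIONS = {".wmv", ".mp4", ".m4a", ".mov", ".avi"}
AUDIO_EXTENSIONS = {".wav", ".aiff", ".flac", ".wma", ".mp3", ".aac"}
MAX_DURATION_MINUTES = 180  # maximum length stated in the FAQ


def media_kind(filename: str) -> Optional[str]:
    """Return "video", "audio", or None if the extension is not listed."""
    extension = Path(filename).suffix.lower()
    if extension in VIDEO_EXTENSIONS:
        return "video"
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    return None


def is_accepted(filename: str, duration_minutes: float) -> bool:
    """Check the extension and the length limit (180 minutes assumed inclusive)."""
    return (
        media_kind(filename) is not None
        and 0 < duration_minutes <= MAX_DURATION_MINUTES
    )


print(media_kind("interview.MP3"))        # audio
print(is_accepted("lecture.mp4", 95))     # True
print(is_accepted("lecture.mp4", 200))    # False
```

The check is based on the file name only; it does not inspect the actual contents or codec of the file.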