Meet Gladia, an almost perfect audio-to-text AI converter

Based in France, Gladia is a pioneering startup in the realm of artificial intelligence. Its ambitious goal is to revolutionize how businesses engage with audio data. To achieve this, the team at Gladia is hard at work, crafting an audio transcription application programming interface, or API for short. This isn’t just any ordinary API, though – it is designed to blend seamlessly with various other products.

What sets Gladia’s technology apart is its superior functionality. While other options are available in the market, Gladia strives to offer a solution that surpasses them all in terms of performance and reliability.

But that’s not all. Gladia’s robust tech foundation is not just designed for simple audio transcription. It opens up a whole world of possibilities, sparking fresh and innovative ways to use audio. This includes various new applications that could change how we think about and utilize audio data.

What is Gladia?

Gladia’s core technology is built on Whisper, an open-source transcription model by OpenAI. However, Whisper in its original form is far from perfect. It is known for being slow, which makes it a poor fit for real-time applications. Recognizing this flaw, Gladia’s team has dedicated significant time and effort to improving Whisper, and says it has turned it into a transcription model that is quick and highly responsive.

Yet speed wasn’t the only challenge. Half of Whisper’s foundation is based on GPT-2, another model by OpenAI. While GPT-2 has its merits, it also has certain drawbacks. One such drawback is its tendency to over-interpret under certain circumstances, producing text that was never spoken. A classic example involves commonly used phrases from online videos. For instance, the model might output phrases like “if you enjoyed this video, please like and subscribe” even when nobody says them in the audio.

This is because the model has an underlying mathematical bias that leads it to overrepresent some sentences, such as the one mentioned. But Gladia’s team isn’t letting this stand in their way. They are committed to rectifying these limitations and presenting a faster, more accurate, and more reliable solution.

Using Gladia for YouTube Videos

We tried using Gladia on a YouTube video we picked at random. The chosen video was 14 minutes long and discussed a possible car market crash in the coming months. Not a surprising video, but what was surprising was how quickly Gladia converted it to text. It took the application 42 seconds to transcribe the video, and it did a very good job. It wasn’t a very complex video, and the narrator spoke pretty clearly most of the time. Still, it was pretty impressive.

How much does it cost to use Gladia?

Gladia is making a bold claim: it can transcribe an entire hour of audio for a mere $0.61. Moreover, the transcription process is astonishingly fast, taking approximately 60 seconds for that hour of audio. But the offerings of Gladia’s API don’t stop there. It comes with advanced features that let it identify multiple speakers and automatically add timestamps. The API can also recognize different languages and switch between them if required.
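To put those numbers in perspective, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted in this article ($0.61 per hour, about 60 seconds of processing per hour, and our 14-minute test that took 42 seconds). The function names are ours, and we assume the price scales linearly with audio length, which may not match Gladia's actual billing.

```python
# Figures quoted in the article; everything else here is illustrative.
PRICE_PER_HOUR_USD = 0.61

def estimate_cost(audio_minutes: float) -> float:
    """Estimated cost in USD, assuming linear (pro-rated) pricing."""
    return audio_minutes / 60 * PRICE_PER_HOUR_USD

def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds

if __name__ == "__main__":
    # Our 14-minute YouTube test
    print(f"Estimated cost: ${estimate_cost(14):.2f}")
    # Claimed speed: one hour of audio in about 60 seconds
    print(f"Claimed speed:  {realtime_factor(3600, 60):.0f}x real time")
    # Observed speed: 14 minutes of audio in 42 seconds
    print(f"Observed speed: {realtime_factor(14 * 60, 42):.0f}x real time")
```

By this estimate, our test video would cost roughly 14 cents, and it ran at about 20 times real time, compared with the roughly 60 times implied by the company's claim. A single test on one video is not a benchmark, and the measured time likely includes overhead beyond the transcription itself.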

That’s not all. The Gladia API takes care of the little details as well. It automatically inserts punctuation and adjusts text casing, ensuring the transcriptions are easy to read and understand.

Now, let’s talk about the format. As is the norm with most APIs, Gladia provides the final output in a JSON format. But, they understand that different businesses have different needs. So, for companies that require subtitles, Gladia has included support for SRT and VTT files. This means businesses can easily generate subtitles from their transcriptions, adding another dimension to their content.
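To show what the jump from JSON to subtitles involves, here is a minimal Python sketch that turns timestamped segments into SRT text. The input shape (a list of items with `start`, `end`, and `text`, with times in seconds) is our assumption for illustration and is not Gladia's documented response format; only the SRT layout itself is standard.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"}.

    The segment shape is hypothetical, chosen only for this example.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        # Each SRT cue: sequence number, time range, then the text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.5, "text": "Hello."},
        {"start": 2.5, "end": 5.0, "text": "Welcome back."},
    ]
    print(segments_to_srt(example))
```

With built-in SRT and VTT support, businesses would not need to write this conversion themselves.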

Gladia has recently achieved a significant milestone in their journey. They successfully secured $4 million in a seed funding round. New Wave, a leading venture capital firm, spearheaded this round, displaying their confidence in Gladia’s vision and potential. Other prominent investors who have trusted Gladia include Sequoia and Cocoa, alongside notable business angels such as Solomon Hykes, Pierre Betouin, Miroslaw Klaba, and Alexandre Berriche.

But Gladia views this funding not as an end but a means to further their ambitious plans. Developing a robust and reliable transcription API is the first step in their grand vision. The team at Gladia intends to leverage this strong technical groundwork to construct even more impressive features.

Take their plans for language translation, for example. Once an audio file has been transcribed, Gladia’s system can translate the resulting text into a different language. When combined with their feature of word-level timestamps, businesses can upload an audio file and receive subtitles in a multitude of languages in just a few minutes.
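Word-level timestamps matter here because subtitles need short, well-timed chunks of text rather than one long transcript. As a rough illustration, the Python sketch below groups timestamped words into subtitle cues using a character limit and a duration limit. The word shape (`word`, `start`, `end`) and the limits are our own assumptions, not Gladia's method; a real pipeline would also have to handle translation, since translated text rarely lines up word for word with the original.

```python
def _make_cue(words: list[dict]) -> dict:
    """Merge consecutive words into one cue spanning first start to last end."""
    return {
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "text": " ".join(w["word"] for w in words),
    }

def group_words_into_cues(words: list[dict], max_chars: int = 42,
                          max_duration: float = 5.0) -> list[dict]:
    """Greedily pack words into cues without exceeding either limit.

    Words are hypothetical items shaped like {"word", "start", "end"},
    with times in seconds.
    """
    cues = []
    current = []
    for word in words:
        if current:
            text = " ".join(w["word"] for w in current + [word])
            duration = word["end"] - current[0]["start"]
            # Start a new cue if adding this word would break a limit.
            if len(text) > max_chars or duration > max_duration:
                cues.append(_make_cue(current))
                current = []
        current.append(word)
    if current:
        cues.append(_make_cue(current))
    return cues

if __name__ == "__main__":
    words = [
        {"word": "Car", "start": 0.0, "end": 0.4},
        {"word": "prices", "start": 0.4, "end": 0.9},
        {"word": "may", "start": 0.9, "end": 1.1},
        {"word": "fall", "start": 1.1, "end": 1.5},
    ]
    for cue in group_words_into_cues(words, max_chars=10):
        print(cue)
```

Each cue's text could then be translated as a unit and written out with the original timing.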

And that’s just the beginning. In the future, Gladia hopes to expand its capabilities even further. They plan to develop features that can summarize the contents of an audio file, categorize the content into various topical segments, automatically create chapters, carry out sentiment analysis, and much more. It’s an exciting prospect, showcasing the immense potential of their technological foundation and the vast possibilities that lie ahead.