Among the billion-dollar technology giants with their three-story booths at Mobile World Congress 2018, a pair of entrepreneurs wandered the floor talking about an AI breakthrough they had just launched in a new iOS and Android app called Otter.ai.
When we sat down to talk about it in a tiny meeting room in the back corner of Fira Barcelona’s Hall 2, Sam Liang placed his iPhone on the table and tapped the record button in the Otter app. As the CEO of AISense–the company behind Otter.ai–Liang started explaining how the 15-person startup from Los Altos, CA took a different approach to understanding audio data than Amazon Alexa, Google Assistant, and the other companies working on speech recognition.
As Liang gave his pitch, Otter started spitting out text–with roughly a 2-3 second delay. And since Liang had set up our meeting in the app beforehand, the software automatically recognized when his teammate Seamus McAteer chimed in with his own comments or I interrupted with follow-up questions.
While Otter’s natural language processing isn’t perfect by any means–punctuation goes missing, words are misunderstood, speakers are sometimes misidentified–it’s remarkably close, especially considering its speed and the fact that the app is free.
“Our technology is quite different,” said Liang, in his interview with ZDNet. “We call it ‘Ambient Voice Intelligence’ and we use the word ambient to indicate that this is working in the background… Your brain can only remember 10-20% of the information [from a meeting]… So we thought we can help people capture that information and then search for it really fast.”
Search is the best feature. Once the recording is finished, the app’s machine learning automatically generates about 10 keywords so that you know what the meeting was about, and you can start searching the full text right away. It’s also useful that once you home in on a keyword, you can hit the play button to listen to the section of the audio where it occurred.
The next best feature of the app is that you can share recorded meetings. So, if you have a meeting and a colleague can’t attend, you can send them the transcript and audio afterward, so that they can find the stuff that’s relevant to them.
All of those advanced features work best if you connect your Google account to the app and import your contacts–so it works especially smoothly if your organization uses Google Apps. The Google integration isn’t surprising since Liang is a former Google engineer.
McAteer has been working in mobile and data analytics for over 20 years. The rest of the team is made up of former Google, Facebook, Yahoo, and Nuance employees, as well as Ph.D.s and computer scientists from MIT, Stanford, and other top tech programs.
The team has been working on the technology behind Otter since January 2016. They have an API that they have licensed to other partners during the past year–primarily to offer transcription of audio files after they’ve finished recording. AISense used all of that partner data to tune and train their algorithms.
In January, they announced a licensing partnership with Zoom, the fast-growing video conferencing service, which now offers an option to transcribe video meetings after they’ve been recorded–powered by AISense.
With the launch of its own free app featuring real-time transcription, the company is moving to the next stage. It eventually plans to launch a premium version of its app, which will build on the functionality of the free version. For example, the free version will allow you to search meetings for the past 90 days. The premium version will extend that.
“The ability to remember, search, and share your voice conversations is the next frontier in collaboration,” said Liang. “Otter empowers the user to use AI for everyday conversations, so they can focus on what is being said and forget about taking notes.”
You can find the app at Otter.ai, the Apple App Store, and the Google Play Store. The iOS version is a little more polished at this point, but both are worth a try–and it’s worth watching how this app develops and improves over time.
IBM Watson offers real-time speech-to-text services, but it relies on a supercomputer to power them. So what AISense pulled off with an app and a smartphone is impressive, and it was arguably one of the most important breakthroughs announced at Mobile World Congress 2018–despite flying under the radar.