In the customer service industry, your accent dictates many aspects of your job. It shouldn’t be the case that there’s a “better” or “worse” accent, but in today’s global economy (though who knows about tomorrow’s) it’s valuable to sound American or British. While many undergo accent neutralization training, Sanas is a startup with another approach (and a $5.5 million seed round): using speech recognition and synthesis to change the speaker’s accent in near real time.
The company has trained a machine learning algorithm to quickly and locally (that is, without using the cloud) recognize a person’s speech on one end and, on the other, output the same words with an accent chosen from a list or automatically detected from the other person’s speech.
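The description above amounts to a two-stage, on-device pipeline: recognize the incoming speech locally, then re-voice the same content in a target accent, frame by frame so latency stays low. The sketch below is purely illustrative and assumes nothing about Sanas’s actual models; `recognize_phonemes` and `resynthesize` are hypothetical stand-ins for on-device recognition and synthesis components.

```python
# Hypothetical sketch of a local speech-to-speech accent-conversion loop.
# recognize_phonemes and resynthesize are toy stand-ins, NOT Sanas's code.

from dataclasses import dataclass

@dataclass
class Frame:
    samples: list  # raw audio samples for one short window (e.g. ~20 ms)

def recognize_phonemes(frame):
    # Stand-in for a local speech-recognition model: maps audio samples
    # to phoneme-like tokens with no network round trip.
    return [f"ph{s % 3}" for s in frame.samples]

def resynthesize(tokens, target_accent):
    # Stand-in for a local synthesis model: re-voices the same tokens
    # in the requested accent.
    return [f"{target_accent}:{t}" for t in tokens]

def convert_stream(frames, target_accent="american"):
    # Process frame by frame so the conversion runs in near real time:
    # each short window is recognized and re-voiced independently.
    for frame in frames:
        tokens = recognize_phonemes(frame)
        yield resynthesize(tokens, target_accent)

out = list(convert_stream([Frame([1, 2]), Frame([3])]))
```

The point of the generator structure is the latency constraint the article mentions: nothing waits for the full utterance, so converted audio can be fed back into the call almost as soon as it is spoken.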
It slots right into the OS’s sound stack, so it works out of the box with pretty much any audio or video calling tool. The company is currently running a pilot program with thousands of people in locations from the U.S. and U.K. to the Philippines, India and Latin America. By the end of the year, supported accents will include American, Spanish, British, Indian, Filipino and Australian.
To tell the truth, the idea of Sanas kind of bothered me at first. It felt like a concession to bigoted people who consider their accent superior and others beneath them. Tech will fix it … by accommodating the bigots. Great!
But while I still have a little bit of that feeling, I can see there’s more to it than this. Fundamentally, it is easier to understand someone who speaks in an accent similar to your own. And customer service and tech support make up a huge industry, one performed largely by people outside the countries where the customers live. This basic disconnect can be remedied in a way that puts the onus on the entry-level worker, or in a way that puts it on technology. Either way the difficulty of making oneself understood remains and must be addressed; an automated system just lets it be done more easily and lets more people do their jobs.
It’s not magic. As you can tell in this clip, the character and cadence of the person’s voice are only partly retained, and the result sounds considerably more artificial:
But the technology is improving and, like any speech engine, the more it’s used, the better it gets. And for someone not used to the original speaker’s accent, the American-accented version may very well be easier to understand. For the person in the support role, this likely means better outcomes for their calls — everyone wins. Sanas told me that the pilots are just starting, so there are no numbers available from this deployment yet, but earlier testing suggested a considerable reduction in error rates and an increase in call efficiency.
It’s good enough at any rate to attract a $5.5 million seed round, with participation from Human Capital, General Catalyst, Quiet Capital and DN Capital.
“Sanas is striving to make communication easy and free from friction, so people can speak confidently and understand each other, wherever they are and whoever they are trying to communicate with,” CEO Maxim Serebryakov said in the press release announcing the funding. It’s hard to disagree with that mission.
While the cultural and ethical questions of accents and power differentials are unlikely to ever go away, Sanas is trying something new that may be a powerful tool for the many people who must communicate professionally and find their speech patterns are an obstacle to that. It’s an approach worth exploring and discussing even if in a perfect world we would simply understand one another better.