Google Duo is testing a new codec for better call quality over poor connections

While U.S. carriers are busy marketing their new 5G networks, the reality is that the vast majority of people will not experience the advertised speeds. There are still many parts of the U.S. – and around the world – where data speeds are slow. To compensate, services like Google Duo use compression techniques to deliver the best possible audio and video experience. Google is now testing a new audio codec that aims to significantly improve audio quality on weak network connections.

In a blog post, the Google AI team presented its new high-quality, very-low-bitrate speech codec, which it calls ‘Lyra’. Like traditional parametric codecs, Lyra’s basic architecture involves extracting distinctive speech attributes (also known as ‘features’) in the form of log mel spectrograms, which are then compressed, transmitted over the network, and recreated on the receiving end using a generative model. Unlike more traditional parametric codecs, however, Lyra uses a new high-quality audio generative model that can not only extract critical parameters from speech but also reconstruct speech using a minimal amount of data. The new generative model used in Lyra builds on Google’s previous work on WaveNetEQ, the generative model-based packet loss concealment system currently used in Google Duo.

Lyra Architecture

Lyra’s basic architecture. Source: Google
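To make the feature-extraction stage concrete, here is a minimal sketch of computing a log mel spectrogram with numpy. This is an illustration of the general technique, not Lyra's actual extractor: the frame size, hop, FFT length, and mel-band count below are assumed values chosen for a 16 kHz signal, and real systems typically use a tuned audio DSP library instead.

```python
import numpy as np

def log_mel_spectrogram(signal, sample_rate=16000, frame_len=400,
                        hop_len=160, n_fft=512, n_mels=40):
    """Illustrative log mel spectrogram: STFT power projected onto a
    triangular mel filter bank, then log-compressed."""
    # Short-time Fourier transform via overlapping, windowed frames.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([signal[i * hop_len:i * hop_len + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2  # (n_frames, n_fft//2+1)

    # Triangular mel filter bank (HTK-style mel scale).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sample_rate / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sample_rate).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fbank[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - c, 1)

    return np.log(power @ fbank.T + 1e-6)  # (n_frames, n_mels)

# One second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
feats = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (98, 40)
```

The point of the pipeline is visible in the output shape: one second of raw audio (16,000 samples) is reduced to 98 frames of 40 coefficients each, and it is this compact representation — not the waveform — that gets compressed and sent over the network.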

Google says its approach puts Lyra on par with the state-of-the-art waveform codecs used in many streaming and communication platforms today. The advantage of Lyra over these waveform codecs, according to Google, is that Lyra does not transmit the signal sample-by-sample, which would require a higher bitrate (and therefore more data). To overcome the computational complexity of running a generative model on the device, Google says Lyra uses a “cheaper recurrent generative model” that works “at a lower rate” but generates multiple signals in different frequency ranges in parallel, which are later combined into a single output signal at the desired sample rate. Running this generative model on a mid-range device in real time yields a processing latency of 90 ms, which Google says is “similar to other traditional voice codecs.”
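The multiband trick described above can be sketched numerically. Google does not publish the synthesis filter bank it uses (real systems often use a pseudo-QMF bank); the zero-order-hold upsampling and cosine modulation below are simplifying assumptions meant only to show the bookkeeping: several low-rate subband signals combine into one full-rate output.

```python
import numpy as np

def combine_subbands(subbands, sample_rate=16000):
    """Naive synthesis: upsample each low-rate subband, shift it into its
    own frequency range, and sum into one full-rate signal."""
    n_bands = len(subbands)
    band_width = sample_rate / (2 * n_bands)  # Hz covered by each band
    out = None
    for b, sig in enumerate(subbands):
        # Zero-order-hold upsampling from sample_rate/n_bands to sample_rate.
        up = np.repeat(sig, n_bands)
        # Modulate band b up to the center of its target frequency range.
        t = np.arange(len(up)) / sample_rate
        shifted = up * np.cos(2 * np.pi * (b + 0.5) * band_width * t)
        out = shifted if out is None else out + shifted
    return out

# Four subband signals, each one second long at a quarter of the output rate.
rng = np.random.default_rng(0)
bands = [rng.standard_normal(4000) for _ in range(4)]  # 1 s at 4 kHz each
full = combine_subbands(bands)
print(len(full))  # 16000 samples: 1 s at 16 kHz
```

Because each of the four bands only needs to be generated at 4 kHz, the model runs at a quarter of the output sample rate per band, and the bands can be produced in parallel — which is exactly how a "cheaper" model can still deliver full-rate audio in real time.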

Coupled with the AV1 video codec, Google says that video calls can take place even for users on an old 56 kbps dial-up modem. This is because Lyra is designed for heavily bandwidth-constrained environments, operating at bitrates as low as 3 kbps. According to Google, Lyra easily outperforms the royalty-free, open-source Opus codec, as well as other codecs such as Speex, MELP, and AMR, at very low bitrates. Google provided several speech samples; except for the audio encoded with Lyra, each sample suffers from degraded audio quality at very low bitrates.
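A quick back-of-the-envelope calculation shows why 3 kbps speech makes dial-up video calls plausible. The 3 kbps and 56 kbps figures come from the article; the 40 ms frame duration used below is an assumption for illustration, not a published Lyra parameter.

```python
# Bandwidth budget for a call over a 56 kbps dial-up link.
link_kbps = 56.0
speech_kbps = 3.0   # Lyra's very-low-bitrate mode, per the article
frame_ms = 40       # assumed frame duration, for illustration only

bytes_per_second = speech_kbps * 1000 / 8               # 375.0 bytes/s of audio
payload_per_frame = bytes_per_second * frame_ms / 1000  # 15.0 bytes per frame
video_budget_kbps = link_kbps - speech_kbps             # 53.0 kbps left for AV1

print(bytes_per_second, payload_per_frame, video_budget_kbps)
# 375.0 15.0 53.0
```

At 3 kbps the entire speech stream costs only 375 bytes per second, leaving roughly 53 kbps of the modem's capacity for AV1 video and transport overhead.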

Audio samples (clean speech and a noisy environment) — the original recordings alongside very-low-bitrate encoded versions — are embedded in the source article.

Google says it trained Lyra on thousands of hours of audio with speakers in more than 70 languages, using open-source audio libraries and then verifying the audio quality with expert and crowdsourced listeners. The new codec is already rolling out in Google Duo to improve call quality on very-low-bandwidth connections. While Lyra currently targets speech use cases, Google is investigating how to turn it into a general-purpose audio codec.

Source