Technology has become part of daily life, shaping the way we think, the way we act, and even the simplest things we do.
In 2016, the artificial intelligence (AI) research company DeepMind disclosed details of WaveNet, a deep neural network used to synthesize realistic human speech.
And now, an improved version of the technology is being rolled out for use with Google Assistant.
A system for speech synthesis — otherwise known as text-to-speech (TTS) — typically utilizes one of two techniques.
Concatenative TTS involves the piecing together of chunks of recordings from a voice actor. The drawback of this method is that audio libraries must be replaced whenever upgrades or changes are made.
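The piecing-together step can be illustrated with a minimal sketch. The unit names and sample values below are hypothetical stand-ins for recorded chunks, not data from any real voice library, and real systems also smooth the joins between chunks.

```python
# Hypothetical unit library: each key is a speech unit and each value is a
# short list of audio samples standing in for a recorded chunk.
UNIT_LIBRARY = {
    "hel": [0.1, 0.3, 0.2],
    "lo": [0.4, 0.1],
    "wor": [0.2, 0.2, 0.5],
    "ld": [0.3, 0.0],
}


def concatenative_tts(units, library=UNIT_LIBRARY):
    """Join prerecorded chunks in the order given by `units`."""
    waveform = []
    for unit in units:
        if unit not in library:
            # A missing unit means the voice actor never recorded it.
            raise KeyError(f"no recording for unit {unit!r}")
        waveform.extend(library[unit])
    return waveform


hello = concatenative_tts(["hel", "lo"])
print(hello)  # [0.1, 0.3, 0.2, 0.4, 0.1]
```

The sketch also hints at the drawback described above: any unit missing from the library cannot be spoken until new recordings are added.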
The other technique, parametric TTS, utilizes a set of parameters to produce computer-generated speech, but this speech can sometimes sound unnatural and robotic.
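As a rough illustration of speech driven by parameters rather than recordings, the toy sketch below generates a plain tone from a pitch, a duration and an amplitude. It is only an assumption-laden stand-in: real parametric TTS systems use far richer parameter sets and a vocoder, and the function name and default sample rate here are hypothetical.

```python
import math


def parametric_tone(pitch_hz, duration_s, amplitude=0.5, sample_rate=16000):
    """Generate a sine tone entirely from numeric parameters (toy example)."""
    n_samples = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * pitch_hz * i / sample_rate)
        for i in range(n_samples)
    ]


tone = parametric_tone(pitch_hz=220.0, duration_s=0.5)
print(len(tone))  # 8000
```

Because every sample comes from a simple formula, the output is perfectly regular, which gives a sense of why purely parameter-driven audio can sound robotic.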
WaveNet, on the other hand, produces waveforms from scratch using a system built on a convolutional neural network.
To begin, a large number of speech samples were used to train the platform to synthesize voices, taking into account which waveforms sounded realistic and which did not.
This gave the speech synthesizer the ability to produce natural intonation, even including details like lip smacks. Depending on the samples fed into the system, it would develop a unique “accent,” which means it could be used to create any number of distinct voices if fed different data sets.
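DeepMind's published description of WaveNet is built on causal, dilated convolutions, in which each output sample depends only on the current and earlier input samples. The sketch below shows that single operation with hypothetical fixed weights. It is not a trained network and not DeepMind's implementation, only an illustration of the building block.

```python
def causal_dilated_conv(signal, weights, dilation):
    """Compute y[t] = sum over k of weights[k] * signal[t - k * dilation].

    Samples before the start of the signal are treated as zero, so no
    output ever depends on a future input sample.
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        output.append(acc)
    return output


# Hypothetical two-tap filter that looks back two samples.
print(causal_dilated_conv([1, 2, 3, 4], weights=[1, 1], dilation=2))
# [1.0, 2.0, 4.0, 6.0]
```

Stacking such layers with growing dilation lets a network look far back in the waveform while keeping each layer small.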
The biggest limitation of WaveNet was that it initially required a significant amount of computing power and wasn’t very fast, needing one second to generate 0.02 seconds of audio.
After improving the system over the past 12 months, DeepMind’s engineers have optimized WaveNet to the point that it can now produce one second of raw waveform in just 50 milliseconds, 1,000 times faster than the original.
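The 1,000-fold figure follows directly from the two timings quoted above. A quick check, using only the article's numbers (the variable names are illustrative):

```python
# Figures from the article, in whole milliseconds to keep the math exact.
ORIGINAL_COMPUTE_MS = 1000   # one second of compute...
ORIGINAL_AUDIO_MS = 20       # ...yielded 0.02 seconds of audio
IMPROVED_COMPUTE_MS = 50     # compute now needed for one second of audio
TARGET_AUDIO_MS = 1000       # compare both versions on one second of audio

# Compute time the original system needed for one second of audio.
original_ms_for_target = ORIGINAL_COMPUTE_MS * TARGET_AUDIO_MS // ORIGINAL_AUDIO_MS
speedup = original_ms_for_target // IMPROVED_COMPUTE_MS

print(original_ms_for_target)  # 50000
print(speedup)                 # 1000
```

In other words, the original system needed about 50 seconds to produce one second of speech, while the optimized one needs a twentieth of a second.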
Furthermore, the resolution of each sample has been increased from 8 bits to 16 bits, contributing to its higher scores in tests with human listeners.
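To give a sense of what the jump in bit depth means, the sketch below counts the distinct amplitude levels available at each resolution and bounds the rounding error. It assumes simple uniform quantization of samples in the range -1 to 1, which is an illustrative assumption; the article does not say how WaveNet encodes its samples.

```python
def quantization_levels(bits):
    """Number of distinct amplitude values representable with `bits` bits."""
    return 2 ** bits


def max_rounding_error(bits):
    """Half the spacing between adjacent levels spread evenly over [-1, 1]."""
    step = 2.0 / (quantization_levels(bits) - 1)
    return step / 2.0


def quantize(sample, bits):
    """Round a sample in [-1, 1] to the nearest uniformly spaced level."""
    step = 2.0 / (quantization_levels(bits) - 1)
    index = round((sample + 1.0) / step)
    return -1.0 + index * step


print(quantization_levels(8))   # 256
print(quantization_levels(16))  # 65536
```

Under this assumption, 16-bit samples have 256 times as many levels as 8-bit samples, so each sample can sit much closer to the intended waveform.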
These improvements mean the system can now be integrated into consumer products, like Google Assistant.