Baidu Research’s New AI Algorithm Mimics Voice With Very Few Samples

AI typically needs a plethora of data and a lot of time for a task like voice cloning: it has to listen to hours of recordings. A new process, however, could get that down to about a minute. Baidu researchers have unveiled an upgraded version of Deep Voice, their text-to-speech synthesis system, that can now, once trained, clone any voice after listening to just a few snippets of audio. This capability is enabled by learning shared and discriminative information across speakers. Baidu calls this ‘voice cloning’, and it is expected to have significant applications for personalization in human-machine interfaces.

[Figure: Baidu's two voice cloning approaches, speaker adaptation and speaker encoding]

Here, Baidu focuses on two fundamental approaches (see the figure above):

  1. Speaker Adaptation: Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples, using backpropagation-based optimization. Adaptation can be applied to the whole model or only to the low-dimensional speaker embeddings. The latter requires far fewer parameters to represent each speaker, although it yields a longer cloning time and lower audio quality.
  2. Speaker Encoding: Speaker encoding is based on training a separate model to directly infer a new speaker embedding from cloning audio; that embedding is then used with the multi-speaker generative model. The speaker encoding model has time- and frequency-domain processing blocks to retrieve speaker identity information from each audio sample, and attention blocks to combine them in an optimal way, as the sketch below illustrates.
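To make the distinction concrete, here is a minimal PyTorch sketch of the two strategies. Everything in it is an illustrative assumption (the pretrained multi-speaker model tts, the embedding size, the L1 loss, the encoder architecture); it is not Baidu's actual Deep Voice code.

```python
# Minimal sketch of the two cloning strategies; the pretrained model `tts`,
# the embedding size, and the loss are assumptions, not Baidu's implementation.
import torch
import torch.nn as nn

EMBED_DIM = 128  # assumed size of the low-dimensional speaker embedding


def clone_by_adaptation(tts, texts, audios, steps=1000, lr=1e-3):
    """Speaker adaptation: freeze the trained generative model and fit
    only a new speaker embedding to the cloning samples by backprop."""
    embedding = torch.zeros(1, EMBED_DIM, requires_grad=True)
    opt = torch.optim.Adam([embedding], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(
            nn.functional.l1_loss(tts(t, speaker_embedding=embedding), a)
            for t, a in zip(texts, audios)
        )
        loss.backward()  # gradients reach the embedding only
        opt.step()
    return embedding.detach()


class SpeakerEncoder(nn.Module):
    """Speaker encoding: a separately trained model that maps cloning
    audio (mel-spectrogram frames here) directly to an embedding."""

    def __init__(self, n_mels=80):
        super().__init__()
        self.rnn = nn.GRU(n_mels, EMBED_DIM, batch_first=True)

    def forward(self, mel):        # mel: (batch, frames, n_mels)
        _, hidden = self.rnn(mel)
        return hidden[-1]          # one EMBED_DIM vector per sample
```

The trade-off follows from the structure: adaptation pays a fine-tuning cost for every new speaker, while encoding pays a one-time training cost, after which a new speaker needs only a forward pass.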

For detailed information and mathematical explanations, refer to the paper by Baidu Research.

However, this technology could also have a downside: it may prove disruptive for people who rely on voice-based biometric security.

(via MIT Technology Review, Wikipedia, Baidu Research)

How DeepMind’s AI taught itself to walk

DeepMind's programmers gave the agent a set of virtual sensors (so it can tell whether it's upright or not, for example) and then incentivized it to move forward. The computer works the rest out for itself, using trial and error to come up with different ways of moving. True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics. A trend has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this can require considerable engineering effort and can result in restricted behaviors, or behaviors that may be difficult to repurpose for new tasks.

DeepMind published three papers, summarized below:

Emergence of locomotion behaviors in rich environments:- 

For some AI problems, such as playing Atari or Go, the goal is easy to define: it's winning. But how do you describe a process such as a jog, a backflip or a jump? Accurately describing a complex behavior is a common problem when teaching motor skills to an artificial system. DeepMind explored how sophisticated behaviors can emerge from scratch, from the body interacting with the environment, using only simple high-level objectives such as moving forward without falling. DeepMind trained agents with a variety of simulated bodies to make progress across diverse terrains that require jumping, turning and crouching. The results show that the agents developed these complex skills without receiving specific instructions, an approach that can be applied to train systems for multiple, distinct simulated bodies.

Simulated planar Walker attempts to climb a wall

The accompanying GIFs show how this technique can lead to high-quality movements and perseverance.
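Here is a hedged sketch of what such a simple high-level objective might look like in code, assuming a generic gym-style environment and agent interface; none of this is DeepMind's actual setup:

```python
# Sketch of a simple high-level objective: reward forward progress,
# penalize falling, and leave the rest to trial and error. The environment
# and agent interfaces are assumptions, not DeepMind's implementation.

def locomotion_reward(state):
    """Reward is forward velocity, minus a fixed penalty for falling."""
    reward = state.forward_velocity
    if state.torso_height < 0.8:   # assumed threshold for "fallen over"
        reward -= 10.0
    return reward


def train(env, agent, episodes=1000):
    """Trial-and-error loop: the reward above is the agent's only signal."""
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)        # exploratory policy
            state, done = env.step(action)   # assumed env API
            agent.learn(state, locomotion_reward(state))
```

Nothing in the reward says how to jump, turn or crouch; those skills have to emerge because they make forward progress possible on a given terrain.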

Learning human behaviors from motion capture by adversarial imitation:-

A humanoid Walker produces human-like walking behavior

The emergent behavior described above can be very robust, but because the movements must emerge from scratch, they often do not look human-like.

In their second paper, DeepMind demonstrated how to train a policy network that imitates motion capture data of human behaviors to pre-learn certain skills, such as walking, getting up from the ground, running, and turning.

Having produced behavior that looks human-like, they can tune and repurpose those behaviors to solve other tasks, like climbing stairs and navigating walled corridors.
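As the paper's title says, the approach is adversarial imitation (in the spirit of GAIL). A rough sketch of the central idea follows; the networks and shapes are assumptions, not the paper's exact model:

```python
# Rough sketch of adversarial imitation: a discriminator is trained to tell
# motion-capture clips from the policy's movements, and the policy is
# rewarded for fooling it. Networks and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.Tanh(),
            nn.Linear(256, 1),
        )

    def forward(self, obs):
        return self.net(obs)  # logit: high means "looks like motion capture"


def imitation_reward(disc, obs):
    """Policy reward, high when the discriminator believes the policy's
    motion is real motion-capture data."""
    with torch.no_grad():
        p_real = torch.sigmoid(disc(obs))
    return -torch.log(1.0 - p_real + 1e-8)
```

The policy never copies the motion capture frames directly; it only chases a reward that rises when the discriminator can no longer tell its movement from the real thing.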

Robust imitation of diverse behaviors:-

The planar Walker on the left demonstrates a particular walking style and the agent in the right panel imitates that style using a single policy network.

The third paper proposes a neural network architecture, building on state-of-the-art generative models, that is capable of learning the relationships between different behaviors and imitating specific actions that it is shown. After training, the system can encode a single observed action and create a novel movement based on that demonstration. It can also switch between different kinds of behaviors despite never having seen transitions between them, for example switching between walking styles.
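A hedged sketch of this encode-then-generate idea, using a simple recurrent autoencoder over motion trajectories as a stand-in for the paper's generative model:

```python
# Illustrative sketch: embed a demonstrated motion into one latent vector,
# then decode that vector into a new movement. A plain recurrent autoencoder
# stands in for the paper's actual generative architecture.
import torch
import torch.nn as nn


class MotionAutoencoder(nn.Module):
    def __init__(self, obs_dim, latent_dim=64):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, latent_dim, batch_first=True)
        self.decoder = nn.GRU(latent_dim, obs_dim, batch_first=True)

    def embed(self, demo):          # demo: (batch, time, obs_dim)
        _, z = self.encoder(demo)
        return z[-1]                # one latent vector per demonstration

    def generate(self, z, steps):
        """Decode a latent behavior vector into a fresh trajectory;
        interpolating two z vectors blends or switches between styles."""
        inputs = z.unsqueeze(1).repeat(1, steps, 1)
        trajectory, _ = self.decoder(inputs)
        return trajectory
```

Because every behavior lives in the same latent space, moving between two points in that space gives transitions the system was never explicitly shown.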

(via The Verge, DeepMind Blog)

Google’s and Nvidia’s AI Chips

Google

Google will soon launch a cloud computing service that provides exclusive access to a new kind of artificial intelligence chip designed by its own engineers. CEO Sundar Pichai revealed the new chip and service this morning in Silicon Valley during his keynote at Google I/O, the company’s annual developer conference.

This new processor is a unique creation designed to both train and execute deep neural networks—machine learning systems behind the rapid evolution of everything from image and speech recognition to automated translation to robotics. Google says it will not sell the chip directly to others. Instead, through its new cloud service, set to arrive sometime before the end of the year, any business or developer can build and operate software via the internet that taps into hundreds and perhaps thousands of these processors, all packed into Google data centers.

According to Jeff Dean, who leads the Google Brain team, Google’s new “TPU device,” which spans four chips, can handle 180 trillion floating-point operations per second, or 180 teraflops, and the company uses a new form of computer networking to connect several of these chips together, creating a “TPU pod” that provides about 11,500 teraflops of computing power. In the past, Dean said, the company’s machine translation model took about a day to train on 32 state-of-the-art CPU boards; now it can train in about six hours using only a portion of a pod.
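The two figures are consistent, assuming the widely reported configuration of 64 TPU devices per pod:

```python
# Back-of-the-envelope check (64 devices per pod is assumed, as widely reported).
teraflops_per_device = 180
devices_per_pod = 64
print(teraflops_per_device * devices_per_pod)  # 11520, i.e. "about 11,500 teraflops"
```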

Nvidia

Nvidia has released a new state-of-the-art chip that pushes the limits of machine learning: the Tesla P100 GPU. It can perform deep learning neural network tasks 12 times faster than the company’s previous top-end system (the Titan X). The P100 was a huge commitment for Nvidia, costing over $2 billion in research and development, and it sports a whopping 15 billion transistors on a single chip, making the P100 the world’s largest chip, Nvidia claims. In addition to machine learning, the P100 will work for all sorts of high-performance computing tasks; Nvidia just wants you to know it’s really good at machine learning.

To top off the P100’s introduction, Nvidia has packed eight of them into a crazy-powerful $129,000 supercomputer called the DGX-1. This show horse of a machine comes ready to run, with deep-learning software preinstalled, and it ships first to AI researchers at MIT, Stanford, UC Berkeley, and others in June. On stage, Nvidia CEO Jen-Hsun Huang called the DGX-1 “one beast of a machine.”

The competition between these upcoming AI chips and Nvidia’s all points to an emerging need for more processing power in deep learning computing. A few years ago, GPUs took off because they cut the training time for a deep learning network from months to days. Deep learning, which had been around since at least the 1950s, suddenly had real potential with GPU power behind it. But as more companies try to integrate deep learning into their products and services, they’re only going to need faster and faster chips.

(via Wired, Forbes, Nvidia, The Verge)

Artificial Intelligence-ready ARM CPUs (DynamIQ)

ARM processors are ubiquitous: many of the tech gadgets we use are powered by them. Now the company is showing off its plans for the future with DynamIQ. Aimed squarely at the artificial intelligence and machine learning systems we’re expecting to see in cars, phones, gaming consoles and everything else, it’s what the company claims is an evolution of its existing “big.LITTLE” technology.

Here’s a high-level look at some of the new features, capabilities and benefits DynamIQ will bring to new Cortex-A processors later this year:

  • New dedicated processor instructions for ML and AI: Cortex-A processors designed for DynamIQ technology can be optimized to deliver up to a 50x boost in AI performance over the next 3-5 years relative to today’s Cortex-A73-based systems, and up to a 10x faster response between the CPU and specialized accelerator hardware on the SoC, which can unleash substantially better combined performance.

  • Increased multi-core flexibility: SoC designers can scale up to eight cores in a single cluster, and each core can have different performance and power characteristics. These advanced capabilities enable faster responsiveness to ML and AI applications. A redesigned memory subsystem enables both faster data access and enhanced power management.
  • More performance within restricted thermal budgets: efficient and much faster switching of software tasks to the right-sized processor for optimal performance and power, further enhanced through independent frequency control of individual processors.
  • Safer autonomous systems: DynamIQ brings greater levels of responsiveness for ADAS solutions and increased safety capabilities, enabling partners to build ASIL-D compliant systems that operate safely under failure conditions.

(source: ARM community, Engadget)

Machine Learning Speeds Up

Cloudera and Intel are jointly speeding up machine learning with the help of Intel’s Math Kernel Library (MKL). Benchmarks demonstrate that the combined offering can advance machine learning performance over large data sets in less time and with less hardware, helping organizations accelerate their investments in next-generation predictive analytics.

Cloudera is the leader in Apache Spark development, training, and services. Apache Spark is advancing the art of machine learning on distributed systems with familiar tools that deliver at impressive scale. By joining forces, Cloudera and Intel are furthering a joint mission of excellence in big data management in the pursuit of better outcomes by making machine learning smarter and easier to implement.

Predictive Maintenance

By combining Spark, Intel’s MKL libraries, and Intel’s optimized CPU architecture, machine learning workloads can scale quickly. As machine learning solutions get access to more data, they can provide better accuracy in delivering predictive maintenance, recommendation engines, proactive health care and monitoring, and risk and fraud detection.
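As an illustration of the kind of workload in question, here is a minimal PySpark MLlib job. The heavy linear algebra inside distributed fitting routines like this is what an optimized native BLAS such as Intel MKL accelerates; the data path below is a placeholder, and the cluster/MKL wiring is omitted:

```python
# Minimal PySpark MLlib sketch: distributed model fitting whose dense math
# can be served by an optimized BLAS (e.g. Intel MKL) when one is configured.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mkl-demo").getOrCreate()

# LIBSVM is a standard MLlib input format; the path is a placeholder.
training = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")

model = LogisticRegression(maxIter=10).fit(training)
print(model.coefficients)

spark.stop()
```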

“There’s a growing urgency to implement richer machine learning models to explore and solve the most pressing business problems and to impact society in a more meaningful way,” said Amr Awadallah, chief technology officer of Cloudera. “Already among our user base, machine learning is an increasingly common practice. In fact, in a recent adoption survey, over 30% of respondents indicated they are leveraging Spark for machine learning.”

(via TechNative.io)