Baidu Research’s New AI Algorithm Mimics Voice With Very Few Samples

AI typically needs a lot of data and a lot of time for a task like voice cloning: it has to listen to hours of recordings. A new process could get that down to one minute. Baidu researchers have unveiled an upgraded version of Deep Voice, their text-to-speech synthesis system, which, once trained, can clone a new voice after listening to just a few snippets of audio. This capability comes from learning both the information that speakers share and the information that distinguishes them. Baidu calls this ‘voice cloning’, and expects it to have significant applications in personalizing human-machine interfaces.


Baidu focuses on two fundamental approaches (see the figure above):

  1. Speaker adaptation: This approach fine-tunes a multi-speaker generative model with a few cloning samples, using backpropagation-based optimization. Adaptation can be applied to the whole model or only to the low-dimensional speaker embeddings. The latter needs far fewer parameters to represent each speaker, although it yields a longer cloning time and lower audio quality.
  2. Speaker encoding: This approach trains a separate model to infer a new speaker embedding directly from the cloning audio; that embedding is then used with a multi-speaker generative model. The speaker encoding model has time- and frequency-domain processing blocks that extract speaker identity information from each audio sample, and attention blocks that combine them in an optimal way.
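To make the “embedding-only” variant of speaker adaptation concrete, here is a toy sketch: a frozen generator maps text features plus a speaker embedding to acoustic frames, and only the small embedding is fitted to a few cloning frames by gradient descent. Everything here (the linear generator, the dimensions, the learning rate, the function names) is hypothetical and is not Baidu’s Deep Voice code; the real model is a deep neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, TEXT_DIM, OUT_DIM = 4, 8, 6
# Hypothetical frozen multi-speaker "model": a linear map from text features plus a
# speaker embedding to acoustic frames. W_spk has orthonormal rows (from a QR
# decomposition) so plain gradient descent converges quickly in this toy.
W_text = rng.standard_normal((TEXT_DIM, OUT_DIM))
W_spk = np.linalg.qr(rng.standard_normal((OUT_DIM, EMB_DIM)))[0].T

def generate(text_feats, speaker_emb):
    """Frozen generator: (frames, TEXT_DIM) features -> (frames, OUT_DIM) frames."""
    return text_feats @ W_text + speaker_emb @ W_spk

def adapt_embedding(text_feats, target_frames, steps=200, lr=0.1):
    """Fit ONLY the speaker embedding; W_text and W_spk stay frozen."""
    emb = np.zeros(EMB_DIM)
    for _ in range(steps):
        err = generate(text_feats, emb) - target_frames
        # Gradient of mean-over-frames squared error with respect to the embedding.
        grad = 2.0 * (err @ W_spk.T).mean(axis=0)
        emb -= lr * grad
    return emb
```

In this noiseless toy, a handful of frames from an unseen speaker is enough to recover that speaker’s four-number embedding, which mirrors why embedding-only adaptation needs so few parameters per speaker. Whole-model adaptation would also update `W_text` and `W_spk`.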

For detailed information and the mathematical treatment, refer to Baidu Research’s paper, “Neural Voice Cloning with a Few Samples.”

However, this technology could also have a downside: it could cause serious problems for people who rely on biometric voice security.

(via MIT Technology Review, Wikipedia, Baidu Research)

How DeepMind’s AI taught itself to walk

DeepMind’s programmers gave the agent a set of virtual sensors (so it can tell whether it’s upright or not, for example) and then incentivized it to move forward. The computer works the rest out for itself, using trial and error to come up with different ways of moving. True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics. A trend has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this may require considerable engineering effort and can result in restricted behaviors or behaviors that may be difficult to repurpose for new tasks.


 

DeepMind published three papers:

Emergence of locomotion behaviors in rich environments

For some AI problems, such as playing Atari games or Go, the goal is easy to define: winning. But describing a process such as a jog, a backflip or a jump is difficult, and accurately specifying such complex behaviors is a common problem when teaching motor skills to an artificial system. DeepMind explored how sophisticated behaviors can emerge from scratch from the body interacting with the environment, using only simple high-level objectives such as moving forward without falling. DeepMind trained agents with a variety of simulated bodies to make progress across diverse terrains that require jumping, turning and crouching. The results show that the agents developed these complex skills without receiving specific instructions, an approach that can be applied to train systems for multiple, distinct simulated bodies.


Simulated planar Walker attempts to climb a wall

The accompanying animations show how this technique can lead to high-quality movements and perseverance.
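To make “simple high-level objectives” concrete, here is a hypothetical per-step reward of that kind: reward forward velocity, and end the episode once the torso drops too low. The state fields, the 0.8 m threshold and the function names are illustrative assumptions, not the reward from DeepMind’s paper.

```python
from dataclasses import dataclass

@dataclass
class WalkerState:
    forward_velocity: float  # metres per second along the course (hypothetical field)
    torso_height: float      # metres above the ground (hypothetical field)

MIN_TORSO_HEIGHT = 0.8  # hypothetical threshold below which the walker has "fallen"

def step_reward(state: WalkerState) -> tuple[float, bool]:
    """Reward forward progress; signal episode end once the walker falls."""
    fallen = state.torso_height < MIN_TORSO_HEIGHT
    reward = 0.0 if fallen else state.forward_velocity
    return reward, fallen

def episode_return(states: list[WalkerState]) -> float:
    """Sum rewards up to and including the first fall, as the agent would see them."""
    total = 0.0
    for state in states:
        reward, done = step_reward(state)
        total += reward
        if done:
            break
    return total
```

Nothing in this objective says how to jump or crouch; any such skill has to be discovered by trial and error because it lets the body keep moving forward over harder terrain.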

Learning human behaviors from motion capture by adversarial imitation


A humanoid Walker produces human-like walking behavior

The emergent behavior described above can be very robust, but because the movements must emerge from scratch, they often do not look human-like.

In its second paper, DeepMind demonstrated how to train a policy network that imitates motion capture data of human behaviors to pre-learn certain skills, such as walking, getting up from the ground, running, and turning.

Having produced behavior that looks human-like, they can tune and repurpose those behaviors to solve other tasks, like climbing stairs and navigating walled corridors.
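Adversarial imitation works by training a discriminator to tell motion-capture states from the policy’s own states, and rewarding the policy for states the discriminator mistakes for motion capture. A minimal sketch of that reward signal, assuming a logistic-regression discriminator over state features and the common -log(1 - D(s)) reward form; the feature layout, learning rate and step count are illustrative, and in the paper both discriminator and policy are neural networks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_discriminator(expert_states, policy_states, steps=500, lr=0.5):
    """Logistic-regression discriminator: D(s) estimates P(state came from mocap)."""
    X = np.vstack([expert_states, policy_states])
    y = np.concatenate([np.ones(len(expert_states)), np.zeros(len(policy_states))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        # Gradient descent on the mean binary cross-entropy.
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def imitation_reward(states, w, b, eps=1e-8):
    """GAIL-style reward -log(1 - D(s)): large for states that look like mocap."""
    return -np.log(1.0 - sigmoid(states @ w + b) + eps)
```

In a full training loop the two steps alternate: refit the discriminator on fresh policy rollouts, then improve the policy with a reinforcement learning algorithm using `imitation_reward` in place of a hand-written objective.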

 

Robust imitation of diverse behaviors


The planar Walker on the left demonstrates a particular walking style and the agent in the right panel imitates that style using a single policy network.

The third paper proposes a neural network architecture, building on state-of-the-art generative models, that can learn the relationships between different behaviors and imitate specific actions it is shown. After training, the system can encode a single observed action and create a new movement based on that demonstration. It can also switch between different kinds of behaviors despite never having seen transitions between them, for example switching between walking styles.
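The system described above encodes a demonstration into an embedding and conditions the policy on it. Two pieces of that idea can be shown in isolation: a VAE-style reparameterised sample of the embedding, and linear interpolation between two embeddings as one simple way to request an in-between style. This is an illustrative sketch, not the paper’s architecture; the 8-dimensional embeddings, the variances, and the policy decoder that would consume `halfway` are hypothetical.

```python
import numpy as np

def sample_embedding(mu, log_sigma, rng):
    """VAE-style reparameterised sample: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)

def blend(z_a, z_b, alpha):
    """Linear interpolation between two behaviour embeddings, alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * z_a + alpha * z_b

rng = np.random.default_rng(0)
# Hypothetical encoder outputs (mean, log std-dev) for two walking styles.
z_walk = sample_embedding(np.zeros(8), np.full(8, -2.0), rng)
z_run = sample_embedding(np.ones(8), np.full(8, -2.0), rng)
halfway = blend(z_walk, z_run, 0.5)  # would condition a (hypothetical) policy decoder
```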

 

 

(via The Verge, DeepMind Blog)

 

 

Google’s and Nvidia’s AI Chips

Google

Google will soon launch a cloud computing service that provides exclusive access to a new kind of artificial intelligence chip designed by its own engineers. CEO Sundar Pichai revealed the new chip and service this morning in Silicon Valley during his keynote at Google I/O, the company’s annual developer conference.


This new processor is a unique creation designed to both train and execute deep neural networks—machine learning systems behind the rapid evolution of everything from image and speech recognition to automated translation to robotics. Google says it will not sell the chip directly to others. Instead, through its new cloud service, set to arrive sometime before the end of the year, any business or developer can build and operate software via the internet that taps into hundreds and perhaps thousands of these processors, all packed into Google data centers.

According to Jeff Dean, who leads the Google Brain team, Google’s new “TPU device,” which spans four chips, can handle 180 trillion floating-point operations per second, or 180 teraflops, and the company uses a new form of computer networking to connect several of these devices together, creating a “TPU pod” that provides about 11,500 teraflops of computing power. In the past, Dean said, the company’s large-scale machine translation model took about a day to train on 32 of the best commercially available GPUs. Now it can train in about six hours using only one eighth of a pod.
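The pod figure follows from simple multiplication. A minimal check, assuming a pod of 64 four-chip devices (Google’s published pod size, which the article does not state) and peak throughput that scales linearly with the number of devices:

```python
TFLOPS_PER_DEVICE = 180  # one four-chip "TPU device", per the article
CHIPS_PER_DEVICE = 4     # per the article
DEVICES_PER_POD = 64     # assumption: Google's published pod size, not stated above

def pod_teraflops(devices: int = DEVICES_PER_POD,
                  per_device: float = TFLOPS_PER_DEVICE) -> float:
    """Peak pod throughput, assuming linear scaling across devices."""
    return devices * per_device

per_chip = TFLOPS_PER_DEVICE / CHIPS_PER_DEVICE  # peak teraflops per individual chip
```

64 × 180 = 11,520 teraflops, which matches the “about 11,500” quoted above. These are peak rates; real training jobs typically sustain only a fraction of them.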

Nvidia

Nvidia has released a new state-of-the-art chip that pushes the limits of machine learning: the Tesla P100 GPU. Nvidia says it can perform deep learning neural network tasks 12 times faster than the company’s previous top-end system (the Titan X). The P100 was a huge commitment for Nvidia, costing over $2 billion in research and development. Its GPU die packs 15.3 billion transistors into 610 square millimeters (about 150 billion counting the stacked memory on the same package), and Nvidia claims it is the world’s largest FinFET chip. In addition to machine learning, the P100 will work for all sorts of high-performance computing tasks, but Nvidia especially wants you to know it’s really good at machine learning.


To top off the P100’s introduction, Nvidia has packed eight of them into a crazy-powerful $129,000 supercomputer called the DGX-1. This show-horse of a machine comes ready to run, with deep learning software preinstalled. It ships first to AI researchers at MIT, Stanford, UC Berkeley, and others in June. On stage, Nvidia CEO Jen-Hsun Huang called the DGX-1 “one beast of a machine.”

The competition between Google’s new AI chips and Nvidia’s GPUs points to an emerging need for more processing power in deep learning. A few years ago, GPUs took off because they cut the training time for a deep learning network from months to days. Neural networks, the foundation of deep learning, had been around since at least the 1950s, but with GPU power behind them they suddenly had real potential. And as more companies try to integrate deep learning into their products and services, they are going to need ever faster chips.

 

(via Wired, Forbes, Nvidia, The Verge)

 

Neuralink

Elon Musk has warned that the rise of A.I. and its potential to overpower humans could prove catastrophic, and he is attempting to combat that risk with his latest venture, brain-computer interface company Neuralink. Wait But Why has published a detailed explainer on Neuralink (highly recommended reading!).


Musk seems to want to achieve a communications leap equivalent in impact to when humans came up with language. Language proved an incredibly efficient way to convey thoughts socially at the time, but Neuralink aims to increase that efficiency by several orders of magnitude. Person-to-person, Musk’s vision would enable direct “uncompressed” communication of concepts between people, instead of having to effectively “compress” your original thought by translating it into language and then having the other party “decompress” the package you send them linguistically, which is always a lossy process.

Another thing in favor of Musk’s proposal is that symbiosis between brains and computers isn’t fiction. Remember that person who types with brain signals? Or the paralyzed people who move robot arms? These systems work better when the computer completes people’s thoughts. The subject only needs to type “bulls …” and the computer does the rest. Similarly, a robotic arm has its own smarts. It knows how to move; you just have to tell it to. So even partial signals coming out of the brain can be transformed into more complete ones. Musk’s idea is that our brains could integrate with an AI in ways that we wouldn’t even notice: imagine a sort of thought-completion tool.
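The “thought-completion” idea works like ordinary autocomplete: a partial input plus prior knowledge picks out the most likely whole. A toy, frequency-based word completer illustrates the principle; the vocabulary and counts are made up, and real brain-typing systems use far richer language models and signal decoders.

```python
from collections import Counter
from typing import Optional

def complete(prefix: str, vocabulary: Counter) -> Optional[str]:
    """Return the most frequent known word starting with prefix (ties: alphabetical)."""
    candidates = [word for word in vocabulary if word.startswith(prefix)]
    if not candidates:
        return None
    # Highest count first; alphabetical order breaks ties deterministically.
    return min(candidates, key=lambda word: (-vocabulary[word], word))
```

The shorter and noisier the partial signal, the more the prior (here, word frequency) has to do, which is why completion helps most when only a few characters or neurons are available.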

So it’s not crazy to believe there could be some very interesting brain-computer interfaces in the future. But that future is not as close at hand as Musk would have you believe. One reason is that opening a person’s skull is not a trivial procedure. Another is that technology for safely recording from more than a hundred neurons at once—neural dust, neural lace, optical arrays that thread through your blood vessels—remains mostly at the blueprint stage.

 

(via Wired, TechCrunch)

Artificial Intelligence ready ARM CPUs (DynamIQ)

ARM processors are ubiquitous, and many of the tech gadgets we use are powered by them. Now the company is showing off its plans for the future with DynamIQ. Aimed squarely at powering the artificial intelligence and machine learning systems we expect to see in cars, phones, gaming consoles and everything else, it is what the company calls an evolution of its existing “big.LITTLE” technology.


Here’s a high-level look at some of the new features, capabilities and benefits DynamIQ will bring to new Cortex-A processors later this year:

  • New dedicated processor instructions for ML and AI: Cortex-A processors designed for DynamIQ technology can be optimized to deliver up to a 50x boost in AI performance over the next 3-5 years relative to today’s Cortex-A73-based systems, and up to 10x faster response between the CPU and specialized accelerator hardware on the SoC, which can unlock substantially better combined performance.


  • Increased multi-core flexibility: SoC designers can scale up to eight cores in a single cluster, and each core can have different performance and power characteristics. These capabilities enable faster responsiveness for ML and AI applications. A redesigned memory subsystem enables both faster data access and enhanced power management.
  • More performance within restricted thermal budgets: Efficient and much faster switching of software tasks to the right-sized processor for optimal performance and power is further enhanced through independent frequency control of individual processors.
  • Safer autonomous systems: DynamIQ brings greater levels of responsiveness for ADAS solutions and increased safety capabilities, which will enable partners to build ASIL-D compliant systems for safe operation under failure conditions.
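The “right-sized processor” idea can be illustrated with a toy placement rule for a single cluster of mixed cores: run a task on the lowest-power core that can still meet its demand, and fall back to the biggest core otherwise. The core names and the capacity and power numbers are hypothetical, and this is not ARM’s or any operating system’s actual scheduler, which would also weigh load history, frequency and thermal state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    name: str
    capacity: float  # relative throughput at max frequency (hypothetical units)
    power: float     # relative power draw (hypothetical units)

def pick_core(demand: float, cores: list[Core]) -> Core:
    """Lowest-power core whose capacity covers the demand; else the most capable core."""
    fitting = [core for core in cores if core.capacity >= demand]
    if fitting:
        return min(fitting, key=lambda core: core.power)
    return max(cores, key=lambda core: core.capacity)

# A hypothetical DynamIQ-style cluster mixing three core types.
cluster = [Core("little", 1.0, 0.2), Core("mid", 2.0, 0.6), Core("big", 4.0, 1.5)]
```

Mixing core types inside one cluster, with per-core frequency control, is what lets a rule like this move work to a better-fitting core quickly instead of migrating it between separate clusters.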

 

 

(source: ARM community, Engadget)

Machine Learning Speeds Up

Cloudera and Intel are jointly speeding up machine learning with the help of Intel’s Math Kernel Library (MKL). Benchmarks demonstrate that the combined offering can run machine learning over large data sets in less time and with less hardware, which helps organizations accelerate their investments in next-generation predictive analytics.

Cloudera describes itself as the leader in Apache Spark development, training, and services. Apache Spark is advancing the art of machine learning on distributed systems with familiar tools that deliver at impressive scale. By joining forces, Cloudera and Intel aim to make machine learning on big data smarter and easier to implement.


Predictive Maintenance

By combining Spark, the Intel MKL libraries, and Intel’s optimized CPU architecture, machine learning workloads can scale quickly. As machine learning solutions get access to more data, they can provide better accuracy in predictive maintenance, recommendation engines, proactive health care and monitoring, and risk and fraud detection.

“There’s a growing urgency to implement richer machine learning models to explore and solve the most pressing business problems and to impact society in a more meaningful way,” said Amr Awadallah, chief technical officer of Cloudera. “Already among our user base, machine learning is an increasingly common practice. In fact, in a recent adoption survey, over 30% of respondents indicated they are leveraging Spark for machine learning.”

 

(via Technative.io)

The Poker Playing AI

Poker is a game of imperfect information, which makes it very complex and more like many real-world situations. At the Rivers Casino in Pittsburgh this week, a computer program called Libratus (a Latin word meaning “balanced”) is taking on top professionals, and it may finally prove that computers can play this game better than any human. Libratus was created by Tuomas Sandholm, a professor in the computer science department at Carnegie Mellon University (CMU), and his graduate student Noam Brown.


Libratus is playing against some of the world’s best poker players. One of them, Dong Kim, is a high-stakes player who specializes in no-limit Texas hold ’em. “It does a little bit of everything,” Kim says. Jason Les and Daniel McAulay, two of the other top players challenging the machine, describe its play in much the same way. It doesn’t always play the same type of hand in the same way. It may bluff with a bad hand or not. It may bet high with a good hand, or not. That means Kim has trouble finding holes in its game. And if he does find a hole, it disappears the next day.

“The bot gets better and better every day. It’s like a tougher version of us,” said Jimmy Chou, one of the four pros battling Libratus. “The first couple of days, we had high hopes. But every time we find a weakness, it learns from us and the weakness disappears the next day.”

Libratus is playing thousands of games of heads-up, or two-player, no-limit Texas hold’em against several expert professional poker players. Now a little more than halfway through the 20-day contest, Libratus is up by almost $800,000 against its human opponents. So a victory, while far from guaranteed, may well be in the cards.

Regardless of the raw ability of the humans and the AI, it seems clear that the pros will become less effective as the tournament goes on. Ten hours of poker a day for 20 days straight against an emotionless computer is exhausting and demoralizing, even for pros like Doug Polk, who faced CMU’s earlier poker bot, Claudico, in 2015. And while the humans sleep at night, Libratus applies the supercomputer powering its in-game decision-making to refining its overall strategy.

A win for Libratus would be a huge achievement in artificial intelligence. Poker requires reasoning and intelligence that has proven difficult for machines to imitate. It is fundamentally different from checkers, chess, or Go because an opponent’s hand remains hidden from view during play. In games of “imperfect information,” it is enormously complicated to figure out the ideal strategy given every possible approach your opponent may be taking. And no-limit Texas hold’em is especially challenging because an opponent could essentially bet any amount.

“Poker has been one of the hardest games for AI to crack,” says Andrew Ng, chief scientist at Baidu. “There is no single optimal move, but instead an AI player has to randomize its actions so as to make opponents uncertain when it is bluffing.”
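Ng’s point about randomizing can be seen in a tiny game. Libratus’s strategy was reportedly computed with variants of counterfactual regret minimization (CFR), whose core update is regret matching: play each action in proportion to how much you regret not having played it. The sketch below runs plain regret matching in self-play on rock-paper-scissors rather than poker; the game, the iteration count and the biased starting regret are illustrative choices, not Libratus’s code.

```python
import numpy as np

# Payoff to player 0; rows = player 0's action, columns = player 1's action.
# Action order: rock, paper, scissors. Player 1 receives the negative (zero-sum).
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])

def regret_matching(cumulative_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(positive), 1.0 / len(positive))

def self_play(iterations=20000, initial_regret=(1.0, 0.0, 0.0)):
    """Both players run regret matching; returns their time-averaged strategies.

    The non-zero starting regret makes player 0 open with pure rock, so the
    dynamics do not start at the equilibrium.
    """
    regrets = [np.array(initial_regret, dtype=float), np.zeros(3)]
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        s0, s1 = regret_matching(regrets[0]), regret_matching(regrets[1])
        u0 = PAYOFF @ s1      # expected value of each action for player 0
        u1 = -(s0 @ PAYOFF)   # expected value of each action for player 1
        regrets[0] += u0 - s0 @ u0
        regrets[1] += u1 - s1 @ u1
        strategy_sums[0] += s0
        strategy_sums[1] += s1
    return [total / total.sum() for total in strategy_sums]
```

Any fixed, predictable choice in rock-paper-scissors can be exploited, so the averaged strategies drift toward playing each action about one third of the time. The per-iteration strategy can keep cycling; it is the time average that approaches equilibrium, which is also why CFR-based poker bots report an average strategy.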

(Sources: MitTechReview, The Verge, Wired)

Google’s AI Translation Tool Creates Its Own Secret Language

Google’s Neural Machine Translation system, or GNMT, went live back in September. It uses deep learning to produce better, more natural translations between languages, and it initially provided a less resource-intensive way to ingest a sentence in one language and produce that same sentence in another language. Instead of digesting each word or phrase as a standalone unit, as prior methods did, GNMT takes in the entire sentence as a whole.

GNMT’s creators were curious about something. If you teach the translation system to translate English to Korean and vice versa, and also English to Japanese and vice versa… could it translate Korean to Japanese, without resorting to English as a bridge between them? They made this helpful gif to illustrate the idea of what they call “zero-shot translation” (it’s the orange one):


As it turns out, the answer is yes! It produces “reasonable” translations between two languages that it has not explicitly linked in any way. Remember, no English allowed.
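The zero-shot setup can be illustrated with the input convention described in Google’s multilingual GNMT paper: one shared model for every direction, with an artificial token such as `<2ja>` prepended to each source sentence to say which language to produce. The sketch shows only that bookkeeping, not the neural model; the language set mirrors the English/Korean/Japanese experiment above, and the helper names are hypothetical.

```python
LANGS = {"en", "ko", "ja"}
# Directions seen in training, per the experiment described above.
TRAINED_DIRECTIONS = {("en", "ko"), ("ko", "en"), ("en", "ja"), ("ja", "en")}

def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend an artificial token telling the shared model which language to emit."""
    return f"<2{target_lang}> {sentence}"

def is_zero_shot(src: str, tgt: str) -> bool:
    """True if both languages are known but this exact direction was never trained."""
    return (src in LANGS and tgt in LANGS and src != tgt
            and (src, tgt) not in TRAINED_DIRECTIONS)
```

Because a Korean sentence tagged `<2ja>` goes through the same encoder and decoder as every trained pair, the model can attempt Korean to Japanese without any Korean-Japanese examples and without routing through English.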

But this raised a second question. If the computer is able to make connections between concepts and words that have not been formally linked, does that mean that the computer has formed a concept of shared meaning for those words, meaning at a deeper level than simply that one word or phrase is the equivalent of another?

This could mean that the computer has developed its own internal language, an “interlingua,” to represent the concepts it uses to translate between other languages.



A visualization of the translation system’s memory when translating a single sentence in multiple directions.

In some cases, Google says, GNMT is even approaching human-level translation accuracy. That near-parity is restricted to translations between related languages, such as English to Spanish or French. Google is eager to gather more data for “notoriously difficult” use cases, all of which will help its system learn and improve over time. At launch, Google began using GNMT for 100 percent of Chinese-to-English machine translations in the Google Translate mobile and web apps, accounting for around 18 million translations per day.

Google admits that its approach still has a way to go. “GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms,” Google Brain researchers Quoc Le and Mike Schuster explain, “and translating sentences in isolation rather than considering the context of the paragraph or page. There is still a lot of work we can do to serve our users better.” The system should keep improving over time and may become considerably more efficient.

 

Sources: (TechCrunch, The Verge)