How DeepMind’s AI taught itself to walk

DeepMind’s programmers gave the agent a set of virtual sensors (so it can tell whether it’s upright or not, for example) and then incentivized it to move forward. The computer works the rest out for itself, using trial and error to come up with different ways of moving.

True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics.  A trend has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this may require considerable engineering effort and can result in restricted behaviors or behaviors that may be difficult to repurpose for new tasks.

 

DeepMind published three papers, summarized below:

Emergence of locomotion behaviors in rich environments:- 

For some AI problems, such as playing Atari or Go, the goal is easy to define: it’s winning. But a jog, a backflip or a jump is much harder to specify, and the difficulty of accurately describing a complex behavior is a common problem when teaching motor skills to an artificial system. DeepMind explored how sophisticated behaviors can emerge from scratch from the body interacting with the environment using only simple high-level objectives, such as moving forward without falling. DeepMind trained agents with a variety of simulated bodies to make progress across diverse terrains, which require jumping, turning and crouching. The results show that their agents developed these complex skills without receiving specific instructions, an approach that can be applied to train systems for multiple, distinct simulated bodies.
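
To make "a simple high-level objective, such as moving forward without falling" concrete, here is a minimal sketch of what such a reward signal could look like. It is not DeepMind’s actual reward: the function name, the height threshold and the penalty size are all assumptions chosen for illustration.

```python
def forward_reward(x_before, x_after, dt, torso_height,
                   min_height=0.8, fall_penalty=1.0):
    """Reward forward progress and penalise a fall.

    Every name and threshold here is hypothetical, picked for illustration.
    """
    velocity = (x_after - x_before) / dt   # forward speed over one time step
    fallen = torso_height < min_height     # crude "is it still upright?" check
    return velocity - (fall_penalty if fallen else 0.0)


# Moving 1 m forward in 0.5 s while upright earns the full velocity reward.
upright = forward_reward(0.0, 1.0, dt=0.5, torso_height=1.2)

# The same step taken with the torso near the ground is penalised.
collapsed = forward_reward(0.0, 1.0, dt=0.5, torso_height=0.3)

print(upright, collapsed)
```

Nothing in this signal says how to jump, turn or crouch. An agent trained by trial and error only learns that those movements help because they let it keep collecting reward on difficult terrain.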

Simulated planar Walker attempts to climb a wall

The GIFs show how this technique can lead to high-quality movements and perseverance.

Learning human behaviors from motion capture by adversarial imitation:-

A humanoid Walker produces human-like walking behavior

The emergent behavior described above can be very robust, but because the movements must emerge from scratch, they often do not look human-like.

In their second paper, DeepMind demonstrated how to train a policy network that imitates motion capture data of human behaviors to pre-learn certain skills, such as walking, getting up from the ground, running, and turning.

Having produced behavior that looks human-like, they can tune and repurpose those behaviors to solve other tasks, like climbing stairs and navigating walled corridors.

 

Robust imitation of diverse behaviors:-

The planar Walker on the left demonstrates a particular walking style and the agent in the right panel imitates that style using a single policy network.

The third paper proposes a neural network architecture, building on state-of-the-art generative models, that is capable of learning the relationships between different behaviors and imitating specific actions that it is shown. After training, their system can encode a single observed action and create a novel movement based on that demonstration. It can also switch between different kinds of behaviors despite never having seen transitions between them, for example switching between walking styles.

(via The Verge, DeepMind Blog)

 

 

Google’s and Nvidia’s AI Chips

Google

Google will soon launch a cloud computing service that provides exclusive access to a new kind of artificial intelligence chip designed by its own engineers. CEO Sundar Pichai revealed the new chip and service this morning in Silicon Valley during his keynote at Google I/O, the company’s annual developer conference.

This new processor is a unique creation designed to both train and execute deep neural networks—machine learning systems behind the rapid evolution of everything from image and speech recognition to automated translation to robotics. Google says it will not sell the chip directly to others. Instead, through its new cloud service, set to arrive sometime before the end of the year, any business or developer can build and operate software via the internet that taps into hundreds and perhaps thousands of these processors, all packed into Google data centers.

According to Google senior fellow Jeff Dean, the company’s new “TPU device,” which spans four chips, can handle 180 trillion floating point operations per second, or 180 teraflops. The company uses a new form of computer networking to connect several of these devices together, creating a “TPU pod” that provides about 11,500 teraflops of computing power. In the past, Dean said, the company’s machine translation model took about a day to train on 32 state-of-the-art CPU boards. Now, it can train in about six hours using only a portion of a pod.
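
The figures quoted above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers in the paragraph, plus one assumption: that "about a day" means 24 hours.

```python
DEVICE_TFLOPS = 180    # one four-chip "TPU device", as quoted above
POD_TFLOPS = 11_500    # one "TPU pod", as quoted above (approximate)

# How many TPU devices would it take to reach the quoted pod figure?
devices_per_pod = POD_TFLOPS / DEVICE_TFLOPS

OLD_TRAINING_HOURS = 24   # "about a day", taken here as 24 hours (assumption)
NEW_TRAINING_HOURS = 6    # "about six hours"
speedup = OLD_TRAINING_HOURS / NEW_TRAINING_HOURS

print(f"devices per pod: about {devices_per_pod:.0f}")
print(f"training speedup: about {speedup:.0f}x")
```

The quoted figures imply roughly 64 devices per pod, and a training run about four times faster while using only part of that pod.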

Nvidia

Nvidia has released a new state-of-the-art chip that pushes the limits of machine learning, the Tesla P100 GPU. It can perform deep learning neural network tasks 12 times faster than the company’s previous top-end system (the Titan X). The P100 was a huge commitment for Nvidia, costing over $2 billion in research and development. Nvidia says the package holds a whopping 150 billion transistors, a count that includes the stacked memory alongside the 15.3-billion-transistor GPU die, and claims the P100 is the world’s largest chip. In addition to machine learning, the P100 will work for all sorts of high-performance computing tasks; Nvidia just wants you to know it’s really good at machine learning.

To top off the P100’s introduction, Nvidia has packed eight of them into a crazy-powerful $129,000 supercomputer called the DGX-1. This show-horse of a machine comes ready to run, with deep-learning software preinstalled. It’s shipping first to AI researchers at MIT, Stanford, UC Berkeley, and others in June. On stage, Nvidia CEO Jen-Hsun Huang called the DGX-1 “one beast of a machine.”

The competition between these upcoming AI chips and Nvidia all points to an emerging need for simply more processing power in deep learning computing. A few years ago, GPUs took off because they cut the training time for a deep learning network from months to days. Deep learning, which had been around since at least the 1950s, suddenly had real potential with GPU power behind it. But as more companies try to integrate deep learning into their products and services, they’re only going to need faster and faster chips.

 

(via Wired, Forbes, Nvidia, The Verge)

 

Google’s AI Translation Tool Creates Its Own Secret Language

Google’s Neural Machine Translation system, or GNMT, went live back in September. It uses deep learning to produce better, more natural translations between languages, and it initially provided a less resource-intensive way to ingest a sentence in one language and produce that same sentence in another. Instead of digesting each word or phrase as a standalone unit, as prior methods do, GNMT takes in the entire sentence as a whole.

GNMT’s creators were curious about something. If you teach the translation system to translate English to Korean and vice versa, and also English to Japanese and vice versa… could it translate Korean to Japanese, without resorting to English as a bridge between them? They made this helpful gif to illustrate the idea of what they call “zero-shot translation” (it’s the orange one):

translate1.gif

As it turns out — the answer is yes! It produces “reasonable” translations between two languages that it has not explicitly linked in any way. Remember, no English allowed.
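
To pin down which language pairs count as "zero-shot" in this example, the sketch below lists every direction between languages the system has seen but was never explicitly trained to translate. It only illustrates the bookkeeping and says nothing about how GNMT works internally; the function name is hypothetical.

```python
from itertools import permutations

# Directions the system was explicitly trained on, per the example above.
trained = {
    ("English", "Korean"), ("Korean", "English"),
    ("English", "Japanese"), ("Japanese", "English"),
}


def zero_shot_directions(trained_pairs):
    """Return (source, target) pairs of seen languages with no direct training."""
    languages = {lang for pair in trained_pairs for lang in pair}
    all_directions = set(permutations(languages, 2))
    return sorted(all_directions - set(trained_pairs))


zero_shot = zero_shot_directions(trained)
print(zero_shot)
# [('Japanese', 'Korean'), ('Korean', 'Japanese')]
```

Both remaining directions skip English entirely, which is exactly the case the researchers wanted to test.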

But this raised a second question. If the computer is able to make connections between concepts and words that have not been formally linked… does that mean that the computer has formed a concept of shared meaning for those words, meaning at a deeper level than simply that one word or phrase is the equivalent of another?

This could mean that the computer has developed its own internal language to represent the concepts it uses to translate between other languages.

A visualization of the translation system’s memory when translating a single sentence in multiple directions.

In some cases, Google says its GNMT system is even approaching human-level translation accuracy. That near-parity is restricted to translations between related languages, such as from English to Spanish and French. However, Google is eager to gather more data for “notoriously difficult” use cases, all of which will help its system learn and improve over time thanks to machine learning techniques. So starting today, Google is using its GNMT system for 100 percent of Chinese to English machine translations in the Google Translate mobile and web apps, accounting for around 18 million translations per day.

Google admits that its approach still has a way to go. “GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms,” Google researchers Quoc Le and Mike Schuster explain, “and translating sentences in isolation rather than considering the context of the paragraph or page. There is still a lot of work we can do to serve our users better.” Over time, the system should improve and may become much more efficient.

 

(via TechCrunch, The Verge)

Google Pixel and Pixel XL

Google recently announced its two new smartphones, the Pixel and the Pixel XL. The new line is designed by Google itself rather than by a partner phone manufacturer. The Pixels replace the Nexus line. The Google Pixel will come with Android 7.1 Nougat out of the box.

Not only is it the first phone with the Google Assistant built in, it also has the highest-rated smartphone camera ever tested by DxOMark, with a mobile score of 89. It is particularly strong in providing a very high level of detail from its 12.3MP camera, with relatively low levels of noise for every tested lighting condition. It also provides accurate exposures with very good contrast and white balance, as well as fast autofocus.

The Pixel’s strong scores under a wide range of conditions make it an excellent choice for almost any kind of photography. As with any small-sensor device, results are excellent in conditions with good and uniform lighting. But in addition, images captured indoors and in low light are very good and provide a level of detail unexpected from a smartphone camera.

It also has a headphone jack!

Pixel Tech Specs:

  • Display
    5.0 inches
    FHD AMOLED at 441ppi
    2.5D Corning® Gorilla® Glass 4
  • Battery
    2,770 mAh battery
    Standby time (LTE): up to 19 days
    Talk time (3G/WCDMA): up to 26 hours
    Internet use time (Wi-Fi): up to 13 hours
    Internet use time (LTE): up to 13 hours
    Video playback: up to 13 hours
    Audio playback (via headset): up to 110 hours
    Fast charging: up to 7 hours of use from only 15 minutes of charging

Pixel XL Tech Specs:

  • Display
    5.5 inches
    QHD AMOLED at 534ppi
    2.5D Corning® Gorilla® Glass 4
  • Battery
    3,450 mAh battery
    Standby time (LTE): up to 23 days
    Talk time (3G/WCDMA): up to 32 hours
    Internet use time (Wi-Fi): up to 14 hours
    Internet use time (LTE): up to 14 hours
    Video playback: up to 14 hours
    Audio playback (via headset): up to 130 hours
    Fast charging: up to 7 hours of use from only 15 minutes of charging
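
The quoted pixel densities follow directly from screen size and resolution. The sketch below recomputes them, assuming the usual meanings of the labels (FHD as 1920x1080 and QHD as 2560x1440), since the spec lists do not give the pixel counts.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal length in pixels over diagonal length in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


# Assumed resolutions: FHD = 1920x1080, QHD = 2560x1440.
pixel_ppi = pixels_per_inch(1920, 1080, 5.0)
pixel_xl_ppi = pixels_per_inch(2560, 1440, 5.5)

print(round(pixel_ppi), round(pixel_xl_ppi))
# 441 534
```

Both results match the listed 441ppi and 534ppi figures.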

Both devices have:

  • 4GB LPDDR4 RAM
  • 32 or 128GB storage
  • Qualcomm® Snapdragon™ 821 (MSM8996 pro)
    Quad Core 2x 2.15GHz / 2x 1.6GHz
  • Main Camera –
    12.3MP
    Large 1.55μm pixels
    Phase detection autofocus + laser detection autofocus
    f/2.0 Aperture
  • Front Camera –
    8MP
    1.4µm pixels
    f/2.4 Aperture
    Fixed focus
  • Video –
    1080p @ 30fps, 60fps, 120fps
    720p @ 30fps, 60fps, 240fps
    4K @ 30fps
  • Charging –
    USB Type-C™ 18W adaptor with USB-PD
    18W charging
  • OS –
    Android 7.1 Nougat
    Two years of OS upgrades from launch
    Three years of security updates from launch