OpenAI’s Virtual Wrestling Bots

OpenAI, a firm backed by Elon Musk, has recently revealed one of its latest developments in the field of machine learning, demonstrated with virtual sumo wrestlers.

[Image: OpenAI's virtual sumo wrestling bots]

These are bots inside the virtual world of RoboSumo, controlled by machine learning. The bots taught themselves through trial and error using reinforcement learning, a technique inspired by the way animals learn through feedback, which has proved useful for training computers to play games and to control robots. The virtual wrestlers might look slightly ridiculous, but they are using a very clever approach to learning in a fast-changing environment while dealing with an opponent. This game and its virtual world were created at OpenAI to show how forcing AI systems to compete can spur them to become more intelligent.

However, one of the disadvantages of reinforcement learning is that it doesn't work well in realistic situations or in environments that are more dynamic. OpenAI devised a solution to this problem by creating its own reinforcement learning algorithm, called proximal policy optimization (PPO), which is especially well suited to changing environments.
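At the core of PPO is a clipped surrogate objective that keeps each policy update close to the previous policy. The PyTorch sketch below illustrates that objective only; the tensor names and the 0.2 clipping value are assumptions, and a full PPO trainer would also need advantage estimation, a value loss, and an entropy bonus.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Minimal sketch of PPO's clipped surrogate objective (assumed names)."""
    # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Unclipped and clipped surrogate terms
    surrogate = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # PPO maximizes the elementwise minimum; the loss is its negated mean
    return -torch.min(surrogate, clipped).mean()

# Toy usage with dummy tensors
new_lp = torch.log(torch.tensor([0.30, 0.60]))
old_lp = torch.log(torch.tensor([0.25, 0.65]))
adv = torch.tensor([1.0, -0.5])
print(ppo_clipped_loss(new_lp, old_lp, adv))
```

The clipping is what makes updates conservative: changes to the policy that move the probability ratio outside the clip range stop contributing extra reward, which helps in exactly the kind of fast-changing environments described above.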

The latest work, done in collaboration with researchers from Carnegie Mellon University and UC Berkeley, demonstrates a way for AI agents to apply what the researchers call a “meta-learning” framework. This means the agents can take what they have already learned and apply it to a new situation.

Inside the RoboSumo environment (see the video above), the agents started out behaving randomly. Through thousands of iterations of trial and error, they gradually developed the ability to move and, eventually, to fight. Through further iterations, the wrestlers developed the ability to avoid each other and even to question their own actions. This learning happened on the fly, with the agents adapting even as they wrestled each other.

Flexible learning is a very important part of human intelligence, and it will be crucial if machines are going to become capable of performing anything other than very narrow tasks in the real world. This kind of learning is very difficult to implement in machines, and the latest work is a small but significant step in that direction.

 

(sources: MitTechReview, OpenAI Blog, Wired)


Nvidia's chips for complete control of driverless cars

The race for autonomous cars is in full swing. Top car brands are working on providing fully autonomous vehicles to their customers, and a future with self-driving vehicles looks inevitable. Adding the cherry on top, Nvidia's recently announced chip, called Pegasus, is the latest generation of its DrivePX onboard car computers. The device is 13 times faster than the previous iteration, which has so far been used by the likes of Audi, Tesla, and Volvo to provide semi-autonomous driving capabilities in their vehicles.

[Image: Nvidia Pegasus]

At the heart of this chip is the mind-boggling technology of deep learning. “In the old world, the more powerful your engine, the smoother your ride will be,” Nvidia CEO Jensen Huang said during the announcement. “In the future, the more computational performance you have, the smoother your ride will be.”

Nvidia asserts that the device is only about the size of a license plate, yet it has enough power to process data from up to 16 sensors, detect objects, find the car's place in the world, plan a path, and control the vehicle itself. Oh, and it will also update centrally stored high-definition maps at the same time, all with some resources to spare.

The new system is designed to eventually handle up to 320 trillion operations per second (TOPS), compared to the 24 TOPS of today's technology. That would give it the power needed to process the masses of data produced by a vehicle's cameras and other sensors and allow it to drive completely autonomously, Nvidia said. The first systems to be tested next year will have less processing power but will be designed to scale up with the addition of extra chips.

 

(sources: MitTechReview, NvidiaBlog)

Origami-inspired Robots

In a bid to augment robots' abilities, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have come up with a new tool: origami exoskeletons.

In a recently published paper, the researchers describe four exoskeletons, each made out of a plastic sheet that folds into a predefined shape when heated for a few seconds. Among them are a boat-shaped exoskeleton, a glider, one for “walking,” and another that folds up into a crude wheel for faster movement. Each exoskeleton can be donned in turn by a tiny robot called Primer. This isn't a robot as we usually think of them, but a small magnetic cube that can be controlled remotely using magnetic fields.

“If we want robots to help us do things, it's not very efficient to have a different one for each task,” said CSAIL's Daniela Rus, the project's lead, in a press release. “With this metamorphosis-inspired approach, we can extend the capabilities of a single robot by giving it different ‘accessories’ to use in different situations.” In the future, the researchers imagine this sort of approach to robot design could help us make multifunctional bots that can perform complex tasks remotely. They could be used for deep-sea mining operations, for example, or for building colonies in space. These are locations where you don't want to waste resources shipping out lots of different bots for different jobs, so it's more efficient to send one with a set of origami tools. As Rus says: “Why update a whole robot when you can just update one part of it?”

Watch the video below to get a better idea of the origami-inspired robots.

 

(Sources: CSAIL, TheVerge, ScienceDaily)

AI recreates game engine using less than two minutes of videogame footage

Georgia Institute of Technology researchers have developed a new approach that uses artificial intelligence to learn a complete game engine, the basic software of a game that governs everything from character movement to rendering graphics.

In layman's terms, the new system can replicate a game engine closely enough to create a cloned version that is indistinguishable from the original when played.

[Image: The Georgia Tech team's AI can learn how a video game operates just by watching two minutes of gameplay. On the right, the AI replicates Mega Man in the ‘Bomberman’ stage, with the original shown on the left; there were some failures, including a point at which the character disappears.]

Their AI system watches less than two minutes of gameplay video and then builds its own model of how the game operates by studying the frames and making predictions of future events, such as what path a character will choose or how enemies might react.

To get their AI agent to create an accurate predictive model that could account for all the physics of a 2D platform-style game, the team trained the AI on a single “speedrunner” video, where a player heads straight for the goal. This made “the training problem for the AI as difficult as possible.”

“Our AI creates the predictive model without ever accessing the game’s code, and makes significantly more accurate future event predictions than those of convolutional neural networks,” says Matthew Guzdial, lead researcher and Ph.D. student in computer science. “A single video won’t produce a perfect clone of the game engine, but by training the AI on just a few additional videos you get something that’s pretty close.”

They next tested how well the cloned engine would perform in actual gameplay. They employed a second AI to play the game level and ensure the game's protagonist wouldn't fall through solid floors or go undamaged when hit by an enemy. The result: the AI playing with the cloned engine proved indistinguishable from an AI playing with the original game engine.

[Image: A section of gameplay video produced by the original Super Mario Bros. engine (left) and by the cloned engine (right), which demonstrates the ability to accurately predict animation states.]

According to the researchers, the game engine created with their system was much closer to the original than the result of running the same test with a convolutional neural network.

 

 

(source: GaTech blog, dailymail)

 

 

Generative Adversarial Networks (GAN)

Neural networks are used to recognize pictures, understand natural language with good accuracy, drive vehicles autonomously, and power a host of other applications. But neural networks still need human supervision to learn: usually, a network needs labeled examples to learn effectively. While it is also possible to learn from unlabeled data, this has typically not worked very well.

GANs were proposed by Ian Goodfellow (currently a staff research scientist at Google Brain). By applying game theory, he devised a way for a machine-learning system to effectively teach itself how the world works. This ability could help make computers smarter by sidestepping the need to feed them painstakingly labeled training data. Generative Adversarial Networks (GANs) are neural networks that are trained in an adversarial manner to generate data mimicking some distribution.

To explain it in simpler terms: 

If you want to get better at something, say chess, what would you do? You would compete with an opponent better than you. Then you would analyze what you did wrong and what they did right, and think about what you could do to beat them in the next game.

You would repeat this process until you defeated the opponent. The same concept can be used to build better models: simply put, to get a powerful hero (the generator), we need a more powerful opponent (the discriminator)!

To understand this more deeply, you first have to understand what generative and discriminative models are.

In machine learning, the two main classes of models are:

  • Discriminative – A discriminative model is one that discriminates between two (or more) different classes of data – for example, a convolutional neural network that is trained to output 1 given an image of a human face and 0 otherwise.
  • Generative – A generative model, on the other hand, doesn't know anything about classes of data. Instead, its purpose is to generate new data that fits the distribution of the training data – for example, a Gaussian Mixture Model is a generative model which, after being trained on a set of points, is able to generate new random points that more or less fit the distribution of the training data (assuming the GMM is able to mimic the data well), as sketched below.
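As a concrete illustration of the generative case, here is a small, hypothetical sketch using scikit-learn's GaussianMixture: it fits a GMM to some toy 2D points and then samples brand-new points that roughly follow the same distribution. The data and parameters below are made up purely for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy training data: two clusters of 2D points (a stand-in for real data)
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(500, 2)),
])

# Fit a generative model of the data distribution
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)

# Sample brand-new points that (roughly) follow the training distribution
new_points, _ = gmm.sample(100)
print(new_points[:5])
```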

The generative network’s training objective is to increase the error rate of the discriminative network (i.e., “fool” the discriminator network by producing novel synthesized instances that appear to have come from the true data distribution). In practice, a known dataset serves as the initial training data for the discriminator. Training the discriminator involves presenting it with samples from the dataset until it reaches some level of accuracy. Typically the generator is seeded with a randomized input that is sampled from a predefined latent space (e.g. a multivariate normal distribution). Thereafter, samples synthesized by the generator are evaluated by the discriminator. Backpropagation is applied in both networks so that the generator produces better images, while the discriminator becomes more skilled at flagging synthetic images. The generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network.

How GANs work:

So, as we saw, there are two components in a GAN:

  1. Generator Neural Network
  2. Discriminator Neural Network

[Image: GAN architecture – the generator G(z) and discriminator D(x)]

The generator network takes a random input and tries to generate a sample of data. In the image above, the generator G(z) takes an input z from p(z), where z is a sample from the probability distribution p(z). It then generates data that is fed into the discriminator network D(x). The task of the discriminator network is to take input either from the real data or from the generator and try to predict whether the input is real or generated. It takes an input x from pdata(x), where pdata(x) is our real data distribution. D(x) then solves a binary classification problem using a sigmoid function, giving output in the range 0 to 1.

Defining the notation used:

  • pdata(x) – the distribution of the real data
  • x – a sample from pdata(x)
  • p(z) – the distribution of the generator's input noise
  • z – a sample from p(z)
  • G(z) – the generator network
  • D(x) – the discriminator network
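With this notation, the training objective from Goodfellow's original GAN paper can be written as a two-player minimax game, in which D tries to maximize the value function and G tries to minimize it:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

In other words, D is trained to assign high probability to real samples and low probability to generated ones, while G is trained to push D(G(z)) toward 1.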

Steps to train a GAN: 

Step 1: Define the problem. Do you want to generate fake images or fake text? Here you should completely define the problem and collect data for it.

Step 2: Define the architecture of the GAN. Define what your GAN should look like. Should both your generator and discriminator be multilayer perceptrons, or convolutional neural networks? This step depends on the problem you are trying to solve.

Step 3: Train the discriminator on real data for n epochs. Take the data you want to generate fakes of and train the discriminator to correctly classify it as real. Here n can be any natural number.

Step 4: Generate fake data with the generator and train the discriminator on it. Take the generated data and let the discriminator learn to correctly classify it as fake.

Step 5: Train the generator using the output of the discriminator. Once the discriminator is trained, you can take its predictions and use them as an objective for training the generator. Train the generator to fool the discriminator.

Step 6: Repeat steps 3 to 5 for a few epochs.

Step 7: Check manually whether the fake data looks legitimate. If it seems appropriate, stop training; otherwise, go back to step 3. This is a somewhat manual task, as hand evaluation of the data is the best way to check its realism. When this step is over, you can evaluate whether the GAN is performing well enough.
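To make steps 3–6 concrete, here is a minimal, self-contained PyTorch sketch of the training loop. It is only an illustration under simplifying assumptions: the generator and discriminator are tiny multilayer perceptrons, the “real” data is a toy 2D Gaussian rather than images, and all names and hyperparameters are made up.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a tiny GAN for 2D points so the sketch stays self-contained
latent_dim, data_dim, batch, num_epochs = 16, 2, 64, 1000

G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

criterion = nn.BCELoss()
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

for epoch in range(num_epochs):
    # Step 1 stand-in: "real" data is a toy Gaussian shifted away from zero.
    real = torch.randn(batch, data_dim) + 3.0
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # Steps 3-4: train the discriminator to label real data 1 and fakes 0.
    fake = G(torch.randn(batch, latent_dim)).detach()  # no gradients into G here
    loss_d = criterion(D(real), ones) + criterion(D(fake), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Step 5: train the generator to fool the discriminator, i.e. push
    # D's output on generated samples toward the "real" label.
    loss_g = criterion(D(G(torch.randn(batch, latent_dim))), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

Note how the generator's gradients are blocked (via detach) while the discriminator trains, and how the generator is then optimized against labels of 1 so that it learns to fool the discriminator.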

The dueling-neural-network approach has vastly improved learning from unlabeled data. GANs can already perform some dazzling tricks. By internalizing the characteristics of a collection of photos, for example, a GAN can improve the resolution of a pixelated image. It can also dream up realistic fake photos, or apply a particular artistic style to an image. “You can think of generative models as giving artificial intelligence a form of imagination,” Goodfellow says.

 

 

(Sources: Wikipedia, MitTechReview, AnalyticsVidhya)

MIT uses AI to kill video buffering

We all hate it when a video is interrupted by buffering or its resolution turns into a pixelated mess. A group of MIT researchers believe they've figured out a solution to the annoyances that plague millions of people a day.

MIT discovered a way to improve video streaming by reducing buffering times and pixelation. A new AI developed at the university’s Computer Science and Artificial Intelligence Laboratory uses machine learning to pick different algorithms depending on network conditions. In doing so, the AI, called Pensieve, has been shown to deliver a higher-quality streaming experience with less buffering than existing systems.

Instead of having a video arrive at your computer in one complete piece, sites like YouTube and Netflix break it up into smaller pieces and send them sequentially, relying on adaptive bitrate (ABR) algorithms to determine which resolution each piece will play at. This is an attempt to give users a more consistent viewing experience while also saving bandwidth, but it creates problems. If the connection is too slow, YouTube may temporarily lower the resolution (pixelating the video) to keep it playing. And since the video is sent in pieces, skipping ahead is impossible.

There are two types of ABR: a rate-based one that measures how fast a network can transmit data, and a buffer-based one tasked with maintaining a sufficient buffer of video ahead of the playback point. Current algorithms consider only one of these factors, but MIT's new algorithm, Pensieve, uses machine learning to choose the best strategy based on network conditions.
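For intuition only, the toy Python sketch below shows how a rate-based and a buffer-based heuristic might each pick the next chunk's bitrate. This is a hypothetical illustration, not Pensieve's actual learned policy, and the bitrate ladder and thresholds are made up.

```python
# Hypothetical bitrate ladder in kbit/s (not from any real service)
BITRATES_KBPS = [300, 750, 1200, 2850, 4300]

def rate_based(throughput_kbps: float) -> int:
    """Pick the highest bitrate the measured throughput can sustain."""
    feasible = [b for b in BITRATES_KBPS if b <= throughput_kbps]
    return feasible[-1] if feasible else BITRATES_KBPS[0]

def buffer_based(buffer_s: float, low: float = 5.0, high: float = 20.0) -> int:
    """Map current buffer occupancy (seconds of video) onto the ladder."""
    frac = min(max((buffer_s - low) / (high - low), 0.0), 1.0)
    return BITRATES_KBPS[round(frac * (len(BITRATES_KBPS) - 1))]

# A shaky network but a healthy buffer: the two heuristics disagree,
# which is exactly the gap a learned policy like Pensieve tries to close.
print(rate_based(900), buffer_based(18.0))
```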

In experiments that tested the AI over WiFi and LTE, the team found that it could stream video at the same resolution with 10 to 30 percent less rebuffering than other approaches. Additionally, users rated video played with the AI 10 to 25 percent higher in terms of ‘quality of experience.’ The researchers, however, only tested Pensieve on a month's worth of downloaded video and believe performance would be even better if it were trained on the volumes of data that streaming giants like YouTube and Netflix have.

As a next project, the team will be working to test Pensieve on virtual-reality (VR) video. “The bitrates you need for 4K-quality VR can easily top hundreds of megabits per second, which today's networks simply can't support,” says MIT professor Mohammad Alizadeh. “We're excited to see what systems like Pensieve can do for things like VR. This is really just the first step in seeing what we can do.”

Pensieve was funded, in part, by the National Science Foundation and an innovation fellowship from Qualcomm.

OpenAI's bot beats the world's top Dota 2 players

OpenAI has created a bot that is taking down top Dota 2 players at The International, the Dota 2 event held by Valve.

This year's big Dota 2 tournament is currently in its second-to-last day, with only four teams remaining in the competition. Between the event's official games, Valve always includes some fun show matches, and this year it brought in OpenAI's aforementioned Dota 2 bot to face off against the game's most famous player, Danil “Dendi” Ishutin, live on the main stage.

OpenAI said the 1v1 bot learned by playing against itself over a lifetime's worth of matches. In a promo video shown before the match, we see the bot besting some of the game's top players, including Evil Geniuses' cores Arteezy and Sumail.

In the live Shadow Fiend mirror match against Dendi, the bot executed an impossibly perfect creep block, perfectly balanced creep aggro, and even recognized and canceled Dendi’s healing items. The bot quickly bested Dendi in two matches, leading Dendi to say it’s “too strong!” Footage of those matches can be found on Dota 2 Rapier’s YouTube channel.

OpenAI says its next goal is to get the bot ready for the vastly more complex 5v5 matches, noting it might have something ready for next year.

How DeepMind’s AI taught itself to walk

DeepMind's programmers have given the agent a set of virtual sensors (so it can tell whether it's upright or not, for example) and then incentivized it to move forward. The computer works the rest out for itself, using trial and error to come up with different ways of moving. True motor intelligence requires learning how to control and coordinate a flexible body to solve tasks in a range of complex environments. Existing attempts to control physically simulated humanoid bodies come from diverse fields, including computer animation and biomechanics. A common approach has been to use hand-crafted objectives, sometimes with motion capture data, to produce specific behaviors. However, this may require considerable engineering effort and can result in restricted behaviors, or behaviors that may be difficult to repurpose for new tasks.


 

DeepMind published three papers, which are as follows:

Emergence of locomotion behaviors in rich environments:- 

For some AI problems, such as playing Atari or Go, the goal is easy to define: it's winning. But describing a process such as a jog, a backflip, or a jump is difficult; accurately specifying a complex behavior is a common problem when teaching motor skills to an artificial system. DeepMind explored how sophisticated behaviors can emerge from scratch, from the body interacting with the environment, using only simple high-level objectives such as moving forward without falling. DeepMind trained agents with a variety of simulated bodies to make progress across diverse terrains, which require jumping, turning, and crouching. The results show that the agents developed these complex skills without receiving specific instructions, an approach that can be applied to train systems for multiple, distinct simulated bodies.
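As a purely hypothetical sketch (not DeepMind's actual reward), a “move forward without falling” objective could look as simple as this in code:

```python
def locomotion_reward(forward_velocity: float, torso_height: float,
                      control_cost: float, min_height: float = 0.8) -> float:
    """Hypothetical 'move forward without falling' objective.

    Rewards forward progress, lightly penalizes energy use, and returns a
    large penalty once the torso drops below a standing height (a fall).
    """
    if torso_height < min_height:  # the agent has fallen over
        return -1.0
    return forward_velocity - 0.005 * control_cost

# Example: an upright agent moving forward at 1.5 m/s with modest effort
print(locomotion_reward(forward_velocity=1.5, torso_height=1.1, control_cost=10.0))
```

Everything beyond an objective of this flavor, the gaits, turns, and jumps, is left for the agent to discover through reinforcement learning.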

[Image: Simulated planar Walker attempts to climb a wall]

The GIFs show how this technique can lead to high-quality movements and perseverance.

Learning human behaviors from motion capture by adversarial imitation:-

[Image: A humanoid Walker produces human-like walking behavior]

The emergent behavior described above can be very robust, but because the movements must emerge from scratch, they often do not look human-like.

In their second paper, DeepMind demonstrated how to train a policy network that imitates motion capture data of human behaviors to pre-learn certain skills, such as walking, getting up from the ground, running, and turning.

Having produced behavior that looks human-like, they can tune and repurpose those behaviors to solve other tasks, like climbing stairs and navigating walled corridors.

 

Robust imitation of diverse behaviors:-

[Image: The planar Walker on the left demonstrates a particular walking style, and the agent on the right imitates that style using a single policy network]

The third paper proposes a neural network architecture, building on state-of-the-art generative models, that is capable of learning the relationships between different behaviors and imitating specific actions that it is shown. After training, the system can encode a single observed action and create a new, novel movement based on that demonstration. It can also switch between different kinds of behaviors despite never having seen transitions between them, for example switching between walking styles.

 

 

(via The Verge, DeepMind Blog)

 

 

MIT’s new prototype ‘3D’ Chip

A plethora of data is generated every day, and the computing power to process this data into useful information is stalling. One of the fundamental problems is the processor-memory bottleneck, also known as the performance gap. Various methods, such as caches and software techniques, have been used to mitigate the problem. But there is another way: build the CPU directly into a 3D memory structure, connect the two without any motherboard traces, and compute from within the RAM itself.

[Image: The prototype 3D chip]

 

A prototype chip built by researchers at Stanford and MIT can solve the problem by sandwiching the memory, processor, and even sensors into one unit. While current chips are built from 2D layers of silicon, the prototype stacks its logic and memory vertically using different materials.

[Image: RRAM]

The researchers have developed a new 3D chip fabrication method that uses carbon nanotubes and resistive random-access memory (RRAM) cells together to create a combined nanoelectronic processor design that supports complex 3D architecture, whereas traditional silicon-based chip fabrication works with 2D structures only. The team claims this makes for “the most complex nanoelectronic system ever made with emerging nanotechnologies,” creating a 3D computer architecture. Using carbon makes the whole thing possible, since the higher temperatures required to make a silicon CPU would damage the sensitive RRAM cells.

The 3D design is possible because these carbon nanotube circuits and RRAM memory components can be made at temperatures below 200 degrees Celsius, far less than the 1,000-degree temperatures needed to fabricate today's 2D silicon transistors. Lower temperatures mean you can build one layer on top of another without damaging the layers below.

One expert cited by MIT said that this could be the answer to continuing the exponential scaling of computing power in keeping with Moore's Law, as traditional chip methods start to run up against physical limits. The technology is still in its initial phases, and it will take many years before we see these chips implemented in real products.

An artificial iris that responds to light like the human eye

An artificial iris manufactured from an intelligent, light-controlled polymer material can react to incoming light in the same way as the human eye. The iris was developed by the Smart Photonic Materials research group at Tampere University of Technology (TUT), and the work was recently published in the journal Advanced Materials.

[Image: The artificial iris]

The human iris does its job of adjusting your pupil size to meter the amount of light hitting the retina behind without you having to actively think about it. And while a camera’s aperture is designed to work the same way as a biological iris, it’s anything but automatic. Even point-and-shoots rely on complicated control mechanisms to keep your shots from becoming overexposed. But a new “artificial iris” developed at the Tampere University of Technology in Finland can autonomously adjust itself based on how bright the scene is.

Scientists from the Smart Photonic Materials research group developed the iris using a light-sensitive liquid crystal elastomer. The team also employed photoalignment techniques, which accurately position the liquid crystal molecules in a predetermined direction within a tolerance of a few picometers. This is similar to the technique originally used in LCD TVs to improve viewing angle and contrast, which has since been adapted for smartphone screens. “The artificial iris looks a little bit like a contact lens,” TUT Associate Professor Arri Priimägi said. “Its center opens and closes according to the amount of light that hits it.”

The team hopes to eventually develop this technology into an implantable biomedical device. Before that can happen, however, the TUT researchers first need to improve the iris's sensitivity so that it can adapt to smaller changes in brightness, and they also need to get it to work in an aqueous environment. The new iris is therefore still a long way from being ready.