Inside the mind of a Neural Network

Neural networks, which learn to perform tasks such as speech recognition, automatic translation, image recognition, and time-series prediction by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence.

During training, however, a neural net continually adjusts its internal settings in ways that even its creators can’t interpret. Much recent work in computer science has focused on clever techniques for determining just how neural nets do what they do.

Neural nets are so named because they roughly approximate the structure of the human brain. Typically, they’re arranged into layers, and each layer consists of many simple processing units, called nodes, each of which is connected to several nodes in the layers above and below. Data are fed into the lowest layer, whose nodes process them and pass the results to the next layer. The connections between layers have different “weights,” which determine how much the output of any one node figures into the calculation performed by the next.
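The layered computation described above can be sketched in a few lines of Python. This is only a minimal illustration, not any particular system: the sigmoid activation, the layer sizes, and the weight values are all assumptions chosen for the example.

```python
import math

def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed data into the lowest layer and pass each layer's output upward."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical weights: 3 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = network_forward([1.0, 0.5, -1.0], layers)
print(output)
```

A larger weight makes the corresponding input count for more in the next node's sum, which is all that "weight" means here.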

In one study of a speech recognition network, MIT researchers Belinkov and Glass used individual layers’ outputs to train a system to identify “phones,” distinct phonetic units particular to a spoken language. The “t” sounds in the words “tea,” “tree,” and “but,” for instance, might be classified as separate phones, but a speech recognition system has to transcribe all of them using the letter “t.” And indeed, Belinkov and Glass found that lower levels of the network were better at recognizing phones than higher levels, where, presumably, the distinction is less important [MIT].

An important part of how a neural network learns is forgetting: an artificial neural network (ANN) must determine which information to discard and which to keep. The basic algorithm used in the majority of deep-learning procedures to tweak neural connections in response to data is called “stochastic gradient descent.” Each time the training data are fed into the network, a cascade of firing activity sweeps upward through the layers of artificial neurons. When the signal reaches the top layer, the final firing pattern can be compared to the correct label for the image: 1 or 0, “dog” or “no dog.” Any differences between this firing pattern and the correct pattern are “back-propagated” down the layers, meaning that, like a teacher correcting an exam, the algorithm strengthens or weakens each connection to make the network better at producing the correct output signal. Over the course of training, common patterns in the training data become reflected in the strengths of the connections, and the network becomes expert at correctly labeling the data, such as by recognizing a dog, a word, or a 1.
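The training loop described above can be sketched for a tiny network with one hidden layer. This is a minimal sketch under stated assumptions: the toy "dog / no dog" data, the hidden-layer size, the learning rate, and the cross-entropy loss are all hypothetical choices for illustration, not details from the cited articles.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, hidden=3, lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer network with SGD; return the mean loss per epoch."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not disturb the caller's list
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit examples in random order
        total = 0.0
        for x, y in data:
            # Forward pass: activity sweeps upward through the layers.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(w1, b1)]
            out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            p = min(max(out, 1e-12), 1.0 - 1e-12)  # avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            # Backward pass: propagate the output error down to the hidden layer.
            d_out = out - y
            d_h = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Strengthen or weaken each connection against its gradient.
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h[j] * x[i]
            b2 -= lr * d_out
        history.append(total / len(data))
    return history

# Toy data (assumption): the label simply equals the first feature.
toy_data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
losses = train(toy_data)
print(f"first epoch loss {losses[0]:.3f}, last epoch loss {losses[-1]:.3f}")
```

The falling loss is the code-level counterpart of the network becoming "expert" at labeling its training data.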



sources: MIT blog, Wired, arXiv

AI creates Cartoons from Text

Animation, as we all know, is a great source of entertainment, but animated scenes are difficult to create and stitch together. It is now possible to stitch cartoons together with the help of AI. Researchers at the Allen Institute for Artificial Intelligence and the University of Illinois have developed an AI called Craft (Composition, Retrieval, and Fusion Network) that takes over part of this tedious work by creating scenes from text descriptions. Craft was able to recreate scenes from the Flintstones cartoon after being trained on 25,000 three-second clips, each paired with a brief text description of the scene.

The AI works by matching the video clips to their brief text descriptions and building a set of parameters from them. Craft can then convert new text descriptions into video clips and animated series. It not only puts the characters into place but also parses objects and retrieves the background.


The results are still rough, and there is a lot of scope for improving the model and training it on other cartoons. The video below briefly describes how the AI works.

For detailed information, refer to the paper.

SqueezeNext – Hardware Aware Neural Network Design

Berkeley researchers have published ‘SqueezeNext’, the successor to SqueezeNet, in their latest attempt to distill the capabilities of very large neural networks into smaller models that can feasibly be deployed on devices with limited memory and compute capabilities, like mobile phones. One of the main barriers to deploying neural networks on embedded systems has been the large memory and power consumption of existing networks. While much of the research into AI systems today is based around getting state-of-the-art results on specific datasets, SqueezeNext is part of a parallel track focused on making systems deployable.

Why SqueezeNext?

The transition to deep neural network based solutions started with AlexNet, which won the ImageNet challenge by a large margin. The ImageNet classification challenge started in 2010 with the first winning method achieving an error rate of 28.2%, followed by 26.2% in 2011. A clear improvement in accuracy then came with AlexNet, which achieved an error rate of 16.4%, roughly 10 percentage points ahead of the runner-up. AlexNet consists of five convolutional layers and three fully connected layers, and contains a total of 61 million parameters. Because of this large size, the original model had to be trained on two GPUs with a model-parallel approach, where the filters were distributed across the GPUs. Moreover, dropout was required to avoid overfitting with such a large model. Models with this many millions of parameters are not suitable for real-time applications.
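To see where a figure like 61 million comes from, the parameters can be tallied layer by layer. The layer shapes below are the commonly cited AlexNet configuration and are an assumption here, since the text above gives only the layer counts and the total; treat this as a rough check rather than an authoritative specification.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

def fc_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features

# Assumed shapes; some layers see only half the channels because the
# filters were split across two GPUs.
alexnet_layers = {
    "conv1": conv_params(11, 3, 96),
    "conv2": conv_params(5, 48, 256),
    "conv3": conv_params(3, 256, 384),
    "conv4": conv_params(3, 192, 384),
    "conv5": conv_params(3, 192, 256),
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}
total = sum(alexnet_layers.values())
fc_share = sum(v for k, v in alexnet_layers.items() if k.startswith("fc")) / total

print(f"total parameters: {total / 1e6:.1f} million")
print(f"share held by fully connected layers: {fc_share:.0%}")
```

Under these assumptions, the fully connected layers hold the overwhelming majority of the parameters, which is why compact designs target them first.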

SqueezeNet is a successful example, achieving AlexNet’s accuracy with 50× fewer parameters without compression and a 500× smaller model with deep compression. SqueezeNext is efficient because of a few design strategies: low-rank filters; a bottleneck filter to constrain the parameter count of the network; a single fully connected layer following a bottleneck; weight-stationary and output-stationary dataflows; and co-design of the network in tandem with a hardware simulator to maximize hardware usage efficiency.

The resulting SqueezeNext network has 112× fewer model parameters than AlexNet. The researchers also developed a version of the network whose performance approaches that of VGG-19 (which did well in ImageNet 2014). By carefully tuning the model design in parallel with a hardware simulator, they designed an even more efficient network that is significantly faster and more energy efficient than SqueezeNet, itself a widely used compressed network.
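The savings from low-rank filters and bottlenecks can be estimated with simple weight counting. This is a rough sketch, not the actual SqueezeNext block: the channel count of 128 and the 4× channel reduction are hypothetical values chosen for illustration.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in a convolution layer (biases ignored)."""
    return kernel_h * kernel_w * in_channels * out_channels

channels = 128  # hypothetical channel count

# A full 3x3 convolution.
full = conv_weights(3, 3, channels, channels)

# Low-rank version: a 3x1 convolution followed by a 1x3 convolution.
low_rank = (conv_weights(3, 1, channels, channels)
            + conv_weights(1, 3, channels, channels))

# Bottleneck: 1x1 convolutions shrink the channels before the low-rank
# filters and expand them again afterwards (4x reduction is an assumption).
reduced = channels // 4
bottleneck = (conv_weights(1, 1, channels, reduced)
              + conv_weights(3, 1, reduced, reduced)
              + conv_weights(1, 3, reduced, reduced)
              + conv_weights(1, 1, reduced, channels))

print(f"full 3x3: {full}, low-rank: {low_rank}, bottleneck + low-rank: {bottleneck}")
```

Splitting a 3×3 filter into 3×1 and 1×3 cuts the weights by a third, and the bottleneck reduces them much further because every remaining filter operates on fewer channels.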


(via: SqueezeNext)


Deep Reinforcement Learning and Q-Learning

Deep learning (neural networks) is used to achieve state-of-the-art results in image recognition, computer vision, machine translation, and more. Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start with a blank slate, and under the right conditions, they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones; this is reinforcement.

Like a human, agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL). Reinforcement algorithms that incorporate deep learning can beat world champions at the game of Go as well as human experts playing numerous Atari video games. While that may sound trivial, it’s a vast improvement over their previous accomplishments, and the state of the art is progressing rapidly.

Reinforcement learning solves the difficult problem of correlating immediate actions with the delayed returns they produce. Like humans, reinforcement learning algorithms sometimes have to wait a while to see the fruit of their decisions. They operate in a delayed return environment, where it can be difficult to understand which action leads to which outcome over many time steps.
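One common way to connect an early action with a reward that arrives later is the discounted return, where each step is credited with all the rewards that follow it, weighted down the further away they are. The discount factor `gamma` and the reward sequence below are assumptions for illustration; the text above does not specify them.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return-to-go for each time step: r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

# Hypothetical episode: no reward until the final move wins the game.
rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards)
print(returns)
```

Even though only the last step is rewarded, every earlier step receives a nonzero return, so the algorithm has a signal for the moves that led up to the win.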

Q-Learning

Q-learning is a model-free reinforcement learning technique. It is able to compare the expected utility of the available actions (for a given state) without requiring a model of the environment. Additionally, Q-learning can handle problems with stochastic transitions and rewards without requiring adaptations. Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker, and they are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. It has been proven that for any finite MDP, Q-learning eventually finds an optimal policy, in the sense that the expected value of the total reward return over all successive steps, starting from the current state, is the maximum achievable.
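A minimal tabular sketch of Q-learning on a toy problem is shown below. Everything about the environment is an assumption made for illustration: a five-state corridor where reaching the right end pays a reward of 1, with hypothetical values for the learning rate `alpha` and discount `gamma`. The agent explores with purely random actions, which works because Q-learning learns about the greedy policy regardless of how actions are chosen.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment: the agent only observes outcomes, never this function's rules."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # random exploration
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The learned table should favor moving right in every non-terminal state, which is the optimal policy for this corridor.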


(via Wiki, GoogleDeepmind Blog, DeepLearning4j, IntelAI)

Google’s Bristlecone Quantum Computing Chip

A few days ago Google previewed its new quantum processor, Bristlecone, a quantum computing chip that will serve as a testbed for research regarding the system error rates and scalability of Google’s qubit technology. In a post on its research blog, Google said it’s “cautiously optimistic that quantum supremacy can be achieved with Bristlecone.”


72 Qubit Quantum Computing Chip

The purpose of this gate-based superconducting system is to provide a testbed for research into system error rates and scalability of their qubit technology, as well as applications in quantum simulation, optimization, and machine learning. Qubits (the quantum version of traditional bits) are very unstable and can be adversely affected by noise, and most of these systems can only hold a state for less than 100 microseconds. Google believes that quantum supremacy can be “comfortably demonstrated” with 49 qubits and a two-qubit error below 0.5 percent. Previous quantum systems by Google have given two-qubit errors of 0.6 percent, which in theory sounds like an extremely small difference, but in the world of quantum computing remains significant.
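A rough back-of-the-envelope model shows why a 0.1-point difference in error rate matters. The sketch below assumes that every two-qubit gate fails independently and that a single failure ruins the computation, which is a simplification, and the gate counts are hypothetical.

```python
def circuit_success_probability(two_qubit_error, n_gates):
    """Chance that every gate succeeds, assuming independent errors (a simplification)."""
    return (1.0 - two_qubit_error) ** n_gates

for error in (0.006, 0.005):
    for n_gates in (100, 500):
        p = circuit_success_probability(error, n_gates)
        print(f"error {error:.1%}, {n_gates} gates -> success ~ {p:.3f}")
```

Because the errors compound with every gate, the gap between the two error rates widens as circuits get deeper.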

Each Bristlecone chip features 72 qubits, which may help mitigate some of this error, but as Google says, quantum computing isn’t just about qubits. Until now, the most advanced quantum chip, built by IBM, had 50 qubits. “Operating a device such as Bristlecone at low system error requires harmony between a full stack of technology ranging from software and control electronics to the processor itself,” the team writes in a blog post. “Getting this right requires careful systems engineering over several iterations.”

(via Google Research Blog, Engadget, Forbes)



Deep learning improves your computer with age

Researchers at Google have devised a new technique that could let a laptop or smartphone learn to do things better and faster over time. Their paper focuses on a common problem: prefetching. Computers process information much faster than they can pull it from memory to be processed. To avoid bottlenecks, they try to predict which information is likely to be needed and pull it in advance. As computers get more powerful, this prediction becomes progressively harder.

In the paper, Google focuses on using deep learning to improve prefetching. “The work that we did is only the tip of the iceberg,” says Heiner Litz of the University of California, Santa Cruz, a visiting researcher on the project. Litz believes it should be possible to apply machine learning to every part of a computer, from the low-level operating system to the software that users interact with.
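Prefetching can be framed as a prediction problem: given the addresses seen so far, guess the next one. The sketch below is not the deep-learning model from the paper; it is a simple table-based stand-in (the hypothetical `DeltaPrefetcher` class) that learns which address jump tends to follow which, just to make the idea concrete.

```python
from collections import Counter, defaultdict

class DeltaPrefetcher:
    """Predicts the next address from the jump (delta) that usually follows the last jump."""

    def __init__(self):
        self.last_address = None
        self.last_delta = None
        self.table = defaultdict(Counter)  # previous delta -> counts of the delta that followed

    def access(self, address):
        """Record a memory access and return the predicted next address, or None."""
        if self.last_address is not None:
            delta = address - self.last_address
            if self.last_delta is not None:
                self.table[self.last_delta][delta] += 1
            self.last_delta = delta
        self.last_address = address
        counts = self.table.get(self.last_delta)
        if not counts:
            return None  # not enough history to predict yet
        predicted_delta = counts.most_common(1)[0][0]
        return address + predicted_delta

# Hypothetical access pattern: a loop walking memory in 64-byte steps.
prefetcher = DeltaPrefetcher()
predictions = [prefetcher.access(addr) for addr in (0, 64, 128, 192)]
print(predictions)
```

A fixed table like this handles regular strides well; the appeal of a learned model is coping with irregular patterns that no simple rule captures.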

Such advances would be opportune. Moore’s Law is finally slowing down, and the fundamental design of computer chips hasn’t changed much in recent years. Tim Kraska, an associate professor at MIT who is also exploring how machine learning can make computers work better, says the approach could be useful for high-level algorithms, too. A database might automatically learn how to handle financial data as opposed to social-network data, for instance. Or an application could teach itself to respond to a particular user’s habits more effectively.

Paper reference:


(via: MitTechReview)

Baidu Research’s New AI Algorithm Mimics Voice With Very Few Samples

AI typically needs a plethora of data and a lot of time for something like voice cloning. It needs to listen to hours of recordings. However, a new process could get that down to one minute. Baidu researchers have unveiled an upgraded version of Deep Voice, their text-to-speech synthesis system, that can now, once trained, clone any voice after listening to a few snippets of audio. This capability was enabled by learning shared and discriminative information from speakers. Baidu calls this ‘Voice Cloning’. Voice cloning is expected to have significant applications in the direction of personalization in human-machine interfaces.


Here, Baidu focuses on two fundamental approaches (see the figure above):

  1. Speaker Adaptation: Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples, using backpropagation-based optimization. Adaptation can be applied to the whole model or only to the low-dimensional speaker embeddings. The latter requires far fewer parameters to represent each speaker, although it yields a longer cloning time and lower audio quality.
  2. Speaker Encoding: Speaker encoding is based on training a separate model to directly infer a new speaker embedding from cloning audio, which is ultimately used with a multi-speaker generative model. The speaker encoding model has time-and-frequency-domain processing blocks to retrieve speaker identity information from each audio sample, and attention blocks to combine them in an optimal way.

For detailed information and mathematical explanations, refer to the paper by Baidu Research.

However, this technology may also have a downside, as it could cause trouble for people who rely on biometric voice security.

(via MitTechReview, Wiki, BaiduResearch)

Interpretable Machine Learning Through Teaching – (OpenAI)

Researchers at OpenAI have designed a method that encourages AIs to teach each other with examples that also make sense to human beings. Their method automatically selects the most informative examples for teaching a concept, for example, the best images to describe the concept of dogs, and this approach proved to be effective for both humans and AIs.

OpenAI envisions that some of the most impactful applications of AI will come from collaboration between humans and machines. However, communication between the two is a barrier. Consider an example. Think about trying to guess the shape of a rectangle when you’re only shown a collection of random points inside that rectangle: it’s much faster to figure out the correct dimensions of the rectangle when you’re given points at the corners of the rectangle instead. OpenAI’s machine learning approach works as a cooperative game played between two agents, one the teacher and the other the student. The goal for the student is to guess a particular concept (e.g. “dog”, “zebra”) based on examples of that concept (such as images of dogs), and the goal of the teacher is to learn to select the most illustrative examples for the student.
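The rectangle example above can be made concrete with a few lines of code. This is only an illustration of why well-chosen examples teach faster, not OpenAI's method: the "student" here is assumed to guess the smallest axis-aligned box containing the points it is shown, and the rectangle dimensions are hypothetical.

```python
import random

def bounding_box(points):
    """The student's guess: the smallest axis-aligned box containing all shown points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

true_rect = (0.0, 0.0, 4.0, 2.0)  # x_min, y_min, x_max, y_max (hypothetical)

rng = random.Random(0)
random_points = [(rng.uniform(0.0, 4.0), rng.uniform(0.0, 2.0)) for _ in range(10)]
corner_points = [(0.0, 0.0), (4.0, 2.0)]  # two opposite corners

guess_from_random = bounding_box(random_points)
guess_from_corners = bounding_box(corner_points)

print("from 10 random points:", guess_from_random)
print("from 2 corner points: ", guess_from_corners)
```

Two carefully chosen points recover the rectangle exactly, while ten random interior points can only produce a box that fits inside it.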

In their two-stage technique: 

  1. A ‘student’ neural network is given randomly selected input examples of concepts and is trained from those examples using traditional supervised learning methods to guess the correct concept labels.
  2. The ‘teacher’ network, which has an intended concept to teach and access to labels linking concepts to examples, then tests different examples on the student and sees which concept labels the student assigns them, eventually converging on the smallest set of examples it needs to give to let the student guess the intended concept.

However, if the student and the teacher are trained jointly, they can collude to communicate via arbitrary examples that do not make sense to humans, which defeats the main goal.


(via: OpenAI)

Artificial Synapse could make Brain-On-A-Chip Hardware a Reality

Let’s start by understanding what the title means. It refers to neuromorphic engineering, also known as neuromorphic computing: the use of very-large-scale integration (VLSI) systems containing electronic analog circuits to mimic neuro-biological architectures present in the nervous system. Microprocessors configured more like brains than traditional chips could soon make computers far more astute about what’s going on around them.


Neuromorphic computer chips are designed to work like the human brain. Instead of being controlled by binary, on-or-off signals like most current chips, neuromorphic chips weight their outputs, mimicking the way different neurons fire at different strengths through their synapses. In this way, small neuromorphic chips could, like the brain, efficiently process millions of streams of parallel computations that are currently only possible with large banks of supercomputers. But one significant hangup on the way to such portable artificial intelligence has been the neural synapse, which has been particularly tricky to reproduce in hardware.

Now engineers at MIT have designed an artificial synapse in such a way that they can precisely control the strength of an electric current flowing across it, similar to the way ions flow between neurons. The team has built a small chip with artificial synapses, made from silicon germanium. In simulations, the researchers found that the chip and its synapses could be used to recognize samples of handwriting, with 95 percent accuracy. The design, published last month in the journal Nature Materials, is a major step towards building portable, low-power neuromorphic chips for use in pattern recognition and other learning tasks.

Most neuromorphic chip designs attempt to emulate the synaptic connection between neurons using two conductive layers separated by a “switching medium,” or synapse-like space. When a voltage is applied, ions should move in the switching medium to create conductive filaments, similarly to how the “weight” of a synapse changes.

The research was led by Jeehwan Kim, the Class of 1947 Career Development Assistant Professor in the departments of Mechanical Engineering and Materials Science and Engineering, and a principal investigator in MIT’s Research Laboratory of Electronics and Microsystems Technology Laboratories.

In conclusion, artificial neural networks are already loosely modeled on the brain. The combination of neural nets and neuromorphic chips could let AI systems be packed into smaller devices and run a lot more efficiently.


(via ScienceDaily, MitTechReview, Wiki)

Google’s Cloud Auto-ML Vision

A new service by Google named Cloud AutoML uses several machine-learning tricks to automatically build and train a deep-learning algorithm that can recognize things in images. The initial release of Cloud AutoML is limited to image recognition. Its simple interface lets you upload images with ease, train and manage models, and finally deploy them on Google Cloud.

The technology is limited for now, but it could be the start of something big. Building and optimizing a deep neural network algorithm normally requires a detailed understanding of the underlying math and code, as well as extensive practice tweaking the parameters of algorithms to get things just right. The difficulty of developing AI systems has created a race to recruit talent, and it means that only big companies with deep pockets can usually afford to build their own bespoke AI algorithms.


In addition, rather than forcing enterprises to train their algorithms using Google’s data, Cloud AutoML ingests enterprise data assets and tunes the model accordingly. The key here is that Google helps enterprises to customize a model without having to do so de novo: There’s already a great deal of training baked in. Though initially focused on image data, Google plans to roll out the service to tackle text, video, and more.

Cloud AutoML Vision is built on Google’s transfer learning and neural architecture search technologies (among others). Disney has already started using the technology to annotate its products and improve the customer experience on its shopDisney site. The Zoological Society of London is also using AutoML Vision to recognize and track wildlife in order to understand how species are distributed and how humans are affecting them.

The video below explains how Cloud AutoML Vision works.