Neural Networks Are Learning What (Not) To Forget

Deep Learning, which makes use of various types of Artificial Neural Networks, is the paradigm with the potential to grow AI at an exponential rate. There’s a lot of hype around Neural Nets, and people are busy researching them and pushing them toward their full potential.

The title of this blog might seem obscure to some readers, so consider an example: humans evolved from apes, and while evolving, we forgot information, knowledge, and particular experiences that were no longer useful or needed. In particular, humans have the extraordinary ability to constantly update their memories with the most important knowledge while overwriting information that is no longer useful. The world is full of trivial experiences and knowledge that is irrelevant or of little help to us, so we forget those things, or rather ignore those incidents.

The same cannot be said of machines. Any skill they learn is quickly overwritten, regardless of how important it is. There is currently no reliable mechanism they can use to prioritize these skills, deciding what to remember and what to forget. Thanks to Rahaf Aljundi and colleagues at the University of Leuven in Belgium and at Facebook AI Research, that may be changing: they have shown that the approach biological systems use to learn, and to forget, can work with artificial neural networks too.

The key is a process known as Hebbian learning, first proposed in the 1940s by the Canadian psychologist Donald Hebb to explain the way brains learn via synaptic plasticity. Hebb’s theory can be famously summarized as “Cells that fire together wire together.”

What is Hebbian Learning?

Hebbian learning is one of the oldest learning algorithms and is based in large part on the dynamics of biological systems. A synapse between two neurons is strengthened when the neurons on either side of the synapse (input and output) have highly correlated outputs. In essence, when an input neuron fires, if it frequently leads to the firing of the output neuron, the synapse is strengthened. Carrying the analogy over to an artificial system, the tap weight is increased when there is high correlation between the outputs of two sequential neurons.
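
As a rough sketch, Hebb’s rule can be written as Δw = η · y · x: the weight between an input and an output neuron grows in proportion to their correlated activity. Here is a minimal NumPy illustration; the layer sizes, firing patterns, and learning rate are all made up for the example:

```python
import numpy as np

# Hebb's rule: dw = eta * y * x  ("cells that fire together wire together")
def hebbian_update(w, x, y, eta=0.1):
    return w + eta * np.outer(y, x)

w = np.zeros((2, 3))              # 3 input neurons -> 2 output neurons, unwired
x = np.array([1.0, 0.0, 1.0])     # input neurons 0 and 2 fire
y = np.array([1.0, 0.0])          # output neuron 0 fires

for _ in range(5):                # repeated co-activation
    w = hebbian_update(w, x, y)

print(w)
# Synapses from the active inputs (0, 2) to the active output (0)
# have strengthened; all other synapses remain at zero.
```

Note that the synapses that never see correlated firing stay at zero: repetition, not a single event, is what wires neurons together.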


Artificial neural nets being taught what to forget

In other words, the connections between neurons grow stronger if they fire together, and these connections are therefore more difficult to break. This is how we learn: repeated synchronized firing of neurons makes the connections between them stronger and harder to overwrite. The team applied this idea to artificial networks by measuring the outputs from a neural network and monitoring how sensitive they are to changes in the connections within the network. This gave them a sense of which network parameters are most important and should, therefore, be preserved. “When learning a new task, changes to important parameters are penalized,” said the team. They say the resulting network has “memory aware synapses.”

Neural networks with memory aware synapses turn out to perform better in these tests than other networks. In other words, they preserve more of the original skill than networks without this ability, although the results certainly allow room for improvement.
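
To make the penalty idea concrete, here is a tiny numerical sketch of the recipe. The toy “network” and all names are illustrative assumptions, not the authors’ code: an importance weight Ω_i is estimated from how sensitive the (squared) network output is to parameter i, and the loss on a new task is augmented with a penalty for moving important parameters away from their old values.

```python
import numpy as np

def importance(f, theta, eps=1e-5):
    """Approximate Omega_i = |d f(theta)^2 / d theta_i| by finite differences."""
    omega = np.zeros_like(theta)
    for i in range(len(theta)):
        t_hi, t_lo = theta.copy(), theta.copy()
        t_hi[i] += eps
        t_lo[i] -= eps
        omega[i] = abs(f(t_hi) ** 2 - f(t_lo) ** 2) / (2 * eps)
    return omega

def penalized_loss(new_task_loss, theta, theta_old, omega, lam=1.0):
    # Changes to important (high-omega) parameters are penalized.
    return new_task_loss + lam * np.sum(omega * (theta - theta_old) ** 2)

# Toy "network output": a weighted sum, so parameter 0 matters the most.
f = lambda th: th @ np.array([10.0, 1.0, 0.1])
theta_old = np.array([1.0, 1.0, 1.0])
omega = importance(f, theta_old)
print(omega)        # parameter 0 has by far the largest importance

theta_new = np.array([1.5, 1.0, 1.0])   # we moved the important parameter
print(penalized_loss(0.0, theta_new, theta_old, omega))
```

The effect is that gradient descent on a new task is cheap to move unimportant parameters but expensive to move important ones, which is how the old skill survives.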

The paper: Memory Aware Synapses: Learning What (Not) To Forget.

(via: MitTechReview, Wikipedia, arxiv)


Evolution of Poker with AI

Artificial Intelligence is happening all around us. With many businesses integrating AI into their technologies, there is an outburst of AI everywhere: self-driving cars, Go-playing AI, and, most surprising of all, poker! A few months ago, I shared a blog on The Poker Playing AI (recommended read!). Yes, to those folks who didn’t know, AI has also set foot in poker, and it’s a major achievement!

To summarize the victory of AI in poker, the folks at pokersites have created a beautiful, meticulously designed infographic.




What is Ethereum?

With all the hype around Blockchain, and with its flagship applications Bitcoin and Ethereum all over the media and magazines, I had to write a blog on Ethereum. Briefly, Blockchain is to Bitcoin what the internet is to email: a big electronic system on top of which you can build applications.

Then what’s Ethereum? It is a public database that keeps a permanent record of digital transactions. Importantly, this database doesn’t require any central authority to maintain and secure it.


Like Bitcoin, Ethereum is a distributed public blockchain network. Although there are some significant technical differences between the two, the most important distinction to note is that Bitcoin and Ethereum differ substantially in purpose and capability. Bitcoin offers one particular application of blockchain technology, a peer to peer electronic cash system that enables online Bitcoin payments. While the Bitcoin blockchain is used to track ownership of digital currency (bitcoins), the Ethereum blockchain focuses on running the programming code of any decentralized application.

In the Ethereum blockchain, instead of mining for bitcoin, miners work to earn Ether, a type of crypto token that fuels the network. Beyond a tradeable cryptocurrency, Ether is also used by application developers to pay for transaction fees and services on the Ethereum network.

The Ethereum blockchain is essentially a transaction-based state machine: given a series of inputs, it transitions to a new state. The first state in Ethereum is the genesis state, a blank state in which no transactions have happened on the network. As transactions occur, the state transitions to new states, with the latest one denoting the current state of Ethereum. Of course, there are millions of transactions, so transactions are grouped into Blocks; to simplify, a Block contains a series of transactions.
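
To make the state-machine picture concrete, here is a deliberately simplified sketch in Python. The account-balance model and all names are illustrative assumptions, not Ethereum’s actual data structures:

```python
# Minimal transaction-based state machine: the state maps accounts to
# balances, and each valid transaction transitions it to a new state.

genesis = {}  # the genesis state: no accounts, no transactions yet

def apply_transaction(state, tx):
    """Return the new state after applying one transaction."""
    sender, recipient, amount = tx["from"], tx["to"], tx["value"]
    if state.get(sender, 0) < amount:
        raise ValueError("invalid transaction: insufficient balance")
    new_state = dict(state)  # the old state is left untouched
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def apply_block(state, block):
    """A block is just an ordered series of transactions."""
    for tx in block:
        state = apply_transaction(state, tx)
    return state

state = {"alice": 100, "bob": 0}  # pretend some value already exists
block = [
    {"from": "alice", "to": "bob", "value": 30},
    {"from": "bob", "to": "alice", "value": 10},
]
new_state = apply_block(state, block)
print(new_state)  # {'alice': 80, 'bob': 20}
```

The key property the sketch shares with the real system is determinism: anyone replaying the same blocks from the genesis state arrives at exactly the same current state.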

To cause a transition from one state to the next, a transaction must be valid. For a transaction to be considered valid, it must go through a validation process known as mining. Mining is when a group of nodes (i.e. computers) expend their compute resources to create a block of valid transactions. In the most basic sense, a transaction is a cryptographically signed piece of instruction that is generated by an externally owned account, serialized, and then submitted to the blockchain.

Any node on the network that declares itself as a miner can attempt to create and validate a block. Lots of miners from around the world try to create and validate blocks at the same time. Each miner provides a mathematical “proof” when submitting a block to the blockchain, and this proof acts as a guarantee: if the proof exists, the block must be valid.

For a block to be added to the main blockchain, the miner must prove it faster than any other competitor miner. The process of validating each block by having a miner provide a mathematical proof is known as a “proof of work.”
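
A toy version of proof of work can be sketched in a few lines. This is a generic hash puzzle for illustration only; Ethereum’s real mining algorithm (Ethash) is considerably more involved:

```python
import hashlib

# Toy proof of work: find a nonce such that the block's SHA-256 hash
# starts with a required number of zero hex digits ("difficulty").

def mine(block_data, difficulty=4):
    nonce = 0
    target = "0" * difficulty
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest  # the nonce is the "proof"
        nonce += 1

def verify(block_data, nonce, difficulty=4):
    # Verification is cheap: one hash, versus many thousands for mining.
    digest = hashlib.sha256(f"{block_data}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce, digest = mine("block with some transactions")
print(nonce, digest)
```

The asymmetry is the whole point: finding the nonce takes many hash attempts, but any node can check the proof with a single hash, which is why the proof acts as a guarantee of work done.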

A miner who validates a new block is rewarded with a certain amount of value for doing this work. What is that value? The value token of the Ethereum blockchain is called Ether, an intrinsic digital token listed under the code ETH and traded on cryptocurrency exchanges. It is also used to pay for transaction fees and computational services on the Ethereum network. Every time a miner proves a block, new Ether tokens are generated and awarded. Ether can be transferred between accounts and used to compensate participant nodes for computations performed.

EVM – Ethereum Virtual Machine

Ethereum is a programmable blockchain. Rather than give users a set of pre-defined operations (e.g. bitcoin transactions), Ethereum allows users to create their own operations of any complexity they wish. In this way, it serves as a platform for many different types of decentralized blockchain applications, including but not limited to cryptocurrencies.

Ethereum in the narrow sense refers to a suite of protocols that define a platform for decentralized applications. At the heart of it is the Ethereum Virtual Machine (“EVM”), which can execute code of arbitrary algorithmic complexity. In computer science terms, Ethereum is “Turing complete”. Developers can create applications that run on the EVM using friendly programming languages modeled on existing languages like JavaScript and Python.

Smart Contracts

Smart contracts are deterministic exchange mechanisms controlled by digital means that can carry out the direct transaction of value between untrusted agents. They can be used to facilitate, verify, and enforce the negotiation or performance of economically-laden procedural instructions and potentially circumvent censorship, collusion, and counter-party risk. In Ethereum, smart contracts are treated as autonomous scripts or stateful decentralized applications that are stored in the Ethereum blockchain for later execution by the EVM. Instructions embedded in Ethereum contracts are paid for in ether (or more technically “gas”) and can be implemented in a variety of Turing complete scripting languages.
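
Conceptually, a smart contract is just code plus state that executes deterministically. As a loose illustration only (real Ethereum contracts are written in languages such as Solidity and executed by the EVM on every node, not in Python), here is a toy escrow contract:

```python
class EscrowContract:
    """Toy smart contract: release funds only when both untrusted
    parties have approved. Purely a conceptual sketch."""

    def __init__(self, buyer, seller, amount):
        self.buyer, self.seller, self.amount = buyer, seller, amount
        self.approvals = set()
        self.released = False

    def approve(self, party):
        if party not in (self.buyer, self.seller):
            raise ValueError("unknown party")
        self.approvals.add(party)
        # Deterministic rule: funds move only once both have approved,
        # with no central authority deciding the outcome.
        if self.approvals == {self.buyer, self.seller}:
            self.released = True
        return self.released

c = EscrowContract("alice", "bob", 10)
first = c.approve("alice")
second = c.approve("bob")
print(first, second)  # False True
```

Because the rule is embedded in code that every node executes identically, neither party has to trust the other, only the contract.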

There’s still a lot more to Ethereum, but this blog will help you get some insights before delving deeper into the utopian world of blockchain and cryptocurrency.


(sources: Wiki, Telegraph, Medium, ethdocs)

Quantum Network

We all know the hype that’s currently running around Quantum Computers, their workings, and the plethora of discoveries happening in Quantum Computing. Recommended: Quantum Computing and Quantum Computer Memories.

Recently, there has been news about a Quantum Network or rather, a Quantum Internet. So what’s a Quantum Network? Quantum networks form an important element of quantum computing and quantum communication systems. In general, quantum networks allow for the transmission of quantum information (quantum bits, also called qubits) between physically separated quantum processors. A quantum processor is a small quantum computer capable of performing quantum logic gates on a certain number of qubits.


Being able to send qubits from one quantum processor to another allows them to be connected to form a quantum computing cluster. This is often referred to as networked quantum computing or distributed quantum computing. Here, several less powerful quantum processors are connected together by a quantum network to form one much more powerful quantum computer. This is analogous to connecting several classical computers to form a computer cluster in classical computing. Networked quantum computing offers a path towards scalability for quantum computers since more and more quantum processors can naturally be added over time to increase the overall quantum computing capabilities. In networked quantum computing, the individual quantum processors are typically separated only by short distances.

Going back a year, Chinese physicists launched the world’s first quantum satellite. Unlike the dishes that deliver your television shows, this 1,400-pound behemoth doesn’t beam radio waves. Instead, the physicists designed it to send and receive bits of information encoded in delicate photons of infrared light. It’s a test of a budding technology known as quantum communications, which experts say could be far more secure than any existing info relay system. If quantum communications were like mailing a letter, entangled photons would be kind of like the envelope: they carry the message and keep it secure. Jian-Wei Pan of the University of Science and Technology of China, who leads the research on the satellite, has said that he wants to launch more quantum satellites in the next five years.

The basic structure of a quantum network and more generally a quantum internet is analogous to classical networks. First, we have end nodes on which applications can ultimately be run. These end nodes are quantum processors of at least one qubit. Some applications of a quantum internet require quantum processors of several qubits as well as a quantum memory at the end nodes.

Second, to transport qubits from one node to another, we need communication lines. For the purpose of quantum communication, standard telecom fibers can be used. For networked quantum computing, in which quantum processors are linked at short distances, one typically employs different wavelengths depending on the exact hardware platform of the quantum processor.

Third, to make maximum use of communication infrastructure, one requires optical switches capable of delivering qubits to the intended quantum processor. These switches need to preserve quantum coherence, which makes them more challenging to realize than standard optical switches.

Finally, to transport qubits over long distances one requires a quantum repeater. Since qubits cannot be copied, classical signal amplification is not possible and a quantum repeater works in a fundamentally different way than a classical repeater.

The quantum internet could also be useful for potential quantum computing schemes, says Fu. Companies like Google and IBM are developing quantum computers to execute specific algorithms faster than any existing computer. Instead of selling people personal quantum computers, they’ve proposed putting their quantum computers in the cloud, where users would log in via the internet. While running their computations, users might want to transmit quantum-encrypted information between their personal computer and the cloud-based quantum computer. “Users might not want to send their information classically, where it could be eavesdropped,” Fu says. But this technology is still some 13 years away, and surely we will witness a lot more discoveries in the coming years.

(source : MitTechReview, Wikipedia, Wired)




DCGAN stands for Deep Convolutional Generative Adversarial Network. Its generator works in the opposite direction to a Convolutional Neural Network (CNN): where a CNN transforms an image into class labels (a list of probabilities), a DCGAN generates an image from random parameters.



Some of you might wonder what convolutions are. Convolutions are operations or modifications we perform on an image: we multiply patches of the image with a kernel, the matrix of the operation we want to perform. For detailed information on image kernels and convolutions, please visit here.

Moving on, what a CNN does is apply a lot of filters to extract various features from a single image. It applies these filters layer after layer, extracting progressively higher-level features as we move deeper into the network.


Now, the typical working of a CNN is that it starts from a single RGB image, and multiple filtering layers are applied to produce a larger number of smaller images.


Flow of CNNs


Image generation flow of DCGANs


Flow of DCGANs


Now, the filters that we previously mentioned are convolutional in CNNs and transposed-convolutional in DCGANs; the two operations work in opposite directions.


Illustration of their working

Convolution: (Bottom Up) A 3×3 patch of blue pixels contributes to generating a single green pixel. Each of the 3×3 blue pixels is multiplied by the corresponding filter value, and the results from the different blue pixels are summed up into a single green pixel.

Transposed-Convolution: (Top Down) A single green pixel contributes to generating a 3×3 patch of blue pixels. Each green pixel is multiplied by each of the 3×3 filter values, and the results from different green pixels are summed up into a single blue pixel.
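
The two directions can be demonstrated on tiny arrays. This NumPy sketch (the image size and the averaging kernel are arbitrary choices for the example) shows a 3×3 convolution shrinking an image, as in a CNN, and a transposed convolution expanding one, as in a DCGAN generator:

```python
import numpy as np

kernel = np.ones((3, 3)) / 9.0  # simple 3x3 averaging filter

def convolve(image, k):
    """Valid convolution: each output pixel sums one 3x3 input patch."""
    h, w = image.shape[0] - 2, image.shape[1] - 2
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * k)
    return out

def transposed_convolve(image, k):
    """Each input pixel spreads a scaled 3x3 patch into the output."""
    h, w = image.shape[0] + 2, image.shape[1] + 2
    out = np.zeros((h, w))
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i:i + 3, j:j + 3] += image[i, j] * k
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
down = convolve(x, kernel)              # 4x4 -> 2x2 (CNN direction)
up = transposed_convolve(down, kernel)  # 2x2 -> 4x4 (DCGAN direction)
print(down.shape, up.shape)             # (2, 2) (4, 4)
```

Notice the symmetry: the convolution is a many-to-one mapping (patch to pixel), while the transposed convolution is its one-to-many counterpart (pixel to patch), which is exactly why generators can build images back up from small feature maps.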


It is suggested that the input parameters could use a semantic structure as in the following example-


Interpretation of Input Parameters


Training Strategies:

CNN: Classifies images as authentic or fake (the discriminator’s role). Here, authentic images are provided as training data to the model.

DCGAN: It is trained to generate images that the CNN classifies as authentic. By trying to fool the CNN, the DCGAN learns to generate images similar to the training data.
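
Schematically, adversarial training alternates the two updates above. The following is a deliberately tiny 1-D sketch, not a real DCGAN: a logistic “discriminator” stands in for the CNN, a linear “generator” for the DCGAN, and all the numbers are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

w_d, b_d = 0.1, 0.0   # discriminator parameters
w_g, b_g = 1.0, 0.0   # generator: fake = w_g * z + b_g
lr = 0.05

def d_out(x):
    """D(x): probability that x is an authentic sample."""
    return 1.0 / (1.0 + np.exp(-(w_d * x + b_d)))

for step in range(2000):
    real = rng.normal(4.0, 0.5)   # authentic data comes from N(4, 0.5)
    z = rng.normal(0.0, 1.0)      # random input parameters for G
    fake = w_g * z + b_g

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    g_real = d_out(real) - 1.0
    g_fake = d_out(fake)
    w_d -= lr * (g_real * real + g_fake * fake)
    b_d -= lr * (g_real + g_fake)

    # Generator update: push D(fake) toward 1, i.e. try to fool D.
    g_gen = (d_out(fake) - 1.0) * w_d
    w_g -= lr * g_gen * z
    b_g -= lr * g_gen

samples = w_g * rng.normal(0.0, 1.0, size=1000) + b_g
print(samples.mean())  # compare with the real-data mean of 4.0
```

The structure is the point here: each step trains the discriminator on real plus generated data, then trains the generator against the discriminator’s current judgment, which is the same loop a real DCGAN runs with deep networks and images.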


What is Meta-Learning in Machine Learning?

Meta-Learning is a subfield of machine learning where automatic learning algorithms are applied to meta-data. In brief, it means learning to learn. The main goal is to use meta-data to understand how automatic learning can become flexible in solving different kinds of learning problems, and hence to improve the performance of existing learning algorithms; in other words, to understand how effectively we can make our algorithms learn.

Meta-Learning affects the hypothesis space for the learning algorithm by either:

  • Changing the hypothesis space of the learning algorithms (hyper-parameter tuning, feature selection)
  • Changing the way the hypothesis space is searched by the learning algorithms (learning rules)

Variations of Meta-Learning: 

  • Algorithm Learning (selection) – Select learning algorithms according to the characteristics of the instance.
  • Hyper-parameter Optimization – Select hyper-parameters for learning algorithms. The choice of the hyper-parameters influences how well you learn.
  • Ensemble Methods – Learn to learn “collectively” – Bagging, Boosting, Stacked Generalization.
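
As a minimal illustration of the hyper-parameter optimization variant, the snippet below “learns to learn” by selecting the learning rate that minimizes the final loss of a toy gradient-descent problem (the problem and the candidate grid are invented for the example):

```python
def train(lr, steps=50):
    """Gradient descent on f(w) = w^2 with a given learning rate."""
    w = 5.0
    for _ in range(steps):
        w -= lr * 2 * w  # gradient of w^2 is 2w
    return w ** 2        # final loss after training

# Meta-level: instead of hand-picking the hyper-parameter, try several
# candidates and keep the one whose trained model performs best.
candidates = [1.5, 0.5, 0.1, 0.01]
losses = {lr: train(lr) for lr in candidates}
best_lr = min(losses, key=losses.get)
print(best_lr)  # 0.5: large enough to converge fast, small enough not to diverge
```

Even this toy grid search shows the phenomenon the bullet points describe: the 1.5 candidate diverges, 0.01 barely learns, and the choice of hyper-parameter alone decides how well the base learner does.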

Flexibility is very important because each learning algorithm is based on a set of assumptions about the data, its inductive bias. This means that it will only learn well if the bias matches the data in the learning problem. A learning algorithm may perform very well on one learning problem, but very badly on the next. From a non-expert point of view, this poses strong restrictions on the use of machine learning or data mining techniques, since the relationship between the learning problem (often some kind of database) and the effectiveness of different learning algorithms is not yet understood.

By using different kinds of meta-data, like properties of the learning problem, algorithm properties (like performance measures), or patterns previously derived from the data, it is possible to select, alter or combine different learning algorithms to effectively solve a given learning problem. Critiques of meta-learning approaches bear a strong resemblance to critiques of metaheuristics, which can be said to be a related problem.

Meta-learning may be the most ambitious but also the most rewarding goal of machine learning. There are few limits to what a good meta-learner will learn. Where appropriate, it will learn to learn by analogy, by chunking, by planning, by subgoal generation, etc.

OpenAI’s Virtual Wrestling Bots

OpenAI, a firm backed by Elon Musk, has recently revealed one of its latest developments in the field of Machine Learning, demonstrated using virtual sumo wrestlers.


These are the bots inside the virtual world of RoboSumo, controlled by machine learning. The bots taught themselves through trial and error using Reinforcement Learning, a technique inspired by the way animals learn through feedback, which has proved useful for training computers to play games and to control robots. The virtual wrestlers might look slightly ridiculous, but they are using a very clever approach to learning in a fast-changing environment while dealing with an opponent. This game and its virtual world were created at OpenAI to show how forcing AI systems to compete can spur them to become more intelligent.

However, one of the disadvantages of reinforcement learning is that it doesn’t work well in realistic situations, where things are more dynamic. OpenAI devised a solution to this problem by creating its own reinforcement learning algorithm called proximal policy optimization (PPO), which is especially well suited to changing environments.

The latest work, done in collaboration with researchers from Carnegie Mellon University and UC Berkeley, demonstrates a way for AI agents to apply what the researchers call a “meta-learning” framework. This means the agents can take what they have already learned and apply it to a new situation.

Inside the RoboSumo environment (see video above), the agents started out behaving randomly. Through thousands of iterations of trial and error, they gradually developed the ability to move, and, eventually, to fight. Through further iterations, the wrestlers developed the ability to avoid each other, and even to question their own actions. This learning happened on the fly, with the agents adapting even as they wrestled each other.

Flexible learning is a very important part of human intelligence, and it will be crucial if machines are going to become capable of performing anything other than very narrow tasks in the real world. This kind of learning is very difficult to implement in machines, and the latest work is a small but significant step in that direction.


(sources: MitTechReview, OpenAI Blog, Wired)


The race for autonomy in cars is ubiquitous. Top car brands are working on providing completely autonomous vehicles to their customers, and a future with self-driving vehicles is inevitable. Adding the cherry on the cake, Nvidia’s recently announced chip, Pegasus, is the latest generation of its DrivePX onboard car computers. The device is 13 times faster than the previous iteration, which has so far been used by the likes of Audi, Tesla, and Volvo to provide semi-autonomous driving capabilities in their vehicles.


Nvidia Pegasus

At the heart of this semiconductor is the mind-boggling technology of Deep Learning. “In the old world, the more powerful your engine, the smoother your ride will be,” Huang said during the announcement. “In the future, the more computational performance you have, the smoother your ride will be.”

Nvidia asserts that the device is only about the size of a license plate. But it has enough power to process data from up to 16 sensors, detect objects, find the car’s place in the world, plan a path, and control the vehicle itself. Oh, and it will also update centrally stored high-definition maps at the same time, all with some resources to spare.

The new system is designed to eventually handle up to 320 trillion operations per second (TOPS), compared to the 24 trillion of today’s technology. That would give it the power needed to process the masses of data produced by a vehicle’s cameras and other sensors and allow it to drive completely autonomously, Nvidia said. The first systems to be tested next year will have less processing power but will be designed to scale up with the addition of extra chips.


(sources: MitTechReview, NvidiaBlog)

Origami-inspired Robots

In a bid to augment the robots’ abilities, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have come up with a new tool: origami exoskeletons.

In a paper published recently, the researchers describe four exoskeletons, each made out of a plastic sheet that folds into a predefined shape when heated for a few seconds: a boat-shaped exoskeleton, a glider, one for “walking,” and another that folds up into a crude wheel for faster movement. Each exoskeleton can be donned in turn by a tiny lead bot called Primer. This isn’t a robot as we usually think of them, but a small magnetic cube that can be controlled remotely using magnetic fields.

“If we want robots to help us do things, it’s not very efficient to have a different one for each task,” said CSAIL’s Daniela Rus, the project’s lead, in a press release. “With this metamorphosis-inspired approach, we can extend the capabilities of a single robot by giving it different ‘accessories’ to use in different situations.” In the future, the researchers imagine this sort of approach to robot design could help us make multifunctional bots that can perform complex tasks remotely. They could be used for deep-sea mining operations, for example, or for building colonies in space. These are locations where you don’t want to waste resources shipping out lots of different bots for different jobs, so it’s more efficient to send one with a set of origami tools. As Rus says: “Why update a whole robot when you can just update one part of it?”

Watch the video below to get a better idea of the origami-inspired robots.


(Source : CSAIL, TheVerge, ScienceDaily )

AI recreates game engine using less than two minutes of videogame footage

Georgia Institute of Technology researchers have developed a new approach that uses artificial intelligence to learn a complete game engine, the basic software of a game that governs everything from character movement to rendering graphics.

In layman’s terms, the new system can replicate a game engine, creating a cloned version that is indistinguishable from the original when played.

The Georgia Tech team’s AI can learn how a video game operates just by watching two minutes of gameplay. On the right, the AI replicates Mega Man in the “Bomberman” stage; there were some failures, including a point at which he disappears. The original is shown on the left.

Their AI system watches less than two minutes of gameplay video and then builds its own model of how the game operates by studying the frames and making predictions of future events, such as what path a character will choose or how enemies might react.

To get their AI agent to create an accurate predictive model that could account for all the physics of a 2D platform-style game, the team trained the AI on a single “speedrunner” video, where a player heads straight for the goal. This made “the training problem for the AI as difficult as possible.”

“Our AI creates the predictive model without ever accessing the game’s code, and makes significantly more accurate future event predictions than those of convolutional neural networks,” says Matthew Guzdial, lead researcher and Ph.D. student in computer science. “A single video won’t produce a perfect clone of the game engine, but by training the AI on just a few additional videos you get something that’s pretty close.”

They next tested how well the cloned engine would perform in actual gameplay. They employed a second AI to play the game level and ensure the game’s protagonist wouldn’t fall through solid floors or go undamaged if hit by an enemy. The result: the AI playing with the cloned engine proved indistinguishable from an AI playing with the original game engine.

A section of gameplay video (left) is produced by the original Super Mario Bros. engine, and the cloned engine (right) demonstrates the ability to accurately predict animation states.

According to the researchers, the game engine created with their system was more similar to the original than the one produced by a conventional neural network in the same test.



(source: GaTech blog, dailymail)