Smelling illness in human breath using AI

Researchers from Loughborough University, Western General Hospital, the University of Edinburgh, and the Edinburgh Cancer Centre in the United Kingdom, recently developed a deep learning-based method that can analyze compounds in the human breath and detect illnesses, including cancer, with better than-human average performance.

The sense of smell is used by animals and even plants to identify hundreds of different substances that float in the air. But compared to that of other animals, the human sense of smell is far less developed and certainly not used to carry out daily activities. For this reason, humans aren’t particularly aware of the richness of information that can be transmitted through the air and can be perceived by a highly sensitive olfactory system. AI may be about to change that.

Using NVIDIA Tesla GPUs and the cuDNN-accelerated Keras, and TensorFlow deep learning frameworks, the team trained their neural network on data from participants with different types of cancer receiving radiotherapy, said researcher Angelika Skarysz, a Ph.D. research student at Loughborough University. To increase the neural network’s efficiency, the team increased the original training data by using data augmentation. The convolutional neural network was augmented 100 times, the team said.



“A team of doctors, nurses, radiographers and medical physicists at the Edinburgh Cancer Centre collected breath samples from participants undergoing cancer treatment. The samples were then analyzed by two teams of chemists and computer scientists. Once a number of compounds were identified manually by the chemists, fast computers were given the data to train deep learning networks. The computation was accelerated by special devices, called GPUs, that can process multiple different pieces of information at the same time. The deep learning networks learned more and more from each breath sample until they could recognize specific patterns that revealed specific compounds in the breath” as posted by Andrea Soltoggio on The Conversation.


3D view of a portion of breath sample data

“Computers equipped with this technology only take minutes to autonomously analyze a breath sample that previously took hours by a human expert,” Soltoggio said. AI is making the whole process cheaper – but above all, it is making it more reliable.

Link to the paper: ResearchGate

(Via: ResearchGate, The Conversation, NVidia)


Using AI to see through walls

MIT’s new AI can now see through walls. MIT has given a computer x-ray vision, but it didn’t need x-rays to do it. The system, known as RF-Pose, uses a neural network and radio signals to track people through an environment and generate wireframe models in real time. It doesn’t even need to have a direct line of sight to know how someone is walking, sitting, or waving their arms on the other side of a wall.


Using wireless signals, RF-Pose could serve as a healthcare system to monitor patients’ movements from the other side of a wall. Source: MIT CSAIL

The MIT researchers decided to collect examples of people walking with both wireless signal pings and cameras. The camera footage was processed to generate stick figures in place of the people, and the team matched that data up with the radio waves. That combined data is what researchers used to train the neural network. With a strong association between the stick figures and RF data, the system is able to create stick figures based on radio wave reflections. One idea is to use the system to monitor those who might be at risk of a fall—a sick or elderly person, say.

Here’s the video which might help in comprehending the topic more.

Sources: MitTechReview, Extremetech, Popular Science

AI to detect Wrist Fractures

Imagen’s OsteoDetect, an AI-based diagnostic tool that can quickly detect distal radius wrist fractures. Its machine learning algorithm studies 2D X-rays for the telltale signs of fractures and marks them for closer study. It’s not a replacement for doctors or clinicians, the FDA stressed — rather, it’s to improve their detection and get the right treatment that much sooner.

OsteoDetect uses AI software to analyze two-dimensional x-ray images for distal radius fracture. The software, intended as an adjunct to clinician judgment, marks the fracture location on posterior-anterior and medial-lateral x-ray images. The company submitted a retrospective study of 1,000 radiograph images that assessed the independent performance of the image analysis algorithm for detecting wrist fractures and the accuracy of the fracture localization of OsteoDetect against the performance of three board certified orthopedic hand surgeons. Imagen also submitted a retrospective study of 24 providers who reviewed 200 patient cases. Both studies demonstrated that the readers’ performance in detecting wrist fractures was improved using the software, including increased sensitivity, specificity, positive and negative predictive values, when aided by OsteoDetect, as compared with their unaided performance according to standard clinical practice.

Inside the mind of a Neural Network

Neural networks, which learn to perform tasks such as including speech-recognition, automatic-translation systems, image recognition systems and time series prediction by analyzing huge sets of training data, have been responsible for the most impressive recent advances in artificial intelligence.

During training, however, a neural net continually adjusts its internal settings in ways that even its creators can’t interpret. Much recent work in computer science has focused on clever techniques for determining just how neural nets do what they do. Neural nets are so named because they roughly approximate the structure of the human brain. Typically, they’re arranged into layers, and each layer consists of many simple processing units — nodes — each of which is connected to several nodes in the layers above and below. Data are fed into the lowest layer, whose nodes process it and pass it to the next layer. The connections between layers have different “weights,” which determine how much the output of any one node figures into the calculation performed by the next.

In the case of the speech recognition network, Belinkov and Glass used individual layers’ outputs to train a system to identify “phones,” distinct phonetic units particular to a spoken language. The “t” sounds in the words “tea,” “tree,” and “but,” for instance, might be classified as separate phones, but a speech recognition system has to transcribe all of them using the letter “t.” And indeed, Belinkov and Glass found that lower levels of the network were better at recognizing phones than higher levels, where, presumably, the distinction is less important [MIT].

An important part of the neural network is forgetting. An ANN must know, which part to forget and which part not to. The basic algorithm used in the majority of deep-learning procedures to tweak neural connections in response to data is called “stochastic gradient descent”: Each time the training data are fed into the network, a cascade of firing activity sweeps upward through the layers of artificial neurons. When the signal reaches the top layer, the final firing pattern can be compared to the correct label for the image—1 or 0, “dog” or “no dog.” Any differences between this firing pattern and the correct pattern are “back-propagated” down the layers, meaning that, like a teacher correcting an exam, the algorithm strengthens or weakens each connection to make the network layer better at producing the correct output signal. Over the course of training, common patterns in the training data become reflected in the strengths of the connections, and the network becomes expert at correctly labeling the data, such as by recognizing a dog, a word, or a 1.



sources: MIT blog, Wired, arxiv

AI creates Cartoons from Text

Animation as well know is a great source of entertainment and is difficult to create and stitch them together. It is now possible to stitch cartoons with the help of AI. The researchers at the Allen Institute of Technology and the University of Illinois, have managed to develop an AI called Craft (Composition, Retrieval, and Fusion Network), which does some part of the tedious work. The AI creates scenes from text descriptions. The AI system, Craft was able to recreate the Flintstones cartoon, by being trained on 25000 3 second clips and a brief text description of that scene.

The AI works by matching the videos to the brief word-descriptions and builds a set of parameters. Craft can convert the provided text descriptions into video clips and animated series. It can not only put the characters into place but also parse objects, retrieve the background, etc.


The results are still raw in nature, and there is a lot of scope for improving the model and training it on other cartoons. The video below briefly describes how the AI works.

For detailed information, refer the paper.

SqueezeNext – Hardware Aware Neural Network Design

Berkeley researchers have published ‘SqueeseNext’, the successor to SqueezeNet, in their latest attempt to distill the capabilities of very large neural networks into smaller models that can feasibly be deployed on devices with small memory and compute capabilities, like mobile phones. One of the main barriers for deploying neural networks on embedded systems has been large memory and power consumption of existing neural networks. While much of the research into AI systems today is based around getting state-of-the-art results on specific datasets, SqueezeNext is part of a parallel track focused on making systems deployable.

Need for SqueezeNext?

The transition to Deep Neural Network based solutions started with AlexNet, which won the ImageNet challenge by a large margin. The ImageNet classification challenge started in 2010 with the first winning method achieving an error rate of 28.2%, followed by 26.2% in 2011. However, a clear improvement in accuracy was achieved by AlexNet with an error rate of 16.4%, a 10% margin with the runner-up. AlexNet consists of five convolutional, and three fully connected layers. The network contains a total of 61 million parameters. Due to this large size of the network, the original model had to be trained on two GPUs with a model parallel approach, where the filters were distributed to these GPUs. Moreover, dropout was required to avoid overfitting using such a large model size. These model sizes have millions of parameters and are not suitable for real-time applications!

SqueezeNet is a successful example which achieves AlexNet’s accuracy with 50× fewer parameters without compression and 500× smaller with deep compression. SqueezeNext is efficient because of a few design strategies: low-rank filters; a bottleneck filter to constrain the parameter count of the network; using a single fully connected layer following a bottleneck; weight and output stationary; and co-designing the network in tandem with a hardware simulator to maximize hardware usage efficiency.

The resulting SqueezeNext network is a neural network with 112X fewer model parameters than those found in AlexNet. They also develop a version of the network whose performance approaches that of VGG-19 (which did well in ImageNet 2014). The researchers also design an even more efficient network by carefully tuning model design in parallel with a hardware simulator, ultimately designing a model that is significantly faster and more energy efficient than a widely used compressed network called SqueezeNet.


(via: SquezeeNext)


Deep Reinforcement Learning and Q-Learning

Deep Learning  (neural networks) are used to achieve the state of the result for image recognition, computer vision, machine translation, etc,. Reinforcement learning refers to goal-oriented algorithms, which learn how to attain a complex objective (goal) or maximize along a particular dimension over many steps; for example, maximize the points won in a game over many moves. They can start with a blank slate, and under the right conditions, they achieve superhuman performance. Like a child incentivized by spankings and candy, these algorithms are penalized when they make the wrong decisions and rewarded when they make the right ones – this is reinforcement.

Like a human, agents learn for themselves to achieve successful strategies that lead to the greatest long-term rewards. This paradigm of learning by trial-and-error, solely from rewards or punishments, is known as reinforcement learning (RL). Reinforcement algorithms that incorporate deep learning can beat world champions at the game of Go as well as human experts playing numerous Atari video games. While that may sound trivial, it’s a vast improvement over their previous accomplishments, and the state of the art is progressing rapidly.

Reinforcement learning solves the difficult problem of correlating immediate actions with the delayed returns they produce. Like humans, reinforcement learning algorithms sometimes have to wait a while to see the fruit of their decisions. They operate in a delayed return environment, where it can be difficult to understand which action leads to which outcome over many time steps.

Q – Learning

Q-learning is a model-free reinforcement learning technique. It is able to compare the expected utility of the available actions (for a given state) without requiring a model of the environment. Additionally, Q-learning can handle problems with stochastic transitions and rewards, without requiring adaptations. It has been proven that for any finite Markov decision process (MDP), Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. Q-learning eventually finds an optimal policy, in the sense that the expected value of the total reward return over all successive steps, starting from the current state, is the maximum achievable.


(via Wiki, GoogleDeepmind Blog, DeepLearning4j, IntelAI)

Deep learning improves your computer with age

The researchers at Google have devised a new technique that could let a laptop or smartphone learn to do things better and faster over time. The researchers published a paper which focuses on a common problem. The prefetching problem. Computers process information much faster than they can pull it from memory to be processed. To avoid bottlenecks, they try to predict which information is likely to be needed and pull it in advance. As computers get more powerful, this prediction becomes progressively harder.

In the paper published, Google focuses on using deep learning to improve prefetching. “The work that we did is only the tip of the iceberg,” says Heiner Litz of the University of California, Santa Cruz, a visiting researcher on the project. Litz believes it should be possible to apply machine learning to every part of a computer, from the low-level operating system to the software that users interact with.

Such advances would be opportune. Moore’s Law is finally slowing down, and the fundamental design of computer chips hasn’t changed much in recent years. Tim Kraska, an associate professor at MIT who is also exploring how machine learning can make computers work better, says the approach could be useful for high-level algorithms, too. A database might automatically learn how to handle financial data as opposed to social-network data, for instance. Or an application could teach itself to respond to a particular user’s habits more effectively.

Paper reference:


(via: MitTechReview)

Baidu Research’s New AI Algorithm Mimics Voice With Very Few Samples

AI typically needs a plethora of data and a lot of time for something like voice cloning. It needs to listen to hours of recordings. However, a new process could get that down to one minute. Baidu researchers have unveiled an upgraded version of Deep Voice, their text-to-speech synthesis system, that can now, once trained, clone any voice after listening to a few snippets of audio. This capability was enabled by learning shared and discriminative information from speakers. Baidu calls this ‘Voice Cloning’. Voice cloning is expected to have significant applications in the direction of personalization in human-machine interfaces.


Here, Baidu focuses on two fundamental approaches (refer above figure):

  1. Speaker AdaptionSpeaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples, by using backpropagation-based optimization. Adaptation can be applied to the whole model or only the low-dimensional speaker embeddings. The latter enables a much lower number of parameters to represent each speaker, albeit it yields a longer cloning time and a lower audio quality.
  2. Speaker EncodingSpeaker encoding is based on training a separate model to directly infer a new speaker embedding from cloning audios that will ultimately be used with a multi-speaker generative model. The speaker encoding model has time-and-frequency-domain processing blocks to retrieve speaker identity information from each audio sample, and attention blocks to combine them in an optimal way.

For detailed information and mathematical explanations, refer the paper by Baidu Research.

However, this technology can also possibly have a downside as this could be tumultuous to people relying upon biometric voice security.

( via MitTechReview, Wiki, BaiduResearch)

Interpretable Machine Learning Through Teaching – (OpenAI)

The researchers at OpenAI have designed a method that encourages AIs to teach each other with examples that are cogent to human beings as well. Their method automatically selects the most informative examples for teaching a concept, for example, the best images to describe the concept of dogs, and this approach proved to be effective for both humans as well as AIs.

OpenAI envisions that some of the most impactful applications of AI will come from a result of collaboration between humans and machines. However, communication between the two is the barrier. Consider an example. Think about trying to guess the shape of a rectangle when you’re only shown a collection of random points inside that rectangle: it’s much faster to figure out the correct dimensions of the rectangle when you’re given points at the corners of the rectangle instead. OpenAI’s machine learning approach works as a cooperative game played between two agents, one the teacher and another the student. The goal here for the student is to guess a particular concept (i.e. “dog”, “zebra”) based on examples of that concept (such as images of dogs), and the goal of the teacher is to learn to select the most illustrative examples for the student.

In their two-stage technique: 

  1. A ‘student’ neural network is given randomly selected input examples of concepts and is trained from those examples using traditional supervised learning methods to guess the correct concept labels.
  2. The ‘teacher’ network — which has an intended concept to teach and access to labels linking concepts to examples — to test different examples on the student and see which concept labels the student assigns them, eventually converging on the smallest set of examples it needs to give to let the student guess the intended concept.

However, if they train the student and the teacher jointly, the student and teacher can collude to communicate via arbitrary examples that do not make sense to humans, digressing from the main goal.


(via: OpenAI)