Building a truly intelligent machine has been the holy grail of computer science since its very beginning. One of the first ideas of how to achieve this, born not long after the dawn of artificial intelligence itself, was to simulate a human brain. Modelling the network of brain neurons has led to as many disappointments as breakthroughs over its fascinating decades-long history. Today, with massive amounts of data and computational power easily available, this old idea, rebranded as “deep learning,” is once again seducing thinkers, visionaries and practitioners all over the world, promising real artificial intelligence.
By Dominika Tkaczyk, Ph.D.
Artificial neural networks (ANNs) are a comparatively old family of artificial intelligence techniques, dating back to the 1950s. They are in fact one of the very first attempts at constructing an intelligent algorithm, which would be able to learn from experience without being explicitly told how to make decisions. Some tasks intelligent algorithms typically perform are: automated language translation, text understanding and generation, converting speech to text and automated recognition of objects depicted in images.
Perhaps not surprisingly, one of the first ideas for how to make computers think was to artificially mimic the structure and behavior of a human brain. An artificial neural network is a collection of interconnected simple artificial neurons organized in layers, which is loosely analogous to the network of a biological brain’s axons. Individual neurons are basic processing units cooperating in order to achieve the final goal, such as a decision about which digit is represented by an image of a handwritten piece of text.
More precisely, whenever we want the network to label an image with a digit, a typical preprocessing step would be to represent the image as color intensities of all its pixels. Such a color matrix is passed to the first layer of artificial neurons, which performs various mathematical operations and passes their results to the next layer. This process continues until the processing reaches the last layer, which is responsible for outputting the final decision, which in this case would be one of the ten digits. To be able to make such decisions with high accuracy, the network has to be exposed to a lot of training data in advance in order to tune its parameters to a specific task. This tuning process is called learning from examples.
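The layer-by-layer processing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real digit recognizer: the 8x8 image size, the 16-neuron hidden layer, the ReLU and softmax functions and all function names are choices made for this example, and the weights are random rather than learned, so the "decision" it prints is meaningless until the network is trained.

```python
import math
import random


def weighted_sums(inputs, weights, biases):
    """Each neuron computes a weighted sum of all its inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """A common neuron activation: negative sums are replaced by zero."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn the last layer's scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_inputs, n_neurons, rng):
    """Untrained layer (assumption: small random weights, zero biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons


def classify(pixels, hidden_layer, output_layer):
    """Pass pixel intensities through two layers and pick the likeliest digit."""
    hidden = relu(weighted_sums(pixels, *hidden_layer))
    probabilities = softmax(weighted_sums(hidden, *output_layer))
    digit = probabilities.index(max(probabilities))
    return digit, probabilities


rng = random.Random(0)
pixels = [rng.random() for _ in range(8 * 8)]   # stand-in for a grayscale image
hidden_layer = random_layer(64, 16, rng)        # first layer: 64 pixels -> 16 neurons
output_layer = random_layer(16, 10, rng)        # last layer: one neuron per digit
digit, probabilities = classify(pixels, hidden_layer, output_layer)
print("predicted digit:", digit)
```

Training would consist of adjusting the numbers inside `hidden_layer` and `output_layer` until the predicted digits match the labels of the training images.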
Machine learning vs. deep learning
Machine learning refers to the family of intelligent algorithms able to make informed, non-trivial decisions of a specific kind, such as converting an image of a handwritten letter into a sequence of words. Machine learning algorithms are able to learn from data without being explicitly told how to make these decisions. The result of the learning process is typically called a “learned model” and contains various patterns observed in the learning data. The model is used later on to make similar decisions, but in new, previously unseen circumstances. Deep learning is a subset of machine learning and refers to a family of approaches based on simulating the behavior of a human brain. Deep learning models loosely resemble a simplified structure of the brain, with connected artificial neurons organized in processing layers.
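To make "learning from data" concrete, here is a minimal sketch of a single artificial neuron that learns a rule from labeled examples and then applies it to points it has never seen. The toy task (label a point 1 when its two coordinates add up to more than 1), the gradient descent update, the learning rate and every name in the code are assumptions made for this illustration; real systems use far larger models and data sets.

```python
import math


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=1000, learning_rate=0.5):
    """Tune two weights and a bias by gradient descent on the examples."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for (x1, x2), label in examples:
            # How far the neuron's current output is from the right answer.
            error = sigmoid(w1 * x1 + w2 * x2 + bias) - label
            # Nudge every parameter in the direction that reduces the error.
            w1 -= learning_rate * error * x1
            w2 -= learning_rate * error * x2
            bias -= learning_rate * error
    return w1, w2, bias  # the "learned model"


def predict(model, point):
    """Use the learned model to decide about a new point."""
    w1, w2, bias = model
    x1, x2 = point
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


# Training data: a 5x5 grid of points, labeled by the hidden rule.
grid = [i / 4 for i in range(5)]
training_data = [((x1, x2), 1 if x1 + x2 > 1 else 0)
                 for x1 in grid for x2 in grid]

model = train(training_data)
# Neither of these points appears in the training data.
print(predict(model, (0.1, 0.2)), predict(model, (0.9, 0.8)))
```

The rule itself is never written into `train` or `predict`; the three numbers returned by `train` are the only place where it ends up being stored.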
What really differentiates neural networks from other approaches is that they tend to work well even with very basic, raw representations of the objects (such as a simple matrix of pixel colors), while other learning algorithms typically require task-specific handcrafted features. Despite this unquestionable advantage, in the 1990s and 2000s ANNs were considered obsolete and not worthy of further research. The main reason for this was the computational cost of ANNs at the time: typically it could take weeks or months to train even a moderately large network. As a result, we were able to efficiently build only simple, small networks, which were outperformed in most tasks by simpler learning models.
A game changer
This situation changed rapidly in the 2000s. Advances in specialized hardware, such as powerful graphics processing units (GPUs), new distributed computing paradigms and improvements in the efficiency of the training algorithms all contributed to speeding up the training process by orders of magnitude, bringing running times of weeks down to days and allowing the networks to grow bigger than ever, up to hundreds or thousands of layers. This new take on ANN ideas, combined with complicated, multilayered architectures, is called “deep learning.” The term was introduced in 2000 and has been gaining popularity since.
The real advantage of deep models composed of multiple layers is that they can learn features at various levels of abstraction. For example, if we carefully train a deep neural network to classify images, we will find out that the first layer trained itself to recognize very basic objects like edges, the next layer is able to recognize collections of edges such as shapes, the third layer trained itself to recognize collections of shapes like eyes or noses, and a further layer will learn even higher-order features like faces.
Results? A substantial increase in the quality of the decisions made by state-of-the-art intelligent algorithms. Deep learning has quickly gained an advantage over older approaches, especially in typically hard tasks involving images, sound and written language.
Analyzing pictures is a great example of the power of deep learning. Within the past couple of years, we have observed a huge increase in the accuracy of automated object recognition in pictures. Thanks to deep learning, error rates have fallen several-fold since 2011. In 2012, Google’s deep learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats.
Another example is automated game playing. IBM’s famous Watson computer was able to win Jeopardy! against human contestants using deep learning techniques and is now being trained to help doctors make better treatment decisions. AlphaGo is a computer program developed by Google’s DeepMind to play the board game Go. In 2015 it became the first program to beat a human professional Go player without handicaps on a full-sized board.
Deep learning is heavily used by Google in its voice and image recognition algorithms, by Netflix and Amazon to decide what you want to watch or buy next, and by researchers at MIT to predict the future. In Poland, one of the first companies to use deep learning was Craftinity. They provide solutions for tasks such as sentiment analysis (which aims to provide insight into what customers are actually thinking), fraud identification and the analysis of medical diagnoses.
In response to the promise of deep learning, all the big players are investing their resources in these exciting machine learning techniques. After all, the next artificial intelligence revolution might be just around the corner. And who wouldn’t want to be in the front seat when it comes?
What can deep learning do better?
Once we realize the potential behind deep learning algorithms, the possibilities seem endless. Some have already been put into practice, others are still in the early stages of trial and error. Here are some of the most promising existing and possible uses of deep learning.
Watson is a deep learning-based question answering system capable of answering questions posed in natural language, developed in IBM’s DeepQA project. The system was specifically tuned to answer questions on the quiz show Jeopardy! In 2011, Watson competed on Jeopardy! against former winners Brad Rutter and Ken Jennings and received the first place prize of USD 1 million. During the game, the system had access to 200 million pages of structured and unstructured content, including the full text of Wikipedia. Apart from “reading” and “remembering” much more information than an average human is able to, Watson’s advantage was a much faster reaction to the game’s signaling device: it takes a human tenths of a second to perceive a light signal, while Watson, notified by electronic signals, could react within eight milliseconds.
Practice makes perfect
While humans are able to decide whether a picture shows a cat or a dog in an instant, this task used to be much harder for computers. In traditional machine learning algorithms, feature extraction, which means choosing the most powerful representation of something, is one of the biggest challenges. The programmer typically needs to “tell” the computer what kinds of patterns will be most helpful, so that the computer knows what to look for in the training data. For example, if you want to teach a program to read handwritten digits, a good place to start may be to explicitly instruct it to check how many “closed circles” are shown in the input image (“8” would typically have two, and “1” most likely will not contain any). Such manual feature extraction is challenging and places a huge burden on the programmer – who acts as the instructor. One of the promises of deep neural networks is that they can automatically learn features at various levels of abstraction, taking this burden off the programmer. Automated feature learning abilities have already increased the accuracy of state-of-the-art solutions to image recognition tasks.
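The “closed circles” check mentioned above is a good example of what a handcrafted feature looks like in practice. The sketch below is one possible way to compute it, under assumptions made for this illustration: the image is already reduced to a small grid of ink (`#`) and background (`.`) characters, and a closed circle is taken to be any patch of background that cannot reach the edge of the image. The function name and the tiny sample digits are hypothetical.

```python
def count_closed_loops(image):
    """Count background regions that are completely surrounded by ink.

    image: list of equal-length strings, '#' for ink and '.' for background.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]

    def region_touches_border(start_row, start_col):
        """Flood-fill one background region; report whether it reaches the edge."""
        stack = [(start_row, start_col)]
        seen[start_row][start_col] = True
        touches_border = False
        while stack:
            r, c = stack.pop()
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            # Visit the four direct neighbors (up, down, left, right).
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and image[nr][nc] == "."):
                    seen[nr][nc] = True
                    stack.append((nr, nc))
        return touches_border

    loops = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == "." and not seen[r][c]:
                if not region_touches_border(r, c):
                    loops += 1
    return loops


EIGHT = [
    ".....",
    ".###.",
    ".#.#.",
    ".###.",
    ".#.#.",
    ".###.",
    ".....",
]
ONE = [
    ".....",
    "..#..",
    ".##..",
    "..#..",
    "..#..",
    ".###.",
    ".....",
]
print(count_closed_loops(EIGHT), count_closed_loops(ONE))
```

Even this single feature takes dozens of lines and breaks down on sloppy handwriting (an “8” whose loop is not quite closed, for instance), which is precisely the burden that automatically learned features are meant to remove.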
Lost in translation
A decade ago nobody believed machines could become as effective at translating foreign languages as humans. Granted, automatic translators have been around for years and no one doubts they come in handy in translating simple phrases that every traveler needs. However, a high-quality translation of a larger text filled with linguistic and cultural references and nuances seemed an impossible feat for a computer. But thanks to deep learning it seems they might be getting there. A deep learning-based approach to machine translation has made rapid progress in recent years, and is currently used by Google’s translation services, replacing its previous statistical methods. Bad news for translators and interpreters all over the world.
Where’s the money?
A 2011 movie called “Limitless” tells the story of a struggling writer who suddenly becomes the smartest person in the world by taking a pill that allows him to use 100 percent of his brain. Before long, he turns his attention to the stock market and becomes a financial wizard overnight. Given how adept deep learning-based algorithms already are at image recognition and at games such as Go and Jeopardy!, it is no stretch to believe they could one day learn all there is to know about the world of finance. Who knows, maybe the next Warren Buffett will be a bunch of ones and zeros?
What’s wrong with me, doc?
If you are a fan of House M.D. you know how difficult finding the right diagnosis can be and how much thought it requires. Perhaps intelligent machines equipped with deep learning techniques will become the salvation of the already overburdened health service. Of course, there is a plethora of issues that need to be solved before anyone entrusts the wellbeing of a patient to a machine. Malpractice insurance, for instance, might get tricky. But having a highly trained specialist available 24/7 who knows your entire history and responds instantly could one day outweigh the cons.
There’s nothing weird about us
In a 2013 movie called “Her,” people widely use automated personal assistants for handling communication with other humans and interaction with the world in general. Those computer programs are so close to humans in their behavior that it is easy to develop a personal relationship with them and even fall in love. With deep learning’s advances in speech understanding and generation, as well as automated question answering, we might be getting closer to a scenario in which you pick up your phone to talk to your best friend without the need to dial out.