A school for machines


Nowadays, machine learning seems to be the go-to solution for all non-trivial problems in the digital world. You want to target a new customer segment with an online ad campaign? You need to forecast demand for a new product? Or maybe you are looking for a way to increase sales by providing personalized recommendations to your customers? Just consult some machine-learning wizards and you will be magically relieved of all problems. But how does this mysterious black box called machine learning really work?

By Dominika Tkaczyk

Researchers all over the world rarely agree on a single definition of machine learning. Probably the best known is a quote from Arthur Samuel, an American pioneer in the field of artificial intelligence: “Machine learning gives computers the ability to learn without being explicitly programmed.” It loosely means that whenever we need a computer program to make “smart” decisions of any kind, instead of giving it handcrafted, specific instructions about how to make them, a much better approach is to let it learn automatically by observing the outcomes of many decisions.

Samuel applied this principle in his research in the field of computer gaming. In the 1950s he developed a way for a computer to learn to play checkers by playing tens of thousands of games against itself and observing which board configurations tend to lead to victory. Equipped with this experience, the algorithm was able to choose individual moves with the overall goal in mind and was moderately successful in playing checkers against humans.

Learning from examples

Learning from data or experience is the core principle of machine learning. It is in fact the only way for a computer program to make complex decisions efficiently and accurately enough without requiring a great deal of manual human effort.

For example, let us assume we want to write a computer program that is able to categorize news articles into topics. In other words, given any article written in natural language, our program should decide (without any help from a human) whether the article is about politics, sports, music, science, etc. This is a relatively easy task for a human, provided that they are able to read and understand the article. But what if the article is written in a language we are not familiar with? If we do not understand the words, or even the names of people or locations mentioned, the problem becomes practically impossible to solve. A computer will suffer from exactly the same issue: it perceives an article as a sequence of letters and words, at the same time being completely ignorant of their meaning. So how can our program make smart decisions about the article’s topic?


One way to solve this would be to equip our program with specific, handcrafted rules for decision making, such as: “if the article contains at least two of the following three words: ‘football,’ ‘match,’ ‘player,’ it is about sports.” However, we would most likely need hundreds of rules per category for the program to work well, not to mention separate rules for every language we want to be able to process. The cost of employing experts to carefully develop such high-quality rules would be substantial.
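To make this concrete, here is what one such handcrafted rule might look like as a short Python sketch. The keyword list and the "at least two" threshold are simply the illustration from above, not a real classifier:

# A minimal sketch of a single handcrafted rule; real rule-based systems
# would need hundreds of such rules per category.
def looks_like_sports(article_text):
    keywords = {"football", "match", "player"}
    words = set(article_text.lower().split())
    return len(keywords & words) >= 2

print(looks_like_sports("The star player missed the match"))   # True
print(looks_like_sports("Parliament passed the new budget"))   # False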

Is there a better approach? As is often the case, machine learning will come to our aid. Instead of carefully devising thousands of specific rules by hand, we could expose our program to a lot of examples of specific articles with preassigned categories, and let it look for useful patterns in the data. By analyzing the correlations between individual words and topics, our program might notice, for example, that whenever the word “football” appears in the text, 95 percent of the time the article is about sports (the other 5 percent might be articles like this one, in which “football” appears only as an example). This observation could be used for the actual topic prediction: if a new, previously unseen article contains the word “football,” there is a high probability it is about sports. Even though these rules seem similar to the previous solution, in this case they are automatically learned by the computer using data, and without any manual intervention, which considerably lowers the overall cost.
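For curious readers, a toy Python sketch of this idea might look as follows. The tiny training set and the simple word-counting scheme are invented for illustration and are far cruder than anything used in practice:

from collections import Counter, defaultdict

# Illustrative labeled examples: (article text, topic).
training_set = [
    ("the player scored before the match ended", "sports"),
    ("the football team won the cup", "sports"),
    ("parliament debated the new tax law", "politics"),
    ("the minister announced an election", "politics"),
]

# "Learning": count how often each word appears under each topic.
word_topic_counts = defaultdict(Counter)
for text, topic in training_set:
    for word in set(text.lower().split()):
        word_topic_counts[word][topic] += 1

def predict_topic(text):
    # Score each topic by summing the counts of the words seen in training.
    scores = Counter()
    for word in set(text.lower().split()):
        scores.update(word_topic_counts.get(word, Counter()))
    return scores.most_common(1)[0][0] if scores else "unknown"

print(predict_topic("a thrilling football match"))     # sports
print(predict_topic("a new law on election funding"))  # politics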

We typically refer to the data used for learning as the “training set,” and the process of looking for patterns in the data is called “learning from examples.” It is worth noticing that humans learn similarly in many cases. For example, we do not teach infants to speak by lecturing them about grammar rules that can be used to construct sentences. Instead, we simply talk to them, giving their brains huge amounts of data to learn from. As a result, most adults are able to construct new sentences correctly, even if they never learned any formal grammar at all.

Machine learning flavors

Assigning predefined categories to objects based on a training set is in fact not the only type of machine learning out there, though it is probably the most well-known. It is an example of what we call “supervised learning.” In this type of learning, the computer is presented with examples, which are certain objects (for example articles or images) along with a “label” assigned to them (for example the article’s topic, or the name of the animal depicted in the image). The goal of supervised learning is to learn a general rule relating the characteristics of the objects to their labels.


Some classic examples of supervised machine learning applications include: optical character recognition, in which we want to categorize images of handwritten characters according to the letters they represent; spam filtering, the goal of which is to identify email messages as spam or non-spam; medical diagnosis, in which we wish to classify a patient as a sufferer or non-sufferer of a particular disease; and weather prediction, which aims to predict, for instance, the air temperature tomorrow.

Another well-known type of machine learning is “unsupervised learning,” in which we only have the example objects, but no correct “label” (or “decision”) is available, and the goal is to find interesting and useful patterns in data. For example, unsupervised learning can be used for customer segmentation, a case in which we wish to divide our customers into several groups based on their characteristics and behavior, such that similar people are grouped together. Another example of unsupervised learning is fraud detection, used to examine a data set of credit card transactions in order to find transactions that may be fraudulent in nature.
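As an illustration, customer segmentation is often tackled with a clustering algorithm such as k-means. The sketch below groups a handful of made-up customers, described by two numbers (yearly spend, monthly visits), into two segments; the data and the choice of two groups are purely illustrative:

import random

# Toy customer data: (yearly spend, visits per month).
customers = [(120, 2), (150, 3), (130, 2), (900, 12), (950, 10), (870, 11)]

def kmeans(points, k, iterations=10):
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

for group in kmeans(customers, k=2):
    print(group)  # low-spend and high-spend customers end up in separate groups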

Other types of machine learning include “reinforcement learning,” where a computer program interacts with a dynamic environment in which it must achieve a certain goal (such as driving a vehicle or playing a game against an opponent), and “recommender systems,” in which we propose recommendations to a user based on their own interests or on the opinions of people similar to them.


Teaching strategies

Building a highly accurate machine learning-based solution for a given problem is not a trivial task and requires a substantial amount of expertise. The process typically starts with collecting data. In the case of categorizing news articles, a set of example articles with known topics has to be collected. Next, the examples have to be transformed into a representation better understood by a computer, typically using numbers. For example, an article might be represented by the counts of individual words, and an image might be represented by the color intensities of all its pixels. Often, more complex data transformations are needed; for example, we might decide that words differing only in grammatical form (“playing” and “player”) or synonyms (“football” and “soccer”) should be represented by a single word.
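Such a bag-of-words representation can be sketched in a few lines of Python. Real systems add many refinements (stemming, merging synonyms, discarding very common words), but the basic idea is simply counting:

from collections import Counter

# A minimal sketch of turning an article into numbers: lowercase word counts.
def bag_of_words(text):
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return Counter(words)

print(bag_of_words("The player passed to another player."))
# Counter({'player': 2, 'the': 1, 'passed': 1, 'to': 1, 'another': 1})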

The next phase is the actual learning, during which the program examines the data and tries to discover useful patterns. The patterns found in the data represent the knowledge the program was able to infer, and are usually referred to as “the learned model.” The model is used later on for the actual prediction, when the program is asked, for example, to assign the topics to new, previously unseen articles.

The exact approach used by the computer to learn from the data is called a “learning algorithm.” Over the decades, hundreds of different learning algorithms have been developed by researchers and machine learning practitioners. For example, one way to learn is to build a graph of questions asked about a new article we wish to categorize. In such a graph, an answer to a question determines which question is asked next, and at the end we arrive at the predicted category.

Such a structure is traditionally called a “decision tree,” and is very popular in self-diagnosis schemes, in which we are asked a series of questions about the symptoms we suffer from in order to obtain a probable diagnosis. Another way of learning would be to analyze the probabilities of various words appearing in articles of different classes; this is called the “naive Bayes” approach. Finally, the training set can be used in a very direct way: since we know the correct categories of the training documents, for a new article we could simply assign the topic of the most similar article in the training set. This approach is known as the “nearest neighbors” algorithm.
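A toy version of the nearest-neighbors idea might look like this in Python. The three-article training set is invented for illustration, and real systems use far better similarity measures than raw word overlap:

# Predict the topic of the training article sharing the most words with the new one.
training_set = [
    ("the player scored in the final match", "sports"),
    ("the government announced new taxes", "politics"),
    ("the band released a new album", "music"),
]

def nearest_neighbor_topic(text):
    words = set(text.lower().split())
    overlaps = [(len(words & set(t.lower().split())), topic) for t, topic in training_set]
    return max(overlaps)[1]

print(nearest_neighbor_topic("the match ended after the player was injured"))  # sports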

A crucial issue in building a machine learning-based solution is evaluation, which aims to assess the quality of the algorithm by comparing the decisions made by a trained model to correct answers, typically provided by a human. Evaluation serves multiple purposes: it can be used to report the expected quality of a machine learning-based tool, to compare different training methods, or to choose the learning method that should be used in production.
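In its simplest form, evaluation boils down to counting how often the model’s predictions agree with human-provided labels on a held-out test set. The predictions and labels below are made up for illustration:

# Compare the model's predictions against human-provided labels and report accuracy.
predictions = ["sports", "politics", "sports", "music", "politics"]
true_labels = ["sports", "politics", "music", "music", "sports"]

correct = sum(p == t for p, t in zip(predictions, true_labels))
accuracy = correct / len(true_labels)
print(f"Accuracy: {accuracy:.0%}")  # 60% of the test articles were labeled correctly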

Not always straight-A students

It is important to note that in real-world problems a machine learning-based tool will not be correct every single time, just as we don’t expect any human student to always know the correct answer. What are the factors affecting the quality of machine learning algorithms?


First of all, performance depends on the learning method used. Different learning methods have very different characteristics and make different assumptions about the data. For a specific problem, some learning methods might be more suitable than others, and it is in fact impossible to know in advance what will work best. A poorly chosen learning algorithm, not powerful enough to pick up complex relationships in the data, will give poor results.

By far the most important factor is the data the program learns from. In general, real-life examples always contain some random fluctuations (typically called “noise”), which will prevent our machine learning-based program from being correct 100 percent of the time. But even if we ignore the issue of noise, there are other important aspects of data quality to consider.

First of all, to learn well, the computer needs to see enough examples, especially since the categories themselves exhibit some amount of diversity. Indeed, after seeing only one article about a football world cup, it is unlikely the program will correctly categorize a new article about judo lessons for children. Moreover, the data has to carry enough information about the categories we are trying to predict. For example, if we do not have the articles themselves, but only know their length and language, it would be impossible to predict the topics, as there is very little correlation between these features and the categories. In this case even huge amounts of data will not help.

Finally, the reason for bad performance might be the difficulty of the problem itself. Some problems we try to solve using machine learning are not easy even for a human; in other cases there might be very little consensus among experts on what the “right” decision is. An example of such a task is assessing the polarity of a text, that is, deciding whether its author expresses a negative, positive, or neutral opinion.

So why do we bother building machine learning-based solutions, knowing they will inevitably make errors? With the overwhelming volume of data surrounding us, machine learning is often the only way to efficiently process it. The decisions provided by a computer might be of significantly lower quality than those supplied by humans, but they are incredibly fast and cheap. In a lot of applications the answer that is correct 80 percent of the time is still useful, while an answer given too late is simply no answer at all.
