Supervised Learning

In supervised learning, we are given a data set in which we already know what the correct output should look like for each input, and we assume there is a relationship between the input and the output.

Supervised learning problems are categorized into "regression" and "classification" problems. In a regression problem, we are trying to predict results within a continuous output, meaning that we are trying to map input variables to some continuous function. In a classification problem, we are instead trying to predict results in a discrete output. In other words, we are trying to map input variables into discrete categories.

Example 1:

Given data about the size of houses on the real estate market, try to predict their price. Price as a function of size is a continuous output, so this is a regression problem.
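A minimal sketch of this regression setup, assuming ordinary least squares with a single feature (the notes have not introduced a specific learning algorithm at this point, so that choice is ours). The sizes and prices are made-up numbers, and `fit_line` is a hypothetical helper name. The point to notice is that the prediction is a real number on a continuous scale.

```python
# Hypothetical training data: house sizes (square feet) and sale prices
# (thousands of dollars). These numbers are invented for illustration.
sizes = [850.0, 1200.0, 1500.0, 1800.0, 2100.0]
prices = [170.0, 230.0, 290.0, 350.0, 400.0]


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form ordinary least squares for one feature.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
# The output is continuous: any real value is a possible prediction.
predicted = slope * 1650.0 + intercept
print(f"Predicted price for 1650 sq ft: {predicted:.1f} thousand")
```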

Figure: supervised learning on house prices

We could turn this example into a classification problem by instead making our output about whether the house "sells for more or less than the asking price." Here we are classifying the houses based on price into two discrete categories.
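Only the output changes when the problem becomes classification. A small sketch of that relabeling, with invented sale and asking prices and a hypothetical helper `to_class_label`; treating an exact tie as "not more" is an assumption, since the text does not say how ties are handled.

```python
# Hypothetical records: (final sale price, asking price) in thousands of dollars.
# Invented purely to illustrate the change of output type.
sales = [(310.0, 299.0), (245.0, 260.0), (400.0, 400.0), (188.0, 175.0)]


def to_class_label(sale_price, asking_price):
    """Map a continuous outcome to one of two discrete categories.

    1 = sold for more than the asking price, 0 = sold for the asking price or less.
    Counting an exact tie as class 0 is an assumption made for this sketch.
    """
    return 1 if sale_price > asking_price else 0


labels = [to_class_label(sale, ask) for sale, ask in sales]
print(labels)
```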

Example 2:

(a) Regression - Given a picture of a person, we have to predict their age from that picture.

(b) Classification - Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.

Figure: supervised learning on breast cancer
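For the tumor example, one of the simplest possible classifiers is a single cutoff on tumor size. The sketch below is our illustration, not a method from these notes: it tries each observed size as a cutoff and keeps the one with the fewest training mistakes. The data are invented, and `fit_threshold` and `predict` are hypothetical names.

```python
# Hypothetical one-feature data: tumor size (cm) with label 1 = malignant,
# 0 = benign. Invented for illustration only.
tumor_sizes = [0.8, 1.1, 1.5, 2.0, 2.6, 3.1, 3.8, 4.5]
malignant = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_threshold(xs, ys):
    """Pick the cutoff t minimizing training errors for the rule: malignant if x >= t."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):  # only observed values are tried as cutoffs
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


threshold = fit_threshold(tumor_sizes, malignant)


def predict(size):
    # The output is one of two discrete classes, not a point on a continuous scale.
    return "malignant" if size >= threshold else "benign"


print(threshold, predict(1.2), predict(3.5))
```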

In the supervised learning examples above, we used a single feature to form predictions: size in square feet for the housing price example, and tumor size for the breast cancer example. However, we may want to use more than one feature in a supervised learning model. In the breast cancer example, we can use both the patient's age and the tumor size as features. In the chart below, benign tumors are plotted as circles and malignant tumors as crosses. The chart shows a trend: the older the patient and the larger the tumor, the more likely the tumor is malignant.

Figure: supervised learning on breast cancer with 2 features
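One common way to combine two features such as age and tumor size is logistic regression, which these notes have not covered at this point; the sketch below assumes it and trains it with plain batch gradient descent. The data points are invented to mimic the trend in the chart, and every name is hypothetical. Both features are standardized first so that age (tens of years) does not dominate tumor size (a few centimeters).

```python
import math

# Hypothetical data: (patient age in years, tumor size in cm) -> 1 malignant, 0 benign.
# Invented to mimic the trend described above, not taken from any real dataset.
X = [(30, 1.0), (35, 1.5), (40, 1.2), (45, 2.0), (50, 1.8),
     (55, 3.0), (60, 2.8), (65, 3.5), (70, 3.2), (75, 4.0)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]


def standardize(column):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column], mean, std


ages, age_mean, age_std = standardize([a for a, _ in X])
sizes, size_mean, size_std = standardize([s for _, s in X])


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Model: p(malignant) = sigmoid(w0 + w1 * age + w2 * size), on standardized features.
w0 = w1 = w2 = 0.0
learning_rate = 0.5
n = len(y)
for _ in range(2000):
    g0 = g1 = g2 = 0.0
    for a, s, label in zip(ages, sizes, y):
        err = sigmoid(w0 + w1 * a + w2 * s) - label  # log-loss gradient w.r.t. the logit
        g0 += err
        g1 += err * a
        g2 += err * s
    w0 -= learning_rate * g0 / n
    w1 -= learning_rate * g1 / n
    w2 -= learning_rate * g2 / n


def prob_malignant(age, size):
    """Predicted probability that a tumor is malignant, given raw age and size."""
    a = (age - age_mean) / age_std
    s = (size - size_mean) / size_std
    return sigmoid(w0 + w1 * a + w2 * s)


print(f"{prob_malignant(35, 1.2):.2f}", f"{prob_malignant(70, 3.8):.2f}")
```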

It is very common for models to use hundreds of features, and some use an infinite number of features. That may seem impossible to compute with, but a mathematical trick, known as the kernel trick, lets the computer work with an infinite number of features without ever listing them explicitly. The Support Vector Machine is an algorithm that uses this trick.
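A minimal sketch of the kernel idea, assuming 2-D inputs; the function names are hypothetical and this is not an SVM implementation. An SVM only needs inner products between examples, so a kernel function K(x, z) that returns the feature-space inner product directly can stand in for explicit features. The degree-2 polynomial kernel (x · z)^2 matches the dot product of an explicit 4-feature map, and the Gaussian (RBF) kernel corresponds to an infinite-dimensional feature space that is never built.

```python
import math


def explicit_features(x):
    """Degree-2 feature map for a 2-D input: all pairwise products x_i * x_j."""
    x1, x2 = x
    return [x1 * x1, x1 * x2, x2 * x1, x2 * x2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """(x . z)^2 equals the dot product of the degree-2 features, without building them."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; its implicit feature space is infinite-dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(explicit_features(x), explicit_features(z)), poly_kernel(x, z))
print(rbf_kernel(x, z))
```

The polynomial check works in the 4 explicit features; the RBF kernel gives a similarity score between 0 and 1 at the same cost of one pass over the 2 input coordinates, even though no finite feature list reproduces it.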

Quiz


You’re running a company, and you want to develop learning algorithms to address each of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.

Problem 2: You’d like software to examine individual customer accounts and, for each account, decide whether it has been hacked/compromised.

Should you treat these as classification or as regression problems?

  • ( ) Treat both as classification problems.
  • ( ) Treat problem 1 as a classification problem, problem 2 as a regression problem.
  • (x) Treat problem 1 as a regression problem, problem 2 as a classification problem.
  • ( ) Treat both as regression problems.
