Model Representation

Referencing back to the housing prices and size example. We can fit a model to represent the data which will allow us to predict housing prices based on the size. In this case, we fit a straight line across the data. So a 1250 square feet house will be predicted to sell for 220k USD. Again, this is a supervised learning example. We are given data that has the correct answer. This is also a regression problem since we need to predict real-valued output, which is the price. The other common type of supervised learning problem is the classification problem.

model

To establish notation for future use, we’ll use to denote the “input” variables (living area in this example), also called input features, and to denote the “output” or target variable that we are trying to predict (price). A pair is called a training example, and the dataset that we’ll be using to learn — a list of m training examples — is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.

notations

Quiz


Consider the training set shown below. is the training example. What is ?

Size in square feet (x) Price in 1000's (y)
2104 460
1416 232
1534 315
852 178
  • ( ) 2104
  • ( ) 1534
  • (x) 315
  • ( ) 0

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:

creating a hypothesis

In the housing price example above, we fitted a straight line through the dataset. This type of model is called linear regression. We can fit more complex models but we will stick with linear regression for now. In this particular example, we only have one variable, which is size of the house. We can call it a Univariate linear regression model. Univariate just means one variable.

linear regression representation

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

results matching ""

    No results matching ""