Model Representation

Recall the housing example, which relates the size of a house to its price. We can fit a model to the data and use it to predict a house's price from its size. Here the model is a straight line through the data, and it predicts that a 1250-square-foot house will sell for about 220k USD. This is a supervised learning problem because every example in the data comes with the correct answer. It is also a regression problem, since the output we predict, the price, is real-valued. The other common type of supervised learning problem is classification.

To establish notation for future use, we’ll use $x^{(i)}$ to denote the “input” variables (living area in this example), also called input features, and $y^{(i)}$ to denote the “output” or target variable that we are trying to predict (price). A pair $(x^{(i)},y^{(i)})$ is called a training example, and the dataset that we’ll be using to learn — a list of m training examples $(x^{(i)},y^{(i)});i=1,...,m$ — is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.

Quiz

Consider the training set shown below. $(x^{(i)},y^{(i)})$ is the $i^{th}$ training example. What is $y^{(3)}$ ?

| Size in square feet (x) | Price in 1000s of USD (y) |
|---|---|
| 2104 | 460 |
| 1416 | 232 |
| 1534 | 315 |
| 852 | 178 |

( ) 2104
( ) 1534
(x) 315
( ) 0
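The indexing convention behind the quiz can be checked with a small sketch. It stores the quiz table as a list of $(x^{(i)}, y^{(i)})$ pairs and looks up an example by its 1-based superscript. The names `training_set` and `example` are illustrative assumptions, not part of the course material.

```python
# Training set from the quiz table: (size in square feet, price in 1000s of USD).
training_set = [
    (2104, 460),
    (1416, 232),
    (1534, 315),
    (852, 178),
]

m = len(training_set)  # number of training examples


def example(i):
    """Return (x^(i), y^(i)) using the 1-based index from the notation."""
    if not 1 <= i <= m:
        raise IndexError(f"i must be between 1 and {m}")
    return training_set[i - 1]  # Python lists are 0-based, the notation is 1-based


x3, y3 = example(3)
print(m, x3, y3)  # prints: 4 1534 315
```

The superscript picks a row of the table, so $y^{(3)}$ is the price in the third row.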

To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that $h(x)$ is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. The process therefore works like this: the training set is fed to a learning algorithm, which outputs h; given a new input x (the living area of a house), h then returns a predicted y (the price).

In the housing price example above, we fitted a straight line through the dataset. This type of model is called linear regression. We can fit more complex models, but we will stick with linear regression for now. This example has only one input variable, the size of the house, so the model is called univariate linear regression; “univariate” simply means one variable.
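As a rough illustration of fitting a straight line, the sketch below fits $h(x) = \theta_0 + \theta_1 x$ to the four quiz examples. It makes two assumptions that the text has not introduced: the parameter names $\theta_0$ and $\theta_1$, and ordinary least squares as the fitting rule. It uses only four points, so its prediction for a 1250-square-foot house need not match the 220k USD read off the full plotted dataset.

```python
# Quiz data: sizes in square feet and prices in 1000s of USD.
sizes = [2104, 1416, 1534, 852]
prices = [460, 232, 315, 178]


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = theta0 + theta1 * x (assumed method)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    # Intercept: chosen so the line passes through (mean_x, mean_y).
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


theta0, theta1 = fit_line(sizes, prices)


def h(x):
    """Hypothesis: maps a house size to a predicted price in 1000s of USD."""
    return theta0 + theta1 * x


print(f"h(1250) = {h(1250):.1f} (in 1000s of USD)")
```

Once the two parameters are fixed, `h` is an ordinary function from sizes to prices, which is exactly the role the hypothesis plays in the text.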

When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (for example, if we wanted to predict from the living area whether a dwelling is a house or an apartment), we call it a classification problem.
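The difference lies in the output space Y, which a short sketch can make concrete. Both functions below take a living area as input. The first returns a real number, as in regression. The second returns one of two labels, as in classification. The coefficients and the 1000-square-foot threshold are hypothetical values chosen only for illustration; they are not learned from any data.

```python
def predict_price(area_sqft):
    """Regression-style hypothesis: the output is a real number.

    The coefficients are hypothetical, chosen only for illustration.
    """
    return 50.0 + 0.15 * area_sqft  # price in 1000s of USD


def predict_dwelling(area_sqft):
    """Classification-style hypothesis: the output is one of a few labels.

    The 1000-square-foot threshold is hypothetical.
    """
    return "house" if area_sqft >= 1000 else "apartment"


print(predict_price(1250))     # a continuous value
print(predict_dwelling(1250))  # a discrete label
```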
