Cost Function
Using the housing price example again, recall that our hypothesis function predicts the price of the house ($y$) based on the size in square feet ($x$). Therefore our hypothesis, a linear regression equation in this case, is:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
$\theta_0, \theta_1$ are parameters that determine what our hypothesis looks like.
In the picture below, you see that different $\theta_0, \theta_1$ values produce different hypotheses. When $\theta_0 = 1.5, \theta_1 = 0$, we get a horizontal line that crosses the point $(0, 1.5)$. When $\theta_0 = 0, \theta_1 = 0.5$, we get a straight line that passes through $(0, 0)$ and $(2, 1)$ with a gradient of 0.5. When $\theta_0 = 1, \theta_1 = 0.5$, we get a straight line that passes through $(0, 1)$ and $(2, 2)$ with a gradient of 0.5.
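To make the effect of the parameters concrete, here is a minimal sketch in Python (the `hypothesis` function name and the evaluation points are illustrative, not from the original) that evaluates the three settings above:

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# The three parameter settings described above.
for theta0, theta1 in [(1.5, 0), (0, 0.5), (1, 0.5)]:
    print(f"theta0={theta0}, theta1={theta1}: "
          f"h(0)={hypothesis(theta0, theta1, 0)}, "
          f"h(2)={hypothesis(theta0, theta1, 2)}")
```

The first setting prints 1.5 for both inputs (the horizontal line), while the other two reproduce the points $(0, 0)$, $(2, 1)$ and $(0, 1)$, $(2, 2)$ respectively.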
Quiz
Consider the plot below of $h_\theta(x) = \theta_0 + \theta_1 x$. What are $\theta_0$ and $\theta_1$?
- ( ) $\theta_0 = 0, \theta_1 = 1$
- (x) $\theta_0 = 0.5, \theta_1 = 1$
- ( ) $\theta_0 = 1, \theta_1 = 0.5$
- ( ) $\theta_0 = 1, \theta_1 = 1$
We can measure the accuracy of our hypothesis function by using a cost function. It takes an average difference (actually a fancier version of an average) between the results of the hypothesis on the inputs $x$ and the actual outputs $y$.
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right)^2$$
To break it apart, it is $\frac{1}{2}\bar{x}$ where $\bar{x}$ is the mean of the squares of $h_\theta(x_i) - y_i$, or the difference between the predicted value and the actual value.
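As a sketch of how this formula translates into code (the `compute_cost` name and the toy data are assumptions for illustration, not part of the course material):

```python
def compute_cost(theta0, theta1, xs, ys):
    """Halved mean squared error J(theta0, theta1) over m training examples."""
    m = len(xs)
    squared_errors = [(theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / (2 * m)

# Toy data lying exactly on the line y = x: the cost is zero at the
# perfect fit theta0 = 0, theta1 = 1 and positive for any other line.
xs, ys = [1, 2, 3], [1, 2, 3]
print(compute_cost(0, 1, xs, ys))    # 0.0
print(compute_cost(0, 0.5, xs, ys))  # (0.25 + 1 + 2.25) / 6 ≈ 0.583
```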
This function is otherwise called the "Squared error function", or "Mean squared error"; the image at the end of this section summarizes what the cost function does. The mean is halved $\left(\frac{1}{2}\right)$ as a convenience for the computation of gradient descent, since the derivative of the square term cancels out the $\frac{1}{2}$, as the short derivation below shows.
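To spell out the cancellation (a standard calculus step, not shown in the original): when we differentiate $J$ with respect to a parameter, the factor of 2 produced by the chain rule cancels the $\frac{1}{2}$. For $\theta_1$, for example:

$$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x_i) - y_i \right) x_i = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) x_i$$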