Logistic regression is for classification problems, where the predicted value is one of a fixed set of discrete values, such as 1 for positive or 0 for negative. The essentials of logistic regression are:
- the hypothesis function is the sigmoid function
- the cost function: J(theta)
- gradient descent and other optimization algorithms
- advanced optimization, with regularization to solve the overfitting problem.
where h_theta(x) = g(theta’ x), g(z) = 1 / (1 + e^(-z)) is the sigmoid function, and theta’ is the transpose of theta.
h_theta(x) is the probability that y = 1 given x, parameterized by theta: P(y=1 | x; theta). The prediction rule is:
- if h_theta(x) >= 0.5, then y = 1
- if h_theta(x) < 0.5, then y = 0
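As a minimal sketch of the hypothesis and the 0.5 threshold rule, assuming NumPy and a design matrix X whose first column is all ones (the function names here are my own, not from the course code):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta' x) for every row of X; read as P(y=1 | x; theta)."""
    return sigmoid(X @ theta)

def predict(theta, X, threshold=0.5):
    """Return 1 where h_theta(x) >= threshold, otherwise 0."""
    return (hypothesis(theta, X) >= threshold).astype(int)
```

Since g(0) = 0.5, predicting y = 1 when h_theta(x) >= 0.5 is the same as predicting y = 1 when theta’ x >= 0.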
Our goal is to calculate the theta that classifies our training data with a decision boundary.
In the example, the training data can be separated into 2 categories by a straight line.
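For two features, the straight-line boundary is the set of points where theta’ x = 0. A small sketch of how that line could be computed for plotting, assuming theta = [theta0, theta1, theta2] with theta2 != 0 (the theta values below are made up for illustration, not fitted results):

```python
import numpy as np

def boundary_x2(theta, x1):
    """Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2 (requires theta2 != 0)."""
    theta0, theta1, theta2 = theta
    return -(theta0 + theta1 * np.asarray(x1, dtype=float)) / theta2

# Hypothetical parameters: the boundary is the line x1 + x2 = 3.
theta = np.array([-3.0, 1.0, 1.0])
x1 = np.array([0.0, 3.0])
x2 = boundary_x2(theta, x1)  # two points are enough to draw the line
```

Points on one side of this line give theta’ x >= 0 and are classified as y = 1; points on the other side are classified as y = 0.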
The week 3 assignment is to predict university admission from the scores of 2 exams.
I optimized the implementation with vectorization.
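A rough idea of what a vectorized version can look like, written here in Python with NumPy rather than the course's own tooling. This is only a sketch: the function names, the learning rate alpha, and the iteration count are my assumptions, and it uses the standard cross-entropy cost for logistic regression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradient(theta, X, y):
    """Vectorized cost J(theta) and its gradient, with no loop over examples."""
    m = y.size
    h = sigmoid(X @ theta)
    eps = 1e-12  # small constant to avoid log(0)
    cost = -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
    grad = X.T @ (h - y) / m
    return cost, grad

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Plain batch gradient descent starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        _, grad = cost_and_gradient(theta, X, y)
        theta -= alpha * grad
    return theta

# Toy data (made up): first column is the intercept term.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
```

The whole gradient is a single matrix product, X’ (h - y) / m, which is what replaces the per-example loop.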
Regularization addresses the overfitting problem.
- Underfit: does not fit the training data, with high bias (a large gap between predictions and actual values)
- Just right: a good fit
- Overfit: often happens with too many features and not much training data; fits the training data well, but has high variance and does not predict new data well
The lambda for regularization can’t be too large or too small:
- a large lambda drives theta to very small values, and the model underfits.
- a small lambda allows large theta values, and the model overfits.
- the lambda for the exercise is 1
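To make the role of lambda concrete, here is a sketch of a regularized cost and gradient, assuming the usual L2 penalty (lambda / 2m) * sum(theta_j^2) that leaves the intercept theta_0 unpenalized. The function name and the parameter name lam are my own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost_and_gradient(theta, X, y, lam=1.0):
    """Cross-entropy cost plus an L2 penalty on theta[1:] (intercept excluded)."""
    m = y.size
    h = sigmoid(X @ theta)
    eps = 1e-12  # small constant to avoid log(0)
    penalized = theta.copy()
    penalized[0] = 0.0  # do not regularize the intercept term
    cost = -(y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps)) / m
    cost += lam / (2 * m) * (penalized @ penalized)
    grad = X.T @ (h - y) / m + (lam / m) * penalized
    return cost, grad
```

The penalty term grows with the size of theta, so a larger lam pushes the minimizer toward smaller theta values, while lam = 0 gives back the unregularized cost.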
After one year, I am learning logistic regression again. Last week, Andrew Ng left Baidu. Maybe these great people thought Baidu was not worth fighting for. I am still dedicated to a Spark project, with a focus on Spark Streaming. As team leader, I carry a heavy burden and it is stressful, but it is a great chance to train my leadership. I am also wondering about my next opportunity. Learning machine learning is the right thing to do and worth the effort. Anyway, even though there is mist on the path, just go forward and fight~