Logistic Regression
Logistic regression solves binary classification problems.
sigmoid function / logistic function:
$$g(z) = \frac{1}{1 + e^{-z}}, \qquad 0 < g(z) < 1$$
$$z = \vec{w} \cdot \vec{x} + b$$
logistic regression model:
$$f_{\vec{w},b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b) = \frac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}$$
Interpretation: the probability that $y = 1$, namely $P(y=1 \mid \vec{x}; \vec{w}, b)$.
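A minimal NumPy sketch of the sigmoid and the resulting model (the names `sigmoid` and `predict_proba` are illustrative, not from the source):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}); output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """f_{w,b}(x) = g(w . x + b): estimated probability that y = 1."""
    return sigmoid(np.dot(w, x) + b)
```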
Cost Function
Logistic loss function:
$$L\left(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}\right) = \begin{cases} -\log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) & \text{if } y^{(i)} = 1 \\ -\log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right) & \text{if } y^{(i)} = 0 \end{cases}$$
The further the prediction $f_{\vec{w},b}(\vec{x}^{(i)})$ is from the target $y^{(i)}$, the higher the loss.
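A direct translation of the piecewise loss into code, assuming the model output `f_x` is already computed and lies in $(0, 1)$ (the name `logistic_loss` is illustrative):

```python
import numpy as np

def logistic_loss(f_x, y):
    """Piecewise logistic loss for a single example."""
    if y == 1:
        return -np.log(f_x)        # -> 0 as f_x -> 1, -> inf as f_x -> 0
    else:
        return -np.log(1.0 - f_x)  # -> 0 as f_x -> 0, -> inf as f_x -> 1
```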
Cost Function:
$$J(\vec{w},b) = \frac{1}{m} \sum_{i=1}^{m} L\left(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}\right)$$
This cost function is convex, which guarantees that gradient descent can converge to the parameters that minimize $J(\vec{w},b)$.
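A sketch of the cost over all $m$ examples, assuming `X` is an $(m, n)$ feature matrix and `y` an $(m,)$ array of 0/1 labels (the helper name `compute_cost` is hypothetical):

```python
import numpy as np

def compute_cost(X, y, w, b):
    """J(w,b): average piecewise logistic loss over all m examples."""
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid of all m linear terms
    losses = np.where(y == 1, -np.log(f), -np.log(1.0 - f))
    return losses.mean()
```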
Simplified Cost Function
Simplified loss function: the piecewise loss above can be written as a single expression:
$$L\left(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}\right) = -y^{(i)} \log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right)$$
Simplified Cost Function:
$$J(\vec{w},b) = \frac{1}{m} \sum_{i=1}^{m} L\left(f_{\vec{w},b}(\vec{x}^{(i)}), y^{(i)}\right) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right) \right]$$
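The simplified form maps directly to a vectorized one-liner; this sketch assumes the same data conventions as above, and it agrees with the piecewise version because $y^{(i)}$ is always 0 or 1:

```python
import numpy as np

def compute_cost_simplified(X, y, w, b):
    """-(1/m) * sum[ y*log(f) + (1-y)*log(1-f) ]."""
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return -np.mean(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))
```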
Gradient Descent
$$\text{repeat until convergence: } \left\{ \begin{aligned} w_j &= w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j} \\ b &= b - \alpha \frac{\partial J(\vec{w},b)}{\partial b} \end{aligned} \right.$$
$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}$$

$$\frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)$$
$$\text{repeat until convergence: } \left\{ \begin{aligned} w_j &= w_j - \alpha \left[ \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} \right] \\ b &= b - \alpha \left[ \frac{1}{m} \sum_{i=0}^{m-1} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) \right] \end{aligned} \right.$$
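A minimal batch gradient descent sketch under the same data assumptions (`alpha`, `num_iters`, and the zero initialization are illustrative choices); note that $w_j$ and $b$ are updated simultaneously from the same batch of errors:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for logistic regression.
    X: (m, n) features; y: (m,) labels in {0, 1}; alpha: learning rate."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        f = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predictions f_{w,b}(x^(i))
        err = f - y                              # (f - y) per example
        dj_dw = (X.T @ err) / m                  # dJ/dw_j for all j at once
        dj_db = err.mean()                       # dJ/db
        w -= alpha * dj_dw                       # simultaneous update
        b -= alpha * dj_db
    return w, b
```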
Note that, in form, the resulting gradient descent algorithm is identical to the one for linear regression. However, the two define $f_{\vec{w},b}(\vec{x})$ completely differently, as shown below:
Linear Regression: $f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$
Logistic Regression: $f_{\vec{w},b}(\vec{x}) = \frac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}$