Generalization

means the model can make good predictions even on brand-new examples it has never seen before.

Under-fitting (high bias)

means the model does not fit the training set well, which is also called high bias.

Addressing Under-fitting

  • Add more features as input

  • Design a more complex model, e.g., by adding polynomial features (see the sketch below)
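As a sketch of these remedies, one common way to add features is to engineer polynomial terms from the raw inputs; `add_polynomial_features` below is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np

def add_polynomial_features(X, degree=2):
    """Append element-wise powers of each raw feature up to `degree`,
    a simple way to give an under-fitting model more capacity."""
    return np.hstack([X**d for d in range(1, degree + 1)])

# Engineer x, x^2, x^3 from a single raw feature
X = np.arange(1, 6, dtype=float).reshape(-1, 1)  # shape (5, 1)
X_poly = add_polynomial_features(X, degree=3)    # shape (5, 3)
```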

Over-fitting (high variance)

means the model fits the training set so well that it fails to generalize to new examples it has never seen before.

Addressing Over-fitting

  1. Collect more training data
  2. Feature selection: choose which features to include/exclude;
    using all features with insufficient data ==> over-fit
  3. Regularization: reduce the size of the parameters

Regularization

The goal is a fairly smooth fitted curve, but not an overly smooth one.

When there are many feature parameters, we usually don't know which ones matter and which should be penalized. Regularization therefore penalizes all of the parameters, by adding the regularization term below to the cost function. (Since the concern is the smoothness of the curve, the bias term $b$ does not need to be regularized.)

$$\frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

$$J(\vec{w},b) = \frac{1}{2m} \sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

where $m$ is the size of the dataset, $n$ is the number of parameters, and $\lambda$ is called the regularization parameter, with $\lambda > 0$.
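A minimal NumPy sketch of this regularized cost for linear regression, assuming the usual model $f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$; `compute_cost_regularized` and its signature are illustrative, not from any library.

```python
import numpy as np

def compute_cost_regularized(X, y, w, b, lambda_):
    """Regularized squared-error cost J(w, b) for linear regression.
    X: (m, n) features, y: (m,) targets, w: (n,) weights,
    b: scalar bias, lambda_: regularization parameter (> 0)."""
    m = X.shape[0]
    err = X @ w + b - y                       # f_wb(x^(i)) - y^(i), shape (m,)
    cost = (err @ err) / (2 * m)              # (1/2m) * sum of squared errors
    reg = (lambda_ / (2 * m)) * np.sum(w**2)  # penalizes w only, never b
    return cost + reg
```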

Regularized Linear Regression and Logistic Regression

Because regularization is applied to linear regression and logistic regression, their gradient descent update expressions change accordingly.

$$\begin{align*} &\text{repeat until convergence: } \lbrace \\ &\quad w_j = w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j} \\ &\quad b = b - \alpha \frac{\partial J(\vec{w},b)}{\partial b} \\ &\rbrace \end{align*}$$

The partial derivative with respect to $w_j$ gains one extra term, shown below, while the expression for $b$ is unchanged because $b$ is not regularized.

$$\begin{align*} \frac{\partial J(\vec{w},b)}{\partial w_j} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_{j}^{(i)} + \frac{\lambda}{m} w_j \\ \frac{\partial J(\vec{w},b)}{\partial b} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right) \end{align*}$$
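A minimal NumPy sketch of these regularized updates for the linear-regression case; the function name and signature are illustrative, not from any library. The logistic-regression version has the same update rules, except that $f_{\vec{w},b}$ passes the linear term through the sigmoid.

```python
import numpy as np

def gradient_descent_regularized(X, y, w, b, alpha, lambda_, num_iters):
    """Batch gradient descent for regularized linear regression;
    only w is regularized, so only dj_dw gains the (lambda/m)*w term."""
    m = X.shape[0]
    for _ in range(num_iters):
        err = X @ w + b - y                          # (m,) prediction errors
        dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # gradient w.r.t. w
        dj_db = np.sum(err) / m                      # gradient w.r.t. b (no reg term)
        w = w - alpha * dj_dw                        # simultaneous update of w and b
        b = b - alpha * dj_db
    return w, b
```

Note that the extra $\frac{\lambda}{m} w_j$ term shrinks each weight slightly on every iteration, which is exactly how the penalty keeps the parameters small.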