Generalization

means the model can make good predictions even on brand-new examples it has never seen before.

Under-fitting (high bias)

means the model does not fit the training set well, which is also called high bias.

Addressing Under-fitting

  • Add more features as input

  • Design a more complex model, e.g., by adding polynomial features (see the sketch below)
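As a sketch of these remedies, one common way to add features is to engineer polynomial terms from the raw inputs; `add_polynomial_features` below is a hypothetical helper written for illustration, not a library function.

```python
import numpy as np

def add_polynomial_features(X, degree=2):
    """Append element-wise powers of each raw feature up to `degree`,
    a simple way to give an under-fitting model more capacity."""
    return np.hstack([X**d for d in range(1, degree + 1)])

# Engineer x, x^2, x^3 from a single raw feature
X = np.arange(1, 6, dtype=float).reshape(-1, 1)  # shape (5, 1)
X_poly = add_polynomial_features(X, degree=3)    # shape (5, 3)
```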

Over-fitting (high variance)

means the model fits the training set so well that it fails to generalize to new examples it has never seen before.

Addressing Over-fitting

  1. Collect more training data
  2. Feature selection: choose which features to include/exclude;
    using all features with insufficient data ==> over-fit
  3. Regularization: reduce the size of the parameters

Regularization

The goal is a fairly smooth fitted curve, but not an overly smooth one.

When there are many feature parameters, we usually don't know which ones matter and which should be penalized. Regularization therefore penalizes all of the parameters, by adding the regularization term below to the cost function. (Since the concern is the smoothness of the curve, the bias term $b$ does not need to be regularized.)

$$\frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

$$J(\vec{w},b) = \frac{1}{2m} \sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}\right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2$$

where $m$ is the size of the dataset, $n$ is the number of parameters, and $\lambda$ is called the regularization parameter, with $\lambda > 0$.
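A minimal NumPy sketch of this regularized cost for linear regression, assuming the usual model $f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$; `compute_cost_regularized` and its signature are illustrative, not from any library.

```python
import numpy as np

def compute_cost_regularized(X, y, w, b, lambda_):
    """Regularized squared-error cost J(w, b) for linear regression.
    X: (m, n) features, y: (m,) targets, w: (n,) weights,
    b: scalar bias, lambda_: regularization parameter (> 0)."""
    m = X.shape[0]
    err = X @ w + b - y                       # f_wb(x^(i)) - y^(i), shape (m,)
    cost = (err @ err) / (2 * m)              # (1/2m) * sum of squared errors
    reg = (lambda_ / (2 * m)) * np.sum(w**2)  # penalizes w only, never b
    return cost + reg
```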

Regularized Linear Regression and Logistic Regression

Because regularization is applied to linear regression and logistic regression, their gradient descent update expressions change accordingly.

$$\begin{align*} &\text{repeat until convergence: } \lbrace \\ &\quad w_j = w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j} \\ &\quad b = b - \alpha \frac{\partial J(\vec{w},b)}{\partial b} \\ &\rbrace \end{align*}$$

The partial derivative with respect to $w_j$ gains one extra term, shown below, while the expression for $b$ is unchanged because $b$ is not regularized.

$$\begin{align*} \frac{\partial J(\vec{w},b)}{\partial w_j} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_{j}^{(i)} + \frac{\lambda}{m} w_j \\ \frac{\partial J(\vec{w},b)}{\partial b} &= \frac{1}{m} \sum_{i=1}^{m} \left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right) \end{align*}$$
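A minimal NumPy sketch of these regularized updates for the linear-regression case; the function name and signature are illustrative, not from any library. The logistic-regression version has the same update rules, except that $f_{\vec{w},b}$ passes the linear term through the sigmoid.

```python
import numpy as np

def gradient_descent_regularized(X, y, w, b, alpha, lambda_, num_iters):
    """Batch gradient descent for regularized linear regression;
    only w is regularized, so only dj_dw gains the (lambda/m)*w term."""
    m = X.shape[0]
    for _ in range(num_iters):
        err = X @ w + b - y                          # (m,) prediction errors
        dj_dw = (X.T @ err) / m + (lambda_ / m) * w  # gradient w.r.t. w
        dj_db = np.sum(err) / m                      # gradient w.r.t. b (no reg term)
        w = w - alpha * dj_dw                        # simultaneous update of w and b
        b = b - alpha * dj_db
    return w, b
```

Note that the extra $\frac{\lambda}{m} w_j$ term shrinks each weight slightly on every iteration, which is exactly how the penalty keeps the parameters small.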