Univariate Linear Regression: $f_{w,b}(x) = wx + b$

Multiple Linear Regression:

$f_{\vec{w},b}(\vec{x}) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$

Notation:

$\vec{w} = [w_1, w_2, \dots, w_n]$

$\vec{x} = [x_1, x_2, \dots, x_n]$

Vectorization:

$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$

```python
import numpy as np

# NumPy vectors for the parameters and features
w = np.array([1.0, 2.5, -3.3])
b = 4
x = np.array([10, 20, 30])
```

Without Vectorization:

```python
# hard-coded sum of products (only works for n = 3)
f = w[0] * x[0] + \
    w[1] * x[1] + \
    w[2] * x[2] + b
```

```python
# loop over all n features
n = len(w)
f = 0
for i in range(n):
    f = f + w[i] * x[i]
f = f + b
```

With Vectorization:

```python
# np.dot computes the dot product of w and x
f = np.dot(w, x) + b
```

The vectorized version is shorter to write and runs faster, since NumPy can use optimized, parallel hardware instructions.
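To see the speed difference directly, you can time both versions; a minimal sketch using Python's standard `time` module (the vector length is an arbitrary choice):

```python
import time
import numpy as np

w = np.random.rand(1_000_000)
x = np.random.rand(1_000_000)

start = time.time()                 # loop version
f_loop = 0.0
for i in range(len(w)):
    f_loop += w[i] * x[i]
loop_seconds = time.time() - start

start = time.time()                 # vectorized version
f_vec = np.dot(w, x)
vec_seconds = time.time() - start

print(f"loop: {loop_seconds:.4f}s, np.dot: {vec_seconds:.6f}s")
```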

Gradient Descent

Univariate Linear Regression:

$w = w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}$

$b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$

Multiple Linear Regression:

$w_j = w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}$ (updated simultaneously for $j = 1, \dots, n$; the update for $b$ is the same as in the univariate case)
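As a concrete reference, here is a minimal NumPy sketch of one gradient-descent step for multiple linear regression, where `X` is an m×n feature matrix and `y` a length-m target vector (the function name is illustrative):

```python
import numpy as np

def gradient_descent_step(X, y, w, b, alpha):
    """One simultaneous update of w and b for linear regression."""
    m = X.shape[0]
    predictions = X @ w + b       # f_{w,b}(x^(i)) for all i
    errors = predictions - y      # f_{w,b}(x^(i)) - y^(i)
    dj_dw = (X.T @ errors) / m    # partial derivative w.r.t. each w_j
    dj_db = errors.sum() / m      # partial derivative w.r.t. b
    return w - alpha * dj_dw, b - alpha * dj_db
```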

Feature Scaling

When the value ranges of different features differ greatly, gradient descent can be very slow to converge. In that case, rescale the features so that they all take on comparable ranges of values, which speeds up gradient descent.

  1. Scaling by the maximum: divide each value by the feature's maximum ==> range roughly 0-1

  2. Mean normalization: $\frac{x_i - \mu_i}{\max(x_i) - \min(x_i)}$, where $\mu_i$ is the mean of $x_i$

  3. Z-score normalization: $\frac{x_i - \mu_i}{\sigma_i}$, where $\mu_i$ is the mean and $\sigma_i$ the standard deviation (see the sketch after this list)
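A minimal z-score normalization sketch in NumPy, where `X` is an m×n feature matrix (the function name is illustrative):

```python
import numpy as np

def zscore_normalize(X):
    """Scale each column of X to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)      # per-feature means
    sigma = X.std(axis=0)    # per-feature standard deviations
    return (X - mu) / sigma, mu, sigma
```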

Convergence of Gradient Descent

  1. The learning curve approaches a horizontal asymptote as the number of iterations grows

  2. Treat $\epsilon$ as a lower bound on how much the cost function $J(\vec{w},b)$ should decrease per iteration; once the decrease is $\leq \epsilon$, declare convergence (see the sketch after this list)
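A minimal sketch of the $\epsilon$-based stopping test, reusing the `gradient_descent_step` sketch above; the cost function, data, and threshold value are all illustrative assumptions:

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Mean squared error cost J(w, b) with the conventional 1/(2m) factor."""
    errors = X @ w + b - y
    return (errors ** 2).sum() / (2 * len(y))

# illustrative data and settings
X = np.random.rand(100, 3)
y = np.random.rand(100)
w, b = np.zeros(X.shape[1]), 0.0
alpha, epsilon = 0.1, 1e-6

prev_cost = float("inf")
for iteration in range(10_000):
    w, b = gradient_descent_step(X, y, w, b, alpha)
    cost = compute_cost(X, y, w, b)
    if prev_cost - cost <= epsilon:   # decrease fell below the threshold
        break
    prev_cost = cost
```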

Identifying Problems with Gradient Descent

With a small enough $\alpha$, $J(\vec{w},b)$ should **decrease** on every iteration

  1. The learning curve oscillates up and down ==> the learning rate $\alpha$ is too large
  2. The learning curve keeps increasing ==> there is a bug in the code

Technique for Picking $\alpha$

0.001 -> 0.003 -> 0.01 -> 0.03 -> 0.1 -> 0.3 -> 1 -> …

Scale up by roughly 3x or 10x at each step.

Pick the $\alpha$ that converges in the fewest iterations (or one slightly smaller than it), as in the sweep sketched below.
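A minimal sketch of such a sweep, again reusing the illustrative `gradient_descent_step` and `compute_cost` helpers from the sketches above:

```python
import numpy as np

candidate_alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

for alpha in candidate_alphas:
    w, b = np.zeros(X.shape[1]), 0.0
    costs = []
    for iteration in range(100):
        w, b = gradient_descent_step(X, y, w, b, alpha)
        costs.append(compute_cost(X, y, w, b))
    # Inspect or plot `costs` for each alpha: keep the largest alpha
    # whose curve still decreases steadily on every iteration.
```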

Feature Engineering

Using intuition to design new features by transforming or combining the original features.

Sometimes, by defining new features, you might be able to get a much better model.
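For instance, when predicting house prices from a lot's frontage and depth, the area (frontage × depth) is often a more predictive feature; a minimal sketch, where the column layout of `X` is an assumption:

```python
import numpy as np

# illustrative lot data: column 0 = frontage (x1), column 1 = depth (x2)
X = np.array([[50.0, 30.0],
              [40.0, 25.0],
              [60.0, 45.0]])

area = X[:, 0] * X[:, 1]                   # new feature x3 = x1 * x2
X_engineered = np.column_stack([X, area])  # append it to the feature matrix
```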

Polynomial Regression

Feature scaling is even more important here, because features such as $x$, $x^2$, and $x^3$ can take on vastly different ranges of values.
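A minimal sketch that builds polynomial features and scales them with the `zscore_normalize` helper from above (the degree and data are arbitrary choices):

```python
import numpy as np

x = np.arange(1, 21, dtype=float)          # a single raw feature
X_poly = np.column_stack([x, x**2, x**3])  # polynomial features x, x^2, x^3
X_poly_scaled, mu, sigma = zscore_normalize(X_poly)  # helper from above
```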