Univariate Linear Regression: $f_{w,b}(x) = wx + b$

Multiple Linear Regression:

$f_{\vec{w},b}(\vec{x}) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$

Notation:

$\vec{w} = [w_1, w_2, \dots, w_n]$

$\vec{x} = [x_1, x_2, \dots, x_n]$

Vectorization:

$f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$

```python
import numpy as np

# NumPy vectors for the parameters and features
w = np.array([1.0, 2.5, -3.3])
b = 4
x = np.array([10, 20, 30])
```

Without Vectorization:

```python
# hard-coded sum of products (only works for n = 3)
f = w[0] * x[0] + \
    w[1] * x[1] + \
    w[2] * x[2] + b
```

```python
# loop over all n features
n = len(w)
f = 0
for i in range(n):
    f = f + w[i] * x[i]
f = f + b
```

With Vectorization:

```python
# np.dot computes the dot product of w and x
f = np.dot(w, x) + b
```

The vectorized version is shorter to write and runs faster, since NumPy can use optimized, parallel hardware instructions.
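To see the speed difference directly, you can time both versions; a minimal sketch using Python's standard `time` module (the vector length is an arbitrary choice):

```python
import time
import numpy as np

w = np.random.rand(1_000_000)
x = np.random.rand(1_000_000)

start = time.time()                 # loop version
f_loop = 0.0
for i in range(len(w)):
    f_loop += w[i] * x[i]
loop_seconds = time.time() - start

start = time.time()                 # vectorized version
f_vec = np.dot(w, x)
vec_seconds = time.time() - start

print(f"loop: {loop_seconds:.4f}s, np.dot: {vec_seconds:.6f}s")
```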

Gradient Descent

Univariate Linear Regression:

$w = w - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}$

$b = b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)$

Multiple Linear Regression:

$w_j = w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}$ (updated simultaneously for $j = 1, \dots, n$; the update for $b$ is the same as in the univariate case)
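As a concrete reference, here is a minimal NumPy sketch of one gradient-descent step for multiple linear regression, where `X` is an m×n feature matrix and `y` a length-m target vector (the function name is illustrative):

```python
import numpy as np

def gradient_descent_step(X, y, w, b, alpha):
    """One simultaneous update of w and b for linear regression."""
    m = X.shape[0]
    predictions = X @ w + b       # f_{w,b}(x^(i)) for all i
    errors = predictions - y      # f_{w,b}(x^(i)) - y^(i)
    dj_dw = (X.T @ errors) / m    # partial derivative w.r.t. each w_j
    dj_db = errors.sum() / m      # partial derivative w.r.t. b
    return w - alpha * dj_dw, b - alpha * dj_db
```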

Feature Scaling

When the value ranges of different features differ greatly, gradient descent can be very slow to converge. In that case, rescale the features so that they all take on comparable ranges of values, which speeds up gradient descent.

  1. Scaling by the maximum: divide each value by the feature's maximum ==> range roughly 0-1

  2. Mean normalization: $\frac{x_i - \mu_i}{\max(x_i) - \min(x_i)}$, where $\mu_i$ is the mean of $x_i$

  3. Z-score normalization: $\frac{x_i - \mu_i}{\sigma_i}$, where $\mu_i$ is the mean and $\sigma_i$ the standard deviation (see the sketch after this list)
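A minimal z-score normalization sketch in NumPy, where `X` is an m×n feature matrix (the function name is illustrative):

```python
import numpy as np

def zscore_normalize(X):
    """Scale each column of X to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)      # per-feature means
    sigma = X.std(axis=0)    # per-feature standard deviations
    return (X - mu) / sigma, mu, sigma
```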

Convergence of Gradient Descent

  1. The learning curve approaches a horizontal asymptote as the number of iterations grows

  2. Treat $\epsilon$ as a lower bound on how much the cost function $J(\vec{w},b)$ should decrease per iteration; once the decrease is $\leq \epsilon$, declare convergence (see the sketch after this list)
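A minimal sketch of the $\epsilon$-based stopping test, reusing the `gradient_descent_step` sketch above; the cost function, data, and threshold value are all illustrative assumptions:

```python
import numpy as np

def compute_cost(X, y, w, b):
    """Mean squared error cost J(w, b) with the conventional 1/(2m) factor."""
    errors = X @ w + b - y
    return (errors ** 2).sum() / (2 * len(y))

# illustrative data and settings
X = np.random.rand(100, 3)
y = np.random.rand(100)
w, b = np.zeros(X.shape[1]), 0.0
alpha, epsilon = 0.1, 1e-6

prev_cost = float("inf")
for iteration in range(10_000):
    w, b = gradient_descent_step(X, y, w, b, alpha)
    cost = compute_cost(X, y, w, b)
    if prev_cost - cost <= epsilon:   # decrease fell below the threshold
        break
    prev_cost = cost
```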

Identifying Problems with Gradient Descent

With a small enough $\alpha$, $J(\vec{w},b)$ should **decrease** on every iteration

  1. The learning curve oscillates up and down ==> the learning rate $\alpha$ is too large
  2. The learning curve keeps increasing ==> there is a bug in the code

Technique for Picking $\alpha$

0.001 -> 0.003 -> 0.01 -> 0.03 -> 0.1 -> 0.3 -> 1 -> …

Scale up by roughly 3x or 10x at each step.

Pick the $\alpha$ that converges in the fewest iterations (or one slightly smaller than it), as in the sweep sketched below.
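A minimal sketch of such a sweep, again reusing the illustrative `gradient_descent_step` and `compute_cost` helpers from the sketches above:

```python
import numpy as np

candidate_alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

for alpha in candidate_alphas:
    w, b = np.zeros(X.shape[1]), 0.0
    costs = []
    for iteration in range(100):
        w, b = gradient_descent_step(X, y, w, b, alpha)
        costs.append(compute_cost(X, y, w, b))
    # Inspect or plot `costs` for each alpha: keep the largest alpha
    # whose curve still decreases steadily on every iteration.
```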

Feature Engineering

Using intuition to design new features by transforming or combining the original features.

Sometimes, by defining new features, you might be able to get a much better model.
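For instance, when predicting house prices from a lot's frontage and depth, the area (frontage × depth) is often a more predictive feature; a minimal sketch, where the column layout of `X` is an assumption:

```python
import numpy as np

# illustrative lot data: column 0 = frontage (x1), column 1 = depth (x2)
X = np.array([[50.0, 30.0],
              [40.0, 25.0],
              [60.0, 45.0]])

area = X[:, 0] * X[:, 1]                   # new feature x3 = x1 * x2
X_engineered = np.column_stack([X, area])  # append it to the feature matrix
```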

Polynomial Regression

Feature scaling is even more important here, because features such as $x$, $x^2$, and $x^3$ can take on vastly different ranges of values.
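A minimal sketch that builds polynomial features and scales them with the `zscore_normalize` helper from above (the degree and data are arbitrary choices):

```python
import numpy as np

x = np.arange(1, 21, dtype=float)          # a single raw feature
X_poly = np.column_stack([x, x**2, x**3])  # polynomial features x, x^2, x^3
X_poly_scaled, mu, sigma = zscore_normalize(X_poly)  # helper from above
```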