梯度下降法的向量化和数据标准化
用向量运算代替for循环
在Numpy里,一个向量默认都是列向量,要转换成行向量需要转置。但是Numpy里对于行,列向量的运算是一样的,不做区分。
上一节求梯度的方法可以转换成简洁的矩阵运算:
def dJ(theta, X_b, y):
gd = np.zeros_like(theta)
for i in range(len(theta)):
if i == 0:
gd[0] = np.sum((X_b.dot(theta) - y))
else:
gd[i] = np.sum((X_b.dot(theta) - y).dot(X_b[:, i]))
return gd * 2 / len(X_b)
转换为:
def dJ(theta, X_b, y):
return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)
最终的代码fit_gd()如下:
def fit_gd(self, X_train, y_train, eta = 0.1, n_iters=1e4):
assert X_train.shape[0] == y_train.shape[0], "The X_train's size should be equal to the y_train's size"
def dJ(theta, X_b, y):
# res = np.zeros_like(theta)
# gd = np.zeros_like(theta)
# for i in range(len(theta)):
# if i == 0:
# gd[0] = np.sum((X_b.dot(theta) - y))
# else:
# gd[i] = np.sum((X_b.dot(theta) - y).dot(X_b[:, i]))
# return gd * 2 / len(X_b)
return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)
def J(theta, X_b, y):
try:
return np.sum((X_b.dot(theta) - y) ** 2) / len(X_b)
except:
return float('inf')
def gradient_descent(X_b, y, initial_theta, eta, n_iters, epsilon=1e-8):
theta = initial_theta
i_iter = 0
while (i_iter < n_iters):
last_theta = theta
gradient = dJ(theta, X_b, y)
theta = theta - eta*gradient
if np.abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
break
i_iter += 1
return theta
X_b = np.hstack((np.ones((len(X_train), 1)), X_train))
initial_theta = np.zeros(X_b.shape[1])
self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)
self.coef_ = self._theta[1:]
self.intercept_ = self._theta[0]
return self
boston房价数据读取和regression类加载:
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
import pandas as pd
data= pd.read_csv("boston_house_prices.csv", skiprows=[0])
array = data.values
X = array[:, :13]
y = array[:, 13]
import sys
sys.path.append(r'C:\\N-20KEPC0Y7KFA-Data\\junhuawa\\Documents\\00-Play-with-ML-in-Python\\Jupyter')
import playML
from playML.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)
from playML.LinearRegression import LinearRegression
数据归一化
在真实数据中,每一维的数据大小相差很大,导致需要步长尽可能小(0.000001),回归次数(1e6)尽可能多才能达到理想的模型, 但是这样也会导致收敛速度很慢,计算量很大的问题。为了达到理想的效果,需要对数据进行归一化。归一化后,算法效率提高很多。
梯度下降法和线性回归法(正规方程法)的比较:
- 正规方程解法中不需要对数据做归一化,因为求解只是对矩阵的运算,没有搜索的过程。 在GD中,步长会受到各个维度数据的影响。数据归一化可以解决这个问题,数据归一化后,收敛速度大幅提高!
- 正规方程法需要对m*n的矩阵做大量的乘法运算( $O(n^3)$ ),矩阵大时效率低。
- 但是对于梯度下降法,如果样本数量m增大,因为每次都要对所有样本进行计算,效率也会降低,所以引出了随机梯度下降法。
from sklearn.preprocessing import StandardScaler
standardScaler = StandardScaler()
standardScaler.fit(X_train)
X_train_standard = standardScaler.transform(X_train)
lin_reg3 = LinearRegression()
%time lin_reg3.fit_gd(X_train_standard, y_train)
X_test_standard = standardScaler.transform(X_test)
lin_reg3.score(X_test_standard, y_test)# R^2
正规法和梯度下降法性能比较:
m = 1000
n = 5000
big_X = np.random.normal(size=(m, n))
true_theta = np.random.uniform(0.0, 100.0, size=n+1)
big_y = big_X.dot(true_theta[1:]) + true_theta[0]+np.random.normal(0.0, 10., size=m)
big_reg1 = LinearRegression()
%time big_reg1.fit_normal(big_X, big_y)
big_reg2 = LinearRegression()
%time big_reg2.fit_gd(big_X, big_y)
结果:
CPU times: total: 14.8 s
Wall time: 5.1 s
CPU times: total: 11 s
Wall time: 3.49 s
总结: 当特征值和样本量数量很大时,GD性能比LR好!,但对于GD来说,如果样本量增大,他的性能也会减少,所以引入随机梯度下降法。一次只用计算一个样本的梯度。
https://blog.csdn.net/hai411741962/article/details/132580367