I've been busy reading papers and haven't looked at scikit-learn for several days, so let's continue with the Tutorials.
First up is a KNN classification example. The idea behind KNN is simple: it needs no explicit model. To predict a point, you look at the k training points nearest to it, and whichever class holds the majority among those k neighbors is the prediction.
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target

# Split iris data in train and test data
# A random permutation, to split the data randomly
np.random.seed(0)
indices = np.random.permutation(len(iris_X))
iris_X_train = iris_X[indices[:-10]]
iris_y_train = iris_y[indices[:-10]]
iris_X_test = iris_X[indices[-10:]]
iris_y_test = iris_y[indices[-10:]]

# Create and fit a nearest-neighbor classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(iris_X_train, iris_y_train)
knn.predict(iris_X_test)
iris_y_test  # compare the predictions against the held-out labels
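To make the majority-vote rule concrete, here is a minimal hand-rolled sketch of the same idea in plain NumPy. The helper name knn_predict is my own, and k=5 is chosen to match KNeighborsClassifier's default:

import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote: the most frequent label among the k neighbors
    return np.bincount(y_train[nearest]).argmax()

# Should agree with knn.predict on the held-out points
print([knn_predict(iris_X_train, iris_y_train, x) for x in iris_X_test])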
Next is the linear regression model. Linear regression typically takes the mean squared error as its objective function and uses batch or stochastic gradient descent to adjust the parameters, reducing the error until it is acceptable. (Note that scikit-learn's LinearRegression actually solves the ordinary least squares problem directly rather than iterating with gradient descent.)
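As a sketch of that iterative view, here is a toy batch-gradient-descent loop that minimizes the MSE of a linear model. The helper name fit_linear_gd, the learning rate, and the iteration count are all illustrative choices for this toy example, not anything scikit-learn does internally:

import numpy as np

def fit_linear_gd(X, y, lr=0.1, n_iters=1000):
    # Add a bias column so the intercept is learned like any other weight
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        err = Xb @ w - y                # residuals
        grad = 2 * Xb.T @ err / len(y)  # gradient of the MSE w.r.t. w
        w -= lr * grad                  # batch gradient descent step
    return w

# Toy data: y = 3x + 1 plus a little noise
rng = np.random.RandomState(0)
X = rng.rand(100, 1)
y = 3 * X[:, 0] + 1 + 0.01 * rng.randn(100)
print(fit_linear_gd(X, y))  # should be close to [1, 3]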
regr.coef_ holds the fitted coefficients of the linear model, and regr.score returns the R² (coefficient of determination), which measures how well the model explains the variance of the target.

from sklearn import datasets
from sklearn import linear_model
import numpy as np

diabetes = datasets.load_diabetes()
diabetes_X_train = diabetes.data[:-20]
diabetes_X_test = diabetes.data[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
print(regr.coef_)

# The mean squared error on the held-out points
np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2)

# R^2 of the prediction
regr.score(diabetes_X_test, diabetes_y_test)
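Since regr.score is just R² = 1 - SS_res / SS_tot, it can be verified by hand (assuming regr was fitted as above):

# R^2 = 1 - SS_res / SS_tot; should match regr.score(...)
y_pred = regr.predict(diabetes_X_test)
ss_res = ((diabetes_y_test - y_pred) ** 2).sum()
ss_tot = ((diabetes_y_test - diabetes_y_test.mean()) ** 2).sum()
print(1 - ss_res / ss_tot, regr.score(diabetes_X_test, diabetes_y_test))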
Ridge regression is a regularized refinement of ordinary least squares: it adds an L2 penalty alpha * ||w||^2 on the coefficients to the squared-error objective, which shrinks the weights and makes the fit more robust when features are correlated.
from __future__ import print_function  # needed only on Python 2

alphas = np.logspace(-4, -1, 6)
regr = linear_model.Ridge(alpha=.1)  # a ridge regressor, so set_params(alpha=...) applies
print([regr.set_params(alpha=alpha)
           .fit(diabetes_X_train, diabetes_y_train)
           .score(diabetes_X_test, diabetes_y_test)
       for alpha in alphas])
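Instead of scanning the alphas by hand as above, the alpha can also be chosen by cross-validation; a short sketch with RidgeCV over the same grid:

from sklearn import linear_model
import numpy as np

# RidgeCV runs cross-validation over the candidate alphas
ridge_cv = linear_model.RidgeCV(alphas=np.logspace(-4, -1, 6))
ridge_cv.fit(diabetes_X_train, diabetes_y_train)
print(ridge_cv.alpha_)  # the alpha that scored best in cross-validation
print(ridge_cv.score(diabetes_X_test, diabetes_y_test))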