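Summary: Gradient checking, put simply, uses a numerical approximation of the derivative to verify that the derivative values we computed from the formulas are correct.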
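Preface
I recently started working through Andrew Ng's deep learning tutorial. Partly to keep myself motivated and partly to have something to look back on later, I am recording my study notes here. Each section corresponds to one chapter of the tutorial, and all of them are kept in this single article so they are easy to look up. So I will keep updating it~
Linear regression
This chapter briefly covers how linear regression works (the tutorial assumes we already have this background and uses it only as a way to review gradient descent; anyone reading these tutorials should already know the basics of machine learning, so I will skip a lot of the fundamentals). The exercise is to implement the objective function and the gradient with respect to every parameter. My code is as follows: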
    function [f,g] = linear_regression(theta, X,y)
      %
      % Arguments:
      %   theta - A vector containing the parameter values to optimize.
      %   X - The examples stored in a matrix.
      %       X(i,j) is the i'th coordinate of the j'th example.
      %   y - The target value for each example.  y(j) is the target for example j.
      %
      m=size(X,2); % number of examples
      n=size(X,1); % feature dimension

      f=0;
      g=zeros(size(theta));

      %
      % TODO:  Compute the linear regression objective by looping over the examples in X.
      %        Store the objective function value in 'f'.
      %
      % TODO:  Compute the gradient of the objective with respect to theta by looping over
      %        the examples in X and adding up the gradient for each example.  Store the
      %        computed gradient in 'g'.

      %%% YOUR CODE HERE %%%
      for j = 1:m
        f = f + 0.5*(theta'*X(:,j)-y(j))^2;
      end
      % ----------
      for i = 1:n
        for j = 1:m
          % Keep the trailing semicolon: without it MATLAB prints g(i) on
          % every iteration, which slows the run down enormously.
          g(i) = g(i) + X(i,j)*(theta'*X(:,j)-y(j));
        end
      end
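For reference, the two loops above correspond to the objective and gradient below. The notation is my own shorthand (x^{(j)} for column j of X, y^{(j)} for y(j)); it restates what the code computes and is not quoted from the tutorial.

```latex
J(\theta) = \frac{1}{2} \sum_{j=1}^{m} \left( \theta^{\top} x^{(j)} - y^{(j)} \right)^{2},
\qquad
\frac{\partial J}{\partial \theta_i}
  = \sum_{j=1}^{m} x_i^{(j)} \left( \theta^{\top} x^{(j)} - y^{(j)} \right),
\quad i = 1, \dots, n .
```

The first loop accumulates J term by term into f, and the nested loop accumulates each partial derivative into g(i).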
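The final results are as follows:
    Optimization took 128.640734 seconds. % it took this long because I was printing values inside the loop
    RMS training error: 4.843147
    RMS testing error: 4.151706
Logistic regression
It is called regression, but it is really classification. This chapter implements a handwritten character classifier, and only the simplest 0-versus-1 case, so the accuracy is very high. My code is as follows: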
    function [f,g] = logistic_regression(theta, X,y)
      %
      % Arguments:
      %   theta - A column vector containing the parameter values to optimize.
      %   X - The examples stored in a matrix.
      %       X(i,j) is the i'th coordinate of the j'th example.
      %   y - The label for each example.  y(j) is the j'th example's label.
      %
      m=size(X,2); % number of training images
      n=size(X,1); % number of pixels per image, plus 1

      % initialize objective value and gradient.
      f = 0;
      g = zeros(size(theta));

      %
      % TODO:  Compute the objective function by looping over the dataset and summing
      %        up the objective values for each example.  Store the result in 'f'.
      %
      % TODO:  Compute the gradient of the objective by looping over the dataset and summing
      %        up the gradients (df/dtheta) for each example. Store the result in 'g'.
      %
      %%% YOUR CODE HERE %%%
      for j = 1:m
        f = f - ( y(j)*log(1/(1+exp(-theta'*X(:,j)))) + (1-y(j))*log(1-(1/(1+exp(-theta'*X(:,j))))) );
      end
      % ----------
      for i = 1:n
        for j = 1:m
          g(i) = g(i) + X(i,j)*(1/(1+exp(-theta'*X(:,j)))-y(j));
        end
      end
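Written out as formulas, the loops above compute the following. As before, x^{(j)} and y^{(j)} are my shorthand for column j of X and for y(j), and h_\theta is simply a name for the expression 1/(1+exp(-theta'*x)) that appears in the code.

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{\top} x}},
\qquad
J(\theta) = - \sum_{j=1}^{m} \left[ y^{(j)} \log h_\theta\!\left(x^{(j)}\right)
          + \left(1 - y^{(j)}\right) \log\!\left(1 - h_\theta\!\left(x^{(j)}\right)\right) \right],
\qquad
\frac{\partial J}{\partial \theta_i}
  = \sum_{j=1}^{m} x_i^{(j)} \left( h_\theta\!\left(x^{(j)}\right) - y^{(j)} \right).
```

The gradient has the same shape as in linear regression, with the prediction \theta^{\top} x replaced by h_\theta(x).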
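Results:
    Optimization took 7874.049756 seconds. % I waited so long the flowers wilted
    Training accuracy: 100.0%
    Test accuracy: 100.0%
Vectorization
Vectorization is a great time-saver. Put plainly, it uses MATLAB's strength in matrix computation to make up for its weakness in loops. My linear regression code: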
    function [f,g] = linear_regression_vec(theta, X,y)
      %
      % Arguments:
      %   theta - A vector containing the parameter values to optimize.
      %   X - The examples stored in a matrix.
      %       X(i,j) is the i'th coordinate of the j'th example.
      %   y - The target value for each example.  y(j) is the target for example j.
      %
      m=size(X,2);

      % initialize objective value and gradient.
      f = 0;
      g = zeros(size(theta));

      %
      % TODO:  Compute the linear regression objective function and gradient
      %        using vectorized code.  (It will be just a few lines of code!)
      %        Store the objective function value in 'f', and the gradient in 'g'.
      %
      %%% YOUR CODE HERE %%%
      f = sum((theta'*X - y).^2) * 0.5;
      y_hat = theta'*X;        % 1-by-m row of predictions
      g = X*(y_hat' - y');     % n-by-1 gradient
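Results:
    Optimization took 0.108650 seconds.
    RMS training error: 4.650101
    RMS testing error: 4.856230
That really does save a lot of time and effort. The i and j subscripts and the transposes can make your head spin, though. In practice you can use the debugger to inspect your data, then adjust your subscripts and decide whether a transpose is needed (the goal is simply to make the matrix dimensions agree for multiplication). Also, within one exercise, try to remember what each commonly used variable means. Throughout this tutorial, for example, m is the number of examples and n is the feature dimension.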
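One way to keep the transposes straight is to write the dimensions under each factor. The bookkeeping below is for the vectorized linear regression gradient. It assumes theta is an n-by-1 column and y a 1-by-m row, which is what the expressions in the code imply but is not stated explicitly anywhere.

```latex
\underbrace{g}_{n \times 1}
  = \underbrace{X}_{n \times m}
    \Big( \underbrace{\theta^{\top} X}_{1 \times m} - \underbrace{y}_{1 \times m} \Big)^{\top},
\qquad
\Big( \theta^{\top} X - y \Big)^{\top} \in \mathbb{R}^{m \times 1}.
```

The inner dimensions (m and m) match, and the result has the same shape as theta, which is what the optimizer expects.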
Below is the vectorized code for logistic regression:
    function [f,g] = logistic_regression_vec(theta, X,y)
      %
      % Arguments:
      %   theta - A column vector containing the parameter values to optimize.
      %   X - The examples stored in a matrix.
      %       X(i,j) is the i'th coordinate of the j'th example.
      %   y - The label for each example.  y(j) is the j'th example's label.
      %
      m=size(X,2);

      % initialize objective value and gradient.
      f = 0;
      g = zeros(size(theta));

      %
      % TODO:  Compute the logistic regression objective function and gradient
      %        using vectorized code.  (It will be just a few lines of code!)
      %        Store the objective function value in 'f', and the gradient in 'g'.
      %
      %%% YOUR CODE HERE %%%
      h = sigmoid(theta'*X);                       % 1-by-m row of predicted probabilities
      f = -sum(y.*log(h) + (1-y).*log(1 - h));
      g = X*(h - y)';                              % n-by-1 gradient
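The vectorized code above calls a sigmoid helper that is not shown in this post. Below is a minimal sketch of what such a helper presumably looks like, assuming it is the usual element-wise logistic function. The anonymous function and the sample input z are stand-ins for illustration, not necessarily the file that ships with the exercise.

```matlab
% Hypothetical stand-in for the sigmoid helper used above.
% The ./ makes it element-wise, so z can be the 1-by-m row theta'*X.
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-2 0 2];      % made-up inputs for illustration
h = sigmoid(z);    % h(2) is exactly 0.5, and h(1) + h(3) equals 1 by symmetry
disp(h);
```

Because it works element-wise, the same helper handles a single score or the whole row of scores at once, which is what makes the vectorized version possible.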
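Results:
    Optimization took 3.064685 seconds.
    Training accuracy: 100.0%
    Test accuracy: 100.0%
Gradient checking
Put simply, we use a numerical approximation of the derivative to verify that the derivative values computed from our formulas are correct.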
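The approximation used by grad_check.m is the central difference. In the formula below, e_j denotes the j-th unit vector (my notation) and \delta is the step size, which the script sets to 1e-3.

```latex
\frac{\partial J}{\partial \theta_j}
  \approx \frac{J(\theta + \delta e_j) - J(\theta - \delta e_j)}{2\delta},
\qquad
\text{truncation error } O(\delta^{2}).
```

For an objective that is quadratic in \theta, such as the linear regression objective, the central difference has no truncation error at all, so any remaining discrepancy comes from floating-point rounding.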
We use grad_check.m:
    function average_error = grad_check(fun, theta0, num_checks, varargin)

      delta=1e-3;
      sum_error=0;

      % Note: the header lists g_est before g, but the values below are
      % printed as g(j) first and g_est second.
      fprintf(' Iter       i             err');
      fprintf('           g_est               g               f\n')

      for i=1:num_checks
        T = theta0;
        j = randsample(numel(T),1);   % pick a random index into theta
        T0=T; T0(j) = T0(j)-delta;    % theta(j-): theta with delta subtracted from element j
        T1=T; T1(j) = T1(j)+delta;    % theta(j+): theta with delta added to element j

        [f,g] = fun(T, varargin{:});
        f0 = fun(T0, varargin{:});    % J(theta(j-))
        f1 = fun(T1, varargin{:});    % J(theta(j+))

        g_est = (f1-f0) / (2*delta);
        error = abs(g(j) - g_est);

        % iteration, theta index, absolute error, analytic value, estimate, objective value
        fprintf('% 5d  % 6d % 15g % 15f % 15f % 15f\n', ...
                i,j,error,g(j),g_est,f);

        sum_error = sum_error + error;
      end

      average_error=sum_error/num_checks;
Add the following to ex1a_linreg.m:
    average_error = grad_check(@linear_regression_vec,theta,30,train.X,train.y);
    fprintf('The Average error is :%f\n',average_error);
Output:
     Iter       i             err           g_est               g               f
        1      14      8.0571e-06  1418640.687110  1418640.687102 14517559.734147
        2       3     3.73228e-07  1100385.922200  1100385.922200 14517559.734147
        3       4     2.48384e-06  1236106.996470  1236106.996473 14517559.734147
        4      13     5.16325e-06 38562142.957593 38562142.957588 14517559.734147
        5      14      8.0571e-06  1418640.687110  1418640.687102 14517559.734147
        6      10      6.0685e-06  1118680.054414  1118680.054408 14517559.734147
        7      13     5.16325e-06 38562142.957593 38562142.957588 14517559.734147
        8      10      6.0685e-06  1118680.054414  1118680.054408 14517559.734147
        9      14      8.0571e-06  1418640.687110  1418640.687102 14517559.734147
       10      11     2.87592e-06 45661592.041328 45661592.041331 14517559.734147
       11      13     5.16325e-06 38562142.957593 38562142.957588 14517559.734147
       12       2     1.97807e-06   436767.013214   436767.013212 14517559.734147
       13      14      8.0571e-06  1418640.687110  1418640.687102 14517559.734147
       14      14      8.0571e-06  1418640.687110  1418640.687102 14517559.734147
       15      11     2.87592e-06 45661592.041328 45661592.041331 14517559.734147
       16       1     3.02999e-06   106041.865458   106041.865461 14517559.734147
       17       5     1.42339e-06     6344.599333     6344.599332 14517559.734147
       18       9      3.8307e-06   389421.210472   389421.210468 14517559.734147
       19       7     3.66173e-06   660532.159808   660532.159812 14517559.734147
       20       5     1.42339e-06     6344.599333     6344.599332 14517559.734147
       21       4     2.48384e-06  1236106.996470  1236106.996473 14517559.734147
       22       9      3.8307e-06   389421.210472   389421.210468 14517559.734147
       23       7     3.66173e-06   660532.159808   660532.159812 14517559.734147
       24      11     2.87592e-06 45661592.041328 45661592.041331 14517559.734147
       25      11     2.87592e-06 45661592.041328 45661592.041331 14517559.734147
       26      12     2.83984e-06  1978417.905024  1978417.905027 14517559.734147
       27       5     1.42339e-06     6344.599333     6344.599332 14517559.734147
       28      12     2.83984e-06  1978417.905024  1978417.905027 14517559.734147
       29       5     1.42339e-06     6344.599333     6344.599332 14517559.734147
       30      10      6.0685e-06  1118680.054414  1118680.054408 14517559.734147
    The Average error is :0.000004
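This shows that our gradient computation is correct. (The code can still be optimized, by the way: a few lines inside the loop do not depend on the loop variable and can be moved outside it, for example the two below.)
    T = theta0;
    [f,g] = fun(T, varargin{:});
Softmax regression
This is just logistic regression with multiple classes (as opposed to two-class classification). My code is as follows: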
    function [f,g] = softmax_regression_vec(theta, X,y)
      %
      % Arguments:
      %   theta - A vector containing the parameter values to optimize.
      %       In minFunc, theta is reshaped to a long vector.  So we need to
      %       resize it to an n-by-(num_classes-1) matrix.
      %       Recall that we assume theta(:,num_classes) = 0.
      %
      %   X - The examples stored in a matrix.
      %       X(i,j) is the i'th coordinate of the j'th example.
      %   y - The label for each example.  y(j) is the j'th example's label.
      %
      m=size(X,2); % number of examples
      n=size(X,1); % feature dimension

      % theta is a vector;  need to reshape to n x num_classes.
      theta=reshape(theta, n, []);
      num_classes=size(theta,2)+1;

      % initialize objective value and gradient.
      f = 0;
      g = zeros(size(theta));

      %
      % TODO:  Compute the softmax objective function and gradient using vectorized code.
      %        Store the objective function value in 'f', and the gradient in 'g'.
      %        Before returning g, make sure you form it back into a vector with g=g(:);
      %
      %%% YOUR CODE HERE %%%
      indictor = full(sparse(y, 1:m, 1)); % indicator function: entry (k,j) is 1 when y(j) == k
      theta = [theta,zeros(n,1)];         % restore the full theta by appending a zero column
      a = exp(theta'*X);
      p = bsxfun(@rdivide,a,sum(a));      % normalize each column into class probabilities
      l = log(p);
      %f = -sum(indictor*log(p); % this would produce an oversized matrix, which is not allowed
      f = -indictor(:)'*l(:);
      g = -X * (indictor-p)';
      g = g(:,1:end-1);                   % drop the last column again
      g=g(:); % make gradient a vector for minFunc
Results:
    Optimization took 91.072469 seconds.
    Training accuracy: 94.4%
    Test accuracy: 92.2%
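One caveat about the softmax code: exp(theta'*X) can overflow when the scores are large. A common remedy, which is not part of my code above, is to subtract each column's maximum before exponentiating. This leaves the probabilities unchanged because softmax is invariant to adding a constant to every score in a column. The sketch below uses a made-up score matrix z in place of theta'*X.

```matlab
% Sketch of a numerically safer column-wise softmax.
% z stands in for theta'*X: one row per class, one column per example.
z = [1000 1; 1001 2; 1002 3];                 % made-up scores; column 1 is deliberately huge

z_shift = bsxfun(@minus, z, max(z, [], 1));   % largest entry in each column becomes 0
a = exp(z_shift);                             % every entry is now in (0, 1], so no overflow
p = bsxfun(@rdivide, a, sum(a, 1));           % each column sums to 1

% For comparison: the unshifted version overflows on column 1 (Inf/Inf gives NaN).
naive = exp(z(:,1)) ./ sum(exp(z(:,1)));
disp(p);
disp(naive);
```

In this example the two columns of z differ only by a constant offset, so both columns of p come out identical, while the naive computation on the first column returns NaN.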
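The copyright of this article belongs to the author. Do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.
When reposting, please cite the address of this article: https://www.ucloud.cn/yun/20090.html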