【原】Coursera—Andrew Ng机器学习—编程作业 Programming Exercise 4 反向传播神经网络

课程笔记 Coursera—Andrew Ng机器学习—课程笔记 Lecture 9_Neural Networks learning 作业说明 Exercise 4,Week 5,实现反向传播 backpropagation神经网络算法, 对图片中手写数字 0-9 进行识别。 数据集 :ex4data1.mat。手写数字图片数据,5000个样例。每张图片20px * 20px,也就是一共400个特征。数据集X维度为5000 * 400     ex4weights.mat。神经网络每一层的权重。 文件清单 ex4.m- Octave/MATLAB script that steps you through the exercise ex4data1.mat- Training set of hand-written digits ex4weights.mat- Neural network parameters for exercise 4 submit.m- Submission script that sends your solutions to our servers displayData.m- Function to help visualize the dataset fmincg.m- Function minimization routine (similar to fminunc) sigmoid.m- Sigmoid function computeNumericalGradient.m- Numerically compute gradients checkNNGradients.m- Function to help check your gradients debugInitializeWeights.m- Function for initializing weights predict.m- Neural network prediction function [*] sigmoidGradient.m- Compute the gradient of the sigmoid function [*] randInitializeWeights.m- Randomly initialize weights [*] nnCostFunction.m- Neural network cost function   * 为必须要完成的 结论 和上一周的作业一样。因为Octave里数组下标从1开始。所以这里将分类结果0用10替代。预测结果中的1-10代表图片数字为1,2,3,4,5,6,7,8,9,0 矩阵运算 tricky 的地方在于维度对应,哪里需要转置很关键。 1 神经网络 1.1 数据可视化 在数据集X里随机选择100个数字,绘制图像 displayData.m: 复制代码 function [h, display_array] = displayData(X, example_width) %DISPLAYDATA Display 2D data in a nice grid % [h, display_array] = DISPLAYDATA(X, example_width) displays 2D data % stored in X in a nice grid. It returns the figure handle h and the % displayed array if requested. % Set example_width automatically if not passed in if ~exist('example_width', 'var') || isempty(example_width) example_width = round(sqrt(size(X, 2))); end % Gray Image colormap(gray); % Compute rows, cols [m n] = size(X); example_height = (n / example_width); % Compute number of items to display display_rows = floor(sqrt(m)); display_cols = ceil(m / display_rows); % Between images padding pad = 1; % Setup blank display display_array = - ones(pad + display_rows * (example_height + pad), ... pad + display_cols * (example_width + pad)); % Copy each example into a patch on the display array curr_ex = 1; for j = 1:display_rows for i = 1:display_cols if curr_ex > m, break; end % Copy the patch % Get the max value of the patch max_val = max(abs(X(curr_ex, :))); display_array(pad + (j - 1) * (example_height + pad) + (1:example_height), ... pad + (i - 1) * (example_width + pad) + (1:example_width)) = ... reshape(X(curr_ex, :), example_height, example_width) / max_val; curr_ex = curr_ex + 1; end if curr_ex > m, break; end end % Display Image h = imagesc(display_array, [-1 1]); % Do not show axis axis image off drawnow; end 复制代码 ex4.m里的调用 复制代码 load('ex4data1.mat'); m = size(X, 1); % Randomly select 100 data points to display sel = randperm(size(X, 1)); sel = sel(1:100); displayData(X(sel, :)); 复制代码 运行效果如下: 1.2 模型表示 ex4.m 里载入已经调好的权重矩阵weight。 复制代码 % Load saved matrices from file load('ex4weights.mat'); % The matrices Theta1 and Theta2 will now be in your workspace % Theta1 has size 25 x 401 % Theta2 has size 10 x 26 复制代码 这里g(z) 使用 sigmoid 函数。 神经网络中,从上到下的每个原点是 feature 特征 x0, x1, x2...,不是实例。计算过程其实就是 feature 一层一层映射的过程。一层转换之后,feature可能变多、也可能变少。下一层 i+1层 feature 的个数是通过权重矩阵里当前 θ(i) 的 row 行数来控制。 两层权重 θ 已经在 ex4weights.mat 里给出。从a1映射到a2权重矩阵 θ1为 25 * 401,从a2映射到a3权重矩阵 θ2为10 * 26。因为最后有10个分类。(这意味着运算的时候要注意转置) 1.3 前馈神经网络和代价函数 首先完成不包含正则项的代价函数,公式如下: 注意,和之前不同的是: 由于y是范围0-9的数字,计算之前需要转换为下面这种向量的形式: 代码为: 复制代码 % convert y(0-9) to vector c = 1:num_labels; yt = zeros(m,num_labels); for i = 1:m yt(i,:) = (c==y(i)); end 复制代码 nnCostFunction.m 计算代价函数的代码如下: 复制代码 % compute h(x) a1 = [ones(m, 1) X]; %5000x401 a2 = sigmoid(a1 * Theta1'); %5000x401乘以401x25得到5000x25。即把401个feature映射到25 a2 = [ones(m, 1) a2]; %5000x26 hx = sigmoid(a2 * Theta2'); %5000x26乘以26x10得到5000x10。即把26个feature映射到10 % first term part1 = -yt.*log(hx); % second term part2 = (1-yt).*log(1-hx); % compute J J = 1 / m * sum(sum(part1 - part2)); 复制代码 需要注意的是,上一次作业里逻辑回归的代价函数计算使用的是矩阵相乘的方式 part1 = -yt' * log(hx); part2 = (1-yt') * log(1-hx); 而这里神经网络的公式中有两层求和,需要使用 “矩阵点乘,sum,再sum” 的方式计算。 如果使用矩阵相乘省略一层sum,结果会出错。 1.4 正则化的代价函数 给神经网络中的代价函数加上正则项,公式如下: nnCostFunction.m 代码如下: 复制代码 % convert y(0-9) to vector c = 1:num_labels; yt = zeros(m,num_labels); for i = 1:m yt(i,:) = (c==y(i)); end % compute h(x) a1 = [ones(m, 1) X]; %5000x401 a2 = sigmoid(a1 * Theta1'); %5000x401乘以401x25得到5000x25。即把401个feature映射到25 a2 = [ones(m, 1) a2]; %5000x26 hx = sigmoid(a2 * Theta2'); %5000x26乘以26x10得到5000x10。即把26个feature映射到10 % first term part1 = -yt.*log(hx); % second term part2 = (1-yt).*log(1-hx); % regularization term regTerm = lambda / 2 / m * (sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2))); % J with regularization J = 1 / m * sum(sum(part1 - part2)) + regTerm; 复制代码 ex4.m 里的调用如下: 复制代码 % 不使用正则化 lambda = 0; J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ... num_labels, X, y, lambda); % 使用正则化 lambda = 1; J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ... num_labels, X, y, lambda); 复制代码 2 反向传播 2.1 sigmoid gradient 计算sigmoid函数的梯度,公式如下: sigmoidGradient.m 复制代码 function g = sigmoidGradient(z) %SIGMOIDGRADIENT returns the gradient of the sigmoid function %evaluated at z % g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function % evaluated at z. This should work regardless if z is a matrix or a % vector. In particular, if z is a vector or matrix, you should return % the gradient for each element.    g = sigmoid(z).*(1-sigmoid(z));  //要求对向量和矩阵同样适用,所以使用点乘而不是直接相乘 end 复制代码 ex4.m中的调用 复制代码 %% ================ Part 5: Sigmoid Gradient ================ g = sigmoidGradient([1 -0.5 0 0.5 1]); fprintf('Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]:\n '); fprintf('%f ', g); 复制代码 2.2 随机初始化 在训练神经网络时,随机初始化参数来进行对称破坏非常重要。随机初始化的一个有效策略是在 的范围内统一随机选择 θ(l)的值,你应该使用 εinit = 0.12。 这里对值的选择有一个说明: randInitializeWeights.m 复制代码 function W = randInitializeWeights(L_in, L_out) %RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in %incoming connections and L_out outgoing connections % W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights % of a layer with L_in incoming connections and L_out outgoing % connections. % % Note that W should be set to a matrix of size(L_out, 1 + L_in) as % the column row of W handles the "bias" terms epsilon_init = 0.12; W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init; end 复制代码 ex4.m 里的调用为: 复制代码 %% ================ Part 6: Initializing Pameters ================ initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size); initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels); % Unroll parameters initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)]; 复制代码 2.3 反向传播 反向传播算法,由右到左计算误差项 δj(l): 详细请看我的课程笔记:Coursera—Andrew Ng机器学习—课程笔记 Lecture 9_Neural Networks learning (1)根据上面的公式计算 “误差项 error term”。 代码如下: 复制代码 %----------------------------PART 2---------------------------------- % Accumulate the error term delta_3 = hx - yt; % 5000 x 10 delta_2 = delta_3 * Theta2 .* sigmoidGradient([ones(m, 1) z2]); % 5000 x 26 = 5000 x 10 * 10 x 26 .* 5000 x 26 % 去掉 δ2(0) 这一项 delta_2 = delta_2(:,2:end);                       % 5000 x 25   复制代码 (2)计算梯度,公式和代码如下: 复制代码 % Accumulate the gradient D2 = delta_3' * a2; % 10 x 26 = 10 x 5000 * 5000 x 26 D1 = delta_2' * a1; % 25 x 401 = 25 x 5000 * 5000 x 401 复制代码 (4)获得代价函数 J(θ)针对Theta1 和 Theta2 的偏导数 ,公式和代码如下: 复制代码 % Obtain the (unregularized) gradient for the neural network cost function Theta2_grad = 1/m * D2; Theta1_grad = 1/m * D1; 复制代码 2.4 梯度校验 梯度校验的原理: 如果梯度计算正确,则下面两个值的差应该比较小 在 computeNumericalGradient.m 中,已经实现了梯度校验的过程,它会生成一个小型神经网络和数据集 来进行校验。如果梯度计算正确,会得到一个小于 e-9 的差值。 在真正开始模型学习时,需要关闭梯度校验。 2.5 正则化神经网络 上面计算出的偏导数没有加入正则项, 加入正则项的公式如下 ( j = 0 不参与正则化,即将θ的第一列置为0) 复制代码 %----------------------------PART 3---------------------------------- %---Regularize gradients temp1 = Theta1; temp2 = Theta2; temp1(:,1) = 0;   % set first column to 0 temp2(:,1) = 0;   % set first column to 0 Theta1_grad = Theta1_grad + lambda/m * temp1; Theta2_grad = Theta2_grad + lambda/m * temp2; 复制代码 ex4.m 中的调用: 复制代码 %% =============== Part 8: Implement Regularization =============== % Check gradients by running checkNNGradients lambda = 3; checkNNGradients(lambda); % Also output the costFunction debugging values debug_J = nnCostFunction(nn_params, input_layer_size, ... hidden_layer_size, num_labels, X, y, lambda); 复制代码 2.6 使用 fmincg 函数训练参数 ex4.m 中的调用如下: 复制代码 %% =================== Part 8: Training NN =================== options = optimset('MaxIter', 50); % You should also try different values of lambda lambda = 1; % Create "short hand" for the cost function to be minimized costFunction = @(p) nnCostFunction(p, ... input_layer_size, ... hidden_layer_size, ... num_labels, X, y, lambda); % Now, costFunction is a function that takes in only one argument (the % neural network parameters) [nn_params, cost] = fmincg(costFunction, initial_nn_params, options);% Obtain Theta1 and Theta2 back from nn_params Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ... hidden_layer_size, (input_layer_size + 1)); Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ... num_labels, (hidden_layer_size + 1)); 复制代码 3 可视化hidden layer 如果我们将 Theta1 中的一行拿出来,去掉了第一个 bias term,得到一个 400 维的向量。可视化hidden单元的一种方法,就是将这个 400 维向量重新整形为 20×20 图像,并显示它。 ex4.m 中的调用如下: 复制代码 %% ================= Part 9: Visualize Weights ================= displayData(Theta1(:, 2:end)); % 去掉第一列 复制代码 图像如下,Theta1 的每一行对应一个小格子: 4 预测 预测准确率为 94.34%。 我们引入正则化的作用是避免过拟合,如果将2.6中的 λ 设置为 0 或一个小数值,或者通过调整MaxIter,甚至可能得到一个准确率为100%的模型。但这种模型对于预测新来的数据,表现可能很差。 ex4.m 中的调用如下 复制代码 %% ================= Part 10: Implement Predict ================= pred = predict(Theta1, Theta2, X); 复制代码 5 运行结果 运行ex4.m 得到的结果如下: 复制代码 Loading and Visualizing Data ... Program paused. Press enter to continue. Loading Saved Neural Network Parameters ... Feedforward Using Neural Network ... Cost at parameters (loaded from ex4weights): 0.287629 (this value should be about 0.287629) Program paused. Press enter to continue. Checking Cost Function (w/ Regularization) ... Cost at parameters (loaded from ex4weights): 0.383770 (this value should be about 0.383770) Program paused. Press enter to continue. Evaluating sigmoid gradient... Sigmoid gradient evaluated at [1 -0.5 0 0.5 1]: 0.196612 0.235004 0.250000 0.235004 0.196612 Program paused. Press enter to continue. Initializing Neural Network Parameters ... Checking Backpropagation... -0.0093 -0.0093 0.0089 0.0089 -0.0084 -0.0084 0.0076 0.0076 -0.0067 -0.0067 -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 -0.0002 -0.0002 0.0002 0.0002 -0.0003 -0.0003 0.0003 0.0003 -0.0004 -0.0004 -0.0001 -0.0001 0.0001 0.0001 -0.0001 -0.0001 0.0002 0.0002 -0.0002 -0.0002 0.3145 0.3145 0.1111 0.1111 0.0974 0.0974 0.1641 0.1641 0.0576 0.0576 0.0505 0.0505 0.1646 0.1646 0.0578 0.0578 0.0508 0.0508 0.1583 0.1583 0.0559 0.0559 0.0492 0.0492 0.1511 0.1511 0.0537 0.0537 0.0471 0.0471 0.1496 0.1496 0.0532 0.0532 0.0466 0.0466 The above two columns you get should be very similar. (Left-Your Numerical Gradient, Right-Analytical Gradient) If your backpropagation implementation is correct, then the relative difference will be small (less than 1e-9). Relative Difference: 2.2366e-11 Program paused. Press enter to continue. Checking Backpropagation (w/ Regularization) ... -0.0093 -0.0093 0.0089 0.0089 -0.0084 -0.0084 0.0076 0.0076 -0.0067 -0.0067 -0.0168 -0.0168 0.0394 0.0394 0.0593 0.0593 0.0248 0.0248 -0.0327 -0.0327 -0.0602 -0.0602 -0.0320 -0.0320 0.0249 0.0249 0.0598 0.0598 0.0386 0.0386 -0.0174 -0.0174 -0.0576 -0.0576 -0.0452 -0.0452 0.0091 0.0091 0.0546 0.0546 0.3145 0.3145 0.1111 0.1111 0.0974 0.0974 0.1187 0.1187 0.0000 0.0000 0.0337 0.0337 0.2040 0.2040 0.1171 0.1171 0.0755 0.0755 0.1257 0.1257 -0.0041 -0.0041 0.0170 0.0170 0.1763 0.1763 0.1131 0.1131 0.0862 0.0862 0.1323 0.1323 -0.0045 -0.0045 0.0015 0.0015 The above two columns you get should be very similar. (Left-Your Numerical Gradient, Right-Analytical Gradient) If your backpropagation implementation is correct, then the relative difference will be small (less than 1e-9). Relative Difference: 2.17629e-11 Cost a
50000+
5万行代码练就真实本领
17年
创办于2008年老牌培训机构
1000+
合作企业
98%
就业率

联系我们

电话咨询

0532-85025005

扫码添加微信