Deep Learning Study Notes (1): Softmax Regression

I was at a loss, not knowing what to do and seeing no hope.

By chance I came across Professor Andrew Ng's machine learning course on Coursera and his deep learning course on UFLDL, so I settled down, watched the videos one by one, did the assignments one by one, and wrote the programs one by one. A lot of the math was unfamiliar and so was Matlab, so progress was slow as a snail at first, but after sticking with it for a few months I finally finished. To avoid forgetting, I am writing some of it down here. My level is limited, my Python is not great and neither is my English, so if there are mistakes or anything inappropriate, please do not hesitate to point them out.

 

I am still not entirely clear about the theory behind softmax, whether it comes from information theory or from probability. For now I will get the general idea and start using it, and fill in the underlying theory gradually later.

The basics of softmax:

For a given input x and output y, a K-class classifier assigns each class the probability P(y=k | x; θ), that is

$$P(y=k \mid x;\theta) = \frac{\exp\!\big(\theta^{(k)\top} x\big)}{\sum_{j=1}^{K}\exp\!\big(\theta^{(j)\top} x\big)}$$

The model parameters are θ^(1), θ^(2), …, θ^(K) ∈ R^n; it is convenient to organize θ as a K×n matrix (where n is the dimensionality of the input x, i.e. the number of features).
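As a small illustration of the hypothesis above, here is a minimal numpy sketch (the variable names and sizes are made up for this example and are not part of the later code):

import numpy as np

K, n = 3, 4                                    # 3 classes, 4 input features
theta = np.random.randn(K, n)                  # model parameters, one row per class
x = np.random.randn(n)                         # a single input sample

scores = theta @ x                             # theta^(k).T x for every class k
p = np.exp(scores) / np.sum(np.exp(scores))    # P(y=k | x; theta) for k = 1..K
print(p, p.sum())                              # the probabilities sum to 1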

The cost function of softmax regression:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} 1\{y^{(i)}=k\}\,\log\frac{\exp\!\big(\theta^{(k)\top} x^{(i)}\big)}{\sum_{j=1}^{K}\exp\!\big(\theta^{(j)\top} x^{(i)}\big)}$$

Here 1{y^(i)=k} is the indicator function: it equals 1 when y^(i) is k and 0 otherwise, or in other words, it equals 1 when the expression inside the braces is true and 0 otherwise.

  

The gradient formula:

$$\nabla_{\theta^{(k)}} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m} x^{(i)}\Big(1\{y^{(i)}=k\} - P\big(y^{(i)}=k \mid x^{(i)};\theta\big)\Big)$$

I ran into two problems while implementing this model and was stuck for a while:

    1. How to implement the indicator function? My approach: convert y into a vector yv with k elements; if y = i, then yv[i] = 1 and every other position is zero. Multiplying this vector element-wise with the probabilities P in the cost function, and subtracting it from P in the gradient formula, implements the indicator function (a small sketch follows this list).

    2. I was not very fluent with matrix operations, so vectorizing the code took quite a bit of time.
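A minimal sketch of that one-hot trick (the numbers are made up; the real implementation is the setTrainingLabels method in the third code block below):

import numpy as np

# hypothetical small example: 4 samples, K = 3 classes
y = np.array([0, 2, 1, 2])              # class labels
K, m = 3, y.shape[0]

y_mat = np.zeros((K, m))                # indicator matrix, K by numSamples
y_mat[y, np.arange(m)] = 1              # y_mat[k, i] = 1 iff sample i belongs to class k

# given class probabilities P (K by m), the indicator is applied element-wise:
P = np.full((K, m), 1.0 / K)            # dummy probabilities, just for illustration
cost = -np.sum(y_mat * np.log(P)) / m   # only the true class of each sample contributes
delta = P - y_mat                       # gradient error term: P(y=k|x) - 1{y=k}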

  

Shifting the parameters θ of the probability P by a constant yields exactly the same probabilities, so the parameters θ are redundant. There are two ways to deal with this. The first is to add an L2 weight-decay penalty term to the cost function and gradient, which introduces an extra free parameter, the penalty coefficient. The second is to fix the parameters of one class to zero, which does not affect the final classification result. My implementation uses the second approach.
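A quick numerical check of this redundancy (the softmax helper below is only for illustration and is not part of the model code):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.5])   # class scores theta^(k).T x for one sample
c = 10.0
# shifting every score by the same constant leaves the probabilities unchanged
print(np.allclose(softmax(z), softmax(z - c)))   # True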

The gradient-checking method mentioned in the tutorial is very effective: it reliably verifies whether the cost function and gradient are implemented correctly. Once the implementation passes the gradient check, it generally produces correct results.
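The check compares each component of the analytic gradient against a central-difference estimate of the cost,

$$\frac{\partial J(\theta)}{\partial \theta_i} \approx \frac{J(\theta + \varepsilon e_i) - J(\theta - \varepsilon e_i)}{2\varepsilon},$$

where e_i is the i-th unit vector; the checkGradient function at the end of the third code block uses ε = 1e-6 and prints the mean absolute difference.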

The exercises in the UFLDL tutorial are in Matlab; since I am not familiar enough with Matlab, I implemented everything in Python + numpy + scipy. See the comments in the code for what each part does.

The first piece of code is an abstract supervised-learning model class, which can be used for supervised models such as neural networks.

import numpy as np
from dp.common.optimize import minFuncSGD
import scipy.optimize as spopt

class SupervisedLearningModel(object):

    def flatTheta(self):
        '''
        convert weights and intercept to a 1-dim vector
        '''
        pass

    def rebuildTheta(self, theta):
        '''
        convert the 1-dim vector theta back to weights and intercept
        Parameters:
            theta    - the vector holding the weights and intercept, needed by the scipy.optimize functions
        '''
        pass

    def cost(self, theta, X, y):
        '''
        Used by optimize functions such as fmin_cg and fmin_l_bfgs_b in scipy.optimize.
        Parameters:
            theta        - 1-dim vector of weights
            X            - samples, numFeatures by numSamples
            y            - labels, vector with numSamples elements
        return:
            the model cost
        '''
        pass

    def gradient(self, theta, X, y):
        '''
        Used by optimize functions such as fmin_cg and fmin_l_bfgs_b in scipy.optimize.
        Parameters:
            theta        - 1-dim vector of weights
            X            - samples, numFeatures by numSamples
            y            - labels, vector with numSamples elements
        return:
            the model gradient
        '''
        pass

    def costFunc(self, theta, X, y):
        '''
        Used by optimize functions such as minFuncSGD in this package.
        Parameters:
            theta        - 1-dim vector of weights
            X            - samples, numFeatures by numSamples
            y            - labels, vector with numSamples elements
        return:
            the model cost and gradient
        '''
        pass

    def predict(self, Xtest):
        '''
        predict the test samples
        Parameters:
            Xtest        - test samples, numFeatures by numSamples
        return:
            the prediction, a vector with numSamples elements
        '''
        pass

    def performance(self, Xtest, ytest):
        '''
        Compute the prediction accuracy (in percent) on a test set.
        The model should be trained before calling this method.
        Parameters:
            Xtest    - the data to be predicted, numFeatures by numData
            ytest    - the true labels, vector with numData elements
        '''
        pred = self.predict(Xtest)
        return np.mean(pred == ytest) * 100

    def train(self, X, y):
        '''
        use this method to train the model
        Parameters:
            X            - samples, numFeatures by numSamples
            y            - labels, vector with numSamples elements
        '''
        theta = self.flatTheta()

        ret = spopt.fmin_l_bfgs_b(self.cost, theta, fprime=self.gradient, args=(X, y), m=200, disp=1, maxiter=100)
        opttheta = ret[0]

        # alternative optimizers:
        # opttheta = spopt.fmin_cg(self.cost, theta, fprime=self.gradient, args=(X, y), full_output=False, disp=True, maxiter=100)
        #
        # options = {'epochs': 10, 'alpha': 2, 'minibatch': 256}
        # opttheta = minFuncSGD(self.costFunc, theta, X, y, options)

        self.rebuildTheta(opttheta)

The second piece of code defines a single neural-network layer, NNLayer, which inherits from the SupervisedLearningModel class above. It is used both by softmax regression and by multi-layer neural networks.

class NNLayer(SupervisedLearningModel):
    '''
    A single layer of a neural network.
    '''
    def __init__(self, inputSize, outputSize, Lambda, actFunc='sigmoid'):
        '''
        Constructor: initialize one layer.
        Parameters:
            inputSize      - the number of input elements
            outputSize     - the number of outputs
            Lambda         - weight decay parameter
            actFunc        - activation function: 'sigmoid', 'tanh' or 'rectfiedLinear'
        '''
        super().__init__()
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.Lambda = Lambda
        self.actFunc = sigmoid
        self.actFuncGradient = sigmodGradient

        self.input = 0          # input of this layer
        self.activation = 0     # output of this layer
        self.delta = 0          # the error term of this layer
        self.W = 0              # the weights
        self.b = 0              # the intercept (bias)

        if actFunc == 'sigmoid':
            self.actFunc = sigmoid
            self.actFuncGradient = sigmodGradient
        if actFunc == 'tanh':
            self.actFunc = tanh
            self.actFuncGradient = tanhGradient
        if actFunc == 'rectfiedLinear':
            self.actFunc = rectfiedLinear
            self.actFuncGradient = rectfiedLinearGradient

        # initialize weights and intercept (bias);
        # the value of epsilon comes from an empirical formula
        epsilon_init = 2.4495 / np.sqrt(self.inputSize + self.outputSize) * 0.001
        theta = np.random.rand(self.outputSize, self.inputSize + 1) * 2 * epsilon_init - epsilon_init
        self.rebuildTheta(theta)

    def flatTheta(self):
        '''
        convert weights and intercept to a 1-dim vector
        '''
        W = np.hstack((self.W, self.b))
        return W.ravel()

    def rebuildTheta(self, theta):
        '''
        overwrite the method in SupervisedLearningModel:
        convert the 1-dim vector theta back to weights and intercept
        Parameters:
            theta    - the vector holding the weights and intercept,
                       size: outputSize*(inputSize+1)
        '''
        W = theta.reshape(self.outputSize, -1)
        self.b = W[:, -1].reshape(self.outputSize, 1)   # bias b is a vector with outputSize elements
        self.W = W[:, :-1]

    def forward(self):
        '''
        compute the activations of this layer from self.input
        (inputSize by numSamples)
        '''
        Z = np.dot(self.W, self.input) + self.b     # weighted input
        self.activation = self.actFunc(Z)           # activations
        return self.activation

    def backpropagate(self):
        '''
        Assume the current layer number is l and self.delta holds the error
        term delta(l+1); then the error term propagated back is
            delta(l) = (W(l).T * delta(l+1)) .* f'(z(l))
        If this layer is the first hidden layer, this method should not be called.
        f' is rewritten in terms of the activation to avoid a second call
        to the activation function.
        '''
        return np.dot(self.W.T, self.delta) * self.actFuncGradient(self.input)

    def layerGradient(self):
        '''
        grad_W(l) = delta(l+1) * input.T
        grad_b(l) = SIGMA(delta(l+1))
        where input (inputSize by numSamples) is the input of this layer
        and delta is its error term.
        '''
        m = self.input.shape[1]
        gw = np.dot(self.delta, self.input.T) / m
        gb = np.sum(self.delta, 1) / m
        # combine the gradients of weights and intercepts and flatten
        grad = np.hstack((gw, gb.reshape(-1, 1)))
        return grad


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def sigmodGradient(a):
    # a = sigmoid(Z)
    return a * (1 - a)

def tanh(Z):
    e1 = np.exp(Z)
    e2 = np.exp(-Z)
    return (e1 - e2) / (e1 + e2)

def tanhGradient(a):
    return 1 - a**2

def rectfiedLinear(Z):
    a = np.zeros(Z.shape) + Z
    a[a < 0] = 0
    return a

def rectfiedLinearGradient(a):
    b = np.zeros(a.shape) + a
    b[b > 0] = 1
    return b

The third piece of code is the softmax regression implementation, which inherits from NNLayer.

import numpy as np
from dp.supervised import NNBase
from time import time

class SoftmaxRegression(NNBase.NNLayer):
    '''
    The weights of the last class are assumed to be zeros in this implementation.
    Weight decay is not used here.
    '''
    def __init__(self, numFeatures, numClasses, Lambda=0):
        '''
        Initialize weights, intercepts and other members.
        Parameters:
            numFeatures   - the number of input features
            numClasses    - the number of classes to be classified
            Lambda        - weight decay parameter (unused here)
        '''
        # call the super constructor to initialize the weights and intercepts;
        # the weights and intercept of the last class are not needed
        super().__init__(numFeatures, numClasses - 1, Lambda, None)

        self.y_mat = 0

    def predict(self, Xtest):
        '''
        Prediction. The model should be trained before calling this method.
        Parameters:
            Xtest    - the data to be predicted, numFeatures by numData
        '''
        Z = np.dot(self.W, Xtest) + self.b
        # add the scores of the last class, they are all zeros
        lastClass = np.zeros((1, Xtest.shape[1]))
        Z = np.vstack((Z, lastClass))
        # the index of the max value in each column is the prediction
        return np.argmax(Z, 0)

    def forward(self):
        '''
        compute the matrix of the softmax hypothesis;
        called by the cost and gradient methods
        '''
        h = np.dot(self.W, self.input) + self.b
        h = np.exp(h)
        # add the unnormalized probabilities of the last class, they are all ones
        h = np.vstack((h, np.ones((1, self.input.shape[1]))))
        # normalize over all classes to get the probability of each class
        hsum = np.sum(h, axis=0)
        self.activation = h / hsum
        # the error term: delta = -(y_mat - P) = P - y_mat
        self.delta = self.activation - self.y_mat
        self.delta = self.delta[:-1, :]     # drop the last class, its weights stay zero
        return self.activation

    def setTrainingLabels(self, y):
        # convert the label vector y to a matrix y_mat:
        # for sample i, y_mat[k, i] = 1 if sample i belongs to class k, otherwise 0
        y = y.astype(np.int64)
        m = y.shape[0]
        yy = np.arange(m)
        self.y_mat = np.zeros((self.outputSize + 1, m))
        self.y_mat[y, yy] = 1

    def softmaxforward(self, theta, X, y):
        self.input = X
        self.setTrainingLabels(y)
        self.rebuildTheta(theta)
        return self.forward()

    def cost(self, theta, X, y):
        '''
        The cost function.
        Parameters:
            theta    - the vector holding the weights and intercept,
                       size: (numClasses - 1)*(numFeatures + 1)
        '''
        h = np.log(self.softmaxforward(theta, X, y))
        # h * self.y_mat applies the indicator function
        cost = -np.sum(h * self.y_mat, axis=(0, 1))
        return cost / X.shape[1]

    def gradient(self, theta, X, y):
        '''
        The gradient function.
        Parameters:
            theta    - the vector holding the weights and intercept,
                       size: (numClasses - 1)*(numFeatures + 1)
        '''
        self.softmaxforward(theta, X, y)
        grad = super().layerGradient()
        return grad.ravel()

    def costFunc(self, theta, X, y):
        grad = self.gradient(theta, X, y)
        h = np.log(self.activation)
        cost = -np.sum(h * self.y_mat, axis=(0, 1)) / X.shape[1]
        return cost, grad


def checkGradient(X, y):
    # compare the analytic gradient against a numerical (central difference) estimate
    sm = SoftmaxRegression(X.shape[0], 10)
    theta = sm.flatTheta()
    cost, grad = sm.costFunc(theta, X, y)
    numgrad = np.zeros(grad.shape)

    e = 1e-6
    for i in range(np.size(grad)):
        theta[i] = theta[i] - e
        loss1, g1 = sm.costFunc(theta, X, y)
        theta[i] = theta[i] + 2 * e
        loss2, g2 = sm.costFunc(theta, X, y)
        theta[i] = theta[i] - e
        numgrad[i] = (loss2 - loss1) / (2 * e)

    # mean absolute difference; should be very small if the gradient is correct
    print(np.sum(np.abs(grad - numgrad)) / np.size(grad))

 

The MNIST dataset is used for testing. The test accuracy is around 92.5%.

Test code:

 

X = np.load('../../common/trainImages.npy') / 255
y = np.load('../../common/trainLabels.npy')

# uncomment to run the gradient check on a small subset:
# X1 = X[:, :10]
# y1 = y[:10]
# checkGradient(X1, y1)

Xtest = np.load('../../common/testImages.npy') / 255
ytest = np.load('../../common/testLabels.npy')

sm = SoftmaxRegression(X.shape[0], 10)
t0 = time()
sm.train(X, y)
print('training Time %.5f s' % (time() - t0))

print('test acc :%.3f%%' % (sm.performance(Xtest, ytest)))

 

 

 

References:

 

Softmax Regression 


Reposted from: https://www.cnblogs.com/arsenicer/p/4272522.html
