In a convolutional neural network, the output shape of a convolutional layer is determined by four factors: the input shape, the kernel size, the zero padding, and the strides. Fully-connected layers have no such constraint, since their output shape is independent of the input shape, and this dependence is arguably one of the most intimidating aspects of convolutional neural networks.
Convolutions operate on 4-D tensors, e.g. of shape (batch size b, channels c, height i1, width i2), matching the input_shape arguments in the Theano snippets below.
The "convolution" in convolutional neural networks differs from convolution in signal processing; the name is only a loose analogy. The actual computation is an elementwise product followed by a sum, i.e. a cross-correlation.
from mxnet import nd

def corr2d(X, K):
    # 2-D cross-correlation: slide K over X, elementwise multiply and sum
    h, w = K.shape
    Y = nd.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y
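To see the elementwise multiply-and-sum in action, here is a sketch of the same 2-D cross-correlation on a small example; NumPy is used instead of MXNet's nd so the snippet stands alone, and the helper name corr2d_np is mine:

```python
import numpy as np

def corr2d_np(X, K):
    # 2-D cross-correlation: slide K over X, elementwise multiply and sum
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = np.arange(9, dtype=float).reshape(3, 3)  # [[0,1,2],[3,4,5],[6,7,8]]
K = np.array([[0., 1.], [2., 3.]])
print(corr2d_np(X, K))  # output shape is (3-2+1, 3-2+1) = (2, 2)
```

With no padding and unit strides, the 2x2 kernel can only be placed at 2x2 positions inside the 3x3 input, which is exactly the (i - k) + 1 relationship derived next.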
padding and strides: (illustrations omitted)
For any i and k, and for s = 1 and p = 0,

o = (i − k) + 1
Theano code:
output = theano.tensor.nnet.conv2d(
    input, filters,
    input_shape=(b, c2, i1, i2),
    filter_shape=(c1, c2, k1, k2),
    border_mode=(0, 0),
    subsample=(1, 1))
# output.shape[2] == (i1 - k1) + 1
# output.shape[3] == (i2 - k2) + 1
For any i, k and p, and for s = 1,

o = (i − k) + 2p + 1
Theano code:
output = theano.tensor.nnet.conv2d(
    input, filters,
    input_shape=(b, c2, i1, i2),
    filter_shape=(c1, c2, k1, k2),
    border_mode=(p1, p2),
    subsample=(1, 1))
# output.shape[2] == (i1 - k1) + 2 * p1 + 1
# output.shape[3] == (i2 - k2) + 2 * p2 + 1
For any i and for odd k (k = 2n + 1, n ∈ ℕ), s = 1 and p = ⌊k/2⌋ = n,

o = i + 2⌊k/2⌋ − (k − 1) = i
Theano code:
output = theano.tensor.nnet.conv2d(
    input, filters,
    input_shape=(b, c2, i1, i2),
    filter_shape=(c1, c2, k1, k2),
    border_mode='half',
    subsample=(1, 1))
# output.shape[2] == i1
# output.shape[3] == i2
For any i and k, and for p = k − 1 and s = 1,

o = i + (k − 1)
Theano code:
output = theano.tensor.nnet.conv2d(
    input, filters,
    input_shape=(b, c2, i1, i2),
    filter_shape=(c1, c2, k1, k2),
    border_mode='full',
    subsample=(1, 1))
# output.shape[2] == i1 + (k1 - 1)
# output.shape[3] == i2 + (k2 - 1)
For any i, k and s, and for p = 0,

o = ⌊(i − k)/s⌋ + 1
Theano code:
output = theano.tensor.nnet.conv2d(
    input, filters,
    input_shape=(b, c2, i1, i2),
    filter_shape=(c1, c2, k1, k2),
    border_mode=(0, 0),
    subsample=(s1, s2))
# output.shape[2] == (i1 - k1) // s1 + 1
# output.shape[3] == (i2 - k2) // s2 + 1
Finally, the most general form!
For any i, k, p and s,

o = ⌊(i + 2p − k)/s⌋ + 1
Theano code:
output = theano.tensor.nnet.conv2d(
    input, filters,
    input_shape=(b, c2, i1, i2),
    filter_shape=(c1, c2, k1, k2),
    border_mode=(p1, p2),
    subsample=(s1, s2))
# output.shape[2] == (i1 - k1 + 2 * p1) // s1 + 1
# output.shape[3] == (i2 - k2 + 2 * p2) // s2 + 1
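The six relationships above all collapse into this one formula. As a sketch, here is a small helper (the name conv_out_size is mine, but the arithmetic is exactly the formula in the Theano comments) that recovers each special case:

```python
def conv_out_size(i, k, p=0, s=1):
    # general output-size formula: floor((i - k + 2p) / s) + 1
    return (i - k + 2 * p) // s + 1

# no padding, unit strides: o = (i - k) + 1
assert conv_out_size(5, 3) == 3
# 'half' (same) padding with odd k: o = i
assert conv_out_size(5, 3, p=1) == 5
# 'full' padding: o = i + (k - 1)
assert conv_out_size(5, 3, p=2) == 7
# no padding with strides: o = floor((i - k)/s) + 1
assert conv_out_size(5, 3, s=2) == 2
```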
Transposed convolutions – also called fractionally strided convolutions – work by swapping the forward and backward passes of a convolution. One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.
The transposed convolution operation can be thought of as the gradient of some convolution with respect to its input, which is usually how transposed convolutions are implemented in practice.
A convolution described by k, s and p has an associated transposed convolution described by a, ĩ′, k′ = k, s′ = 1 and p′ = k − p − 1, where ĩ′ is the size of the stretched input obtained by adding s − 1 zeros between each input unit, and a = (i + 2p − k) mod s represents the number of zeros added to the bottom and right edges of the input. Its output size is

o′ = s(i′ − 1) + a + k − 2p
Theano code:
o_prime1 = s1 * (output.shape[2] - 1) + a1 + k1 - 2 * p1
o_prime2 = s2 * (output.shape[3] - 1) + a2 + k2 - 2 * p2
input = theano.tensor.nnet.abstract_conv.conv2d_grad_wrt_inputs(
    output, filters,
    input_shape=(b, c1, o_prime1, o_prime2),
    filter_shape=(c1, c2, k1, k2),
    border_mode=(p1, p2),
    subsample=(s1, s2))
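A quick sanity check of the output-size formula o′ = s(i′ − 1) + a + k − 2p: applied to the output of a forward convolution, it recovers the original input size. The helper names below are mine; a is passed in explicitly, just as a1 and a2 are in the Theano snippet:

```python
def conv_out_size(i, k, p=0, s=1):
    # forward convolution output size: floor((i - k + 2p) / s) + 1
    return (i - k + 2 * p) // s + 1

def transposed_out_size(i_prime, k, p=0, s=1, a=0):
    # o' = s * (i' - 1) + a + k - 2p;
    # a = (i + 2p - k) mod s restores the rows/columns lost to the floor division
    return s * (i_prime - 1) + a + k - 2 * p

i, k, p, s = 7, 3, 1, 2
a = (i + 2 * p - k) % s
o = conv_out_size(i, k, p, s)                  # forward: 7 -> 4
assert transposed_out_size(o, k, p, s, a) == i  # transposed: 4 -> 7
```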
Other examples:
tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
nn.Conv2D(channels=96, kernel_size=11, strides=4, padding=1, activation='relu'),
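The general formula applies to both examples: TensorFlow's padding='SAME' with unit strides behaves like the 'half' mode above (output spatial size equals input size), and for the Gluon layer we can compute the output size directly. The 224×224 input below is an assumption for illustration, as in AlexNet:

```python
def conv_out_size(i, k, p=0, s=1):
    # general output-size formula: floor((i - k + 2p) / s) + 1
    return (i - k + 2 * p) // s + 1

# padding='SAME', stride 1 (e.g. k=3, p=1): output size equals input size
assert conv_out_size(224, 3, p=1, s=1) == 224

# nn.Conv2D(kernel_size=11, strides=4, padding=1) on an assumed 224x224 input
print(conv_out_size(224, 11, p=1, s=4))  # -> 54
```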