Convolution arithmetic in convolutional neural networks
Published: 2019-05-25


In a convolutional neural network, the output shape of a convolutional layer is determined by four factors: its input shape, kernel size, zero padding, and strides. Fully connected layers have no such coupling, since their output shape is independent of the input shape, and this dependence is arguably the most intimidating part of convolutional neural networks.

Convolutions operate on 4D tensors; the common layout conventions are listed below, followed by a small layout-conversion sketch.

  • Theano: filter_shape: (output_channels, input_channels, filter_rows, filter_columns); input_shape: (batch_size, input_channels, input_rows, input_columns). MXNet follows the Theano convention.
  • TensorFlow: filter_shape: (filter_rows, filter_columns, input_channels, output_channels); input_shape: (batch_size, input_rows, input_columns, input_channels).
  • This difference also shows up in Keras (with either the Theano or TensorFlow backend), configured in ~/.keras/keras.json ("image_dim_ordering"; "backend").
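
A minimal NumPy sketch of the two layouts (the tensor sizes below are arbitrary placeholders, not from the article):

import numpy as np

batch, rows, cols, in_ch, out_ch, k = 2, 28, 28, 3, 8, 5

# TensorFlow-style tensors: NHWC input, (filter_rows, filter_cols, in, out) filters
x_nhwc = np.zeros((batch, rows, cols, in_ch))
w_hwio = np.zeros((k, k, in_ch, out_ch))

# Theano/MXNet-style tensors: NCHW input, (out, in, filter_rows, filter_cols) filters
x_nchw = x_nhwc.transpose(0, 3, 1, 2)   # -> (batch, in_ch, rows, cols)
w_oihw = w_hwio.transpose(3, 2, 0, 1)   # -> (out_ch, in_ch, k, k)

print(x_nchw.shape, w_oihw.shape)       # (2, 3, 28, 28) (8, 3, 5, 5)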

The convolution in a convolutional neural network is not the convolution of signal processing; the name is only a loose analogy. The actual computation is an element-wise product followed by a sum, i.e. a cross-correlation:


from mxnet import nd

def corr2d(X, K):
    # 2D cross-correlation: slide kernel K over X, multiply element-wise, then sum
    h, w = K.shape
    Y = nd.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y
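
A quick sanity check on a small hand-made input (the values below are purely illustrative):

X = nd.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = nd.array([[0, 1], [2, 3]])
print(corr2d(X, K))  # [[19. 25.] [37. 43.]]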

Convolution arithmetic

padding:

[figure: convolution with zero padding]

strides:

[figure: convolution with non-unit strides]


No zero padding, unit strides


For any i and k, and for s = 1 and p = 0,

o = (i − k) + 1.
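
For example, a 4 × 4 input convolved with a 3 × 3 kernel gives o = (4 − 3) + 1 = 2, i.e. a 2 × 2 output.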

Theano code:

output = theano.tensor.nnet.conv2d(
    input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
    border_mode=(0, 0), subsample=(1, 1))
# output.shape[2] == (i1 - k1) + 1
# output.shape[3] == (i2 - k2) + 1

Zero padding, unit strides

For any i, k and p, and for s = 1,

o = (i − k) + 2p + 1.
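
For example, a 5 × 5 input with a 4 × 4 kernel and p = 2 gives o = (5 − 4) + 2 × 2 + 1 = 6.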

Theano code:

output = theano.tensor.nnet.conv2d(
    input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
    border_mode=(p1, p2), subsample=(1, 1))
# output.shape[2] == (i1 - k1) + 2 * p1 + 1
# output.shape[3] == (i2 - k2) + 2 * p2 + 1

Half (same) padding

For any i and for odd k (k = 2n + 1, n ∈ ℕ), s = 1 and p = ⌊k/2⌋ = n,

o = i + 2⌊k/2⌋ − (k − 1) = i + 2n − 2n = i.
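
For example, a 5 × 5 input with a 3 × 3 kernel (n = 1, p = 1) gives o = 5 + 2 − 2 = 5: the output keeps the input size, hence the name "same" padding.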

Theano code:

output = theano.tensor.nnet.conv2d(
    input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
    border_mode='half', subsample=(1, 1))
# output.shape[2] == i1
# output.shape[3] == i2

Full padding

For any i and k, and for p = k − 1 and s = 1,

o = i + 2(k − 1) − (k − 1) = i + (k − 1).
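
For example, a 5 × 5 input with a 3 × 3 kernel and p = k − 1 = 2 gives o = 5 + (3 − 1) = 7.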

Theano code:

output = theano.tensor.nnet.conv2d(
    input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
    border_mode='full', subsample=(1, 1))
# output.shape[2] == i1 + (k1 - 1)
# output.shape[3] == i2 + (k2 - 1)

No zero padding, non-unit strides

For any i, k and s, and for p = 0,

o = ⌊(i − k) / s⌋ + 1.
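
For example, a 5 × 5 input with a 3 × 3 kernel and s = 2 gives o = ⌊(5 − 3) / 2⌋ + 1 = 2.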

Theano code:

output = theano.tensor.nnet.conv2d(
    input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
    border_mode=(0, 0), subsample=(s1, s2))
# output.shape[2] == (i1 - k1) // s1 + 1
# output.shape[3] == (i2 - k2) // s2 + 1

Zero padding, non-unit strides

Finally, the most general form:


For any i, k, p and s,

o = ⌊(i + 2p − k) / s⌋ + 1.
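
For example, a 5 × 5 input with a 3 × 3 kernel, p = 1 and s = 2 gives o = ⌊(5 + 2 − 3) / 2⌋ + 1 = 3.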

Theano code:

output = theano.tensor.nnet.conv2d(
    input, filters, input_shape=(b, c2, i1, i2), filter_shape=(c1, c2, k1, k2),
    border_mode=(p1, p2), subsample=(s1, s2))
# output.shape[2] == (i1 - k1 + 2 * p1) // s1 + 1
# output.shape[3] == (i2 - k2 + 2 * p2) // s2 + 1
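
All of the special cases above are instances of this general relationship; here is a pure-Python sketch of the formula (the helper name conv_output_size is my own, not from any library):

def conv_output_size(i, k, p=0, s=1):
    # Output size along one spatial dimension: o = floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

# Reproduces the special cases discussed above:
print(conv_output_size(4, 3))            # no padding, unit stride      -> 2
print(conv_output_size(5, 4, p=2))       # zero padding, unit stride    -> 6
print(conv_output_size(5, 3, p=1))       # half ("same") padding        -> 5
print(conv_output_size(5, 3, p=2))       # full padding                 -> 7
print(conv_output_size(5, 3, s=2))       # no padding, non-unit stride  -> 2
print(conv_output_size(5, 3, p=1, s=2))  # general case                 -> 3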

Transposed Convolution arithmetic

Transposed convolutions – also called fractionally strided convolutions – work by swapping the forward and backward passes of a convolution. One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.

The transposed convolution operation can be thought of as the gradient of some convolution with respect to its input, which is usually how transposed convolutions are implemented in practice.

Zero padding, non-unit strides, transposed

A convolution described by k, s and p has an associated transposed convolution described by a, ĩ′, k′ = k, s′ = 1 and p′ = k − p − 1, where ĩ′ is the size of the stretched input obtained by adding s − 1 zeros between each input unit, and a = (i + 2p − k) mod s represents the number of zeros added to the top and right edges of the input, and its output size is

o′ = s(i′ − 1) + a + k − 2p.
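
For example, a convolution with i = 5, k = 3, s = 2 and p = 1 produces o = 3; its associated transposed convolution has a = (5 + 2 − 3) mod 2 = 0 and therefore o′ = 2 × (3 − 1) + 0 + 3 − 2 × 1 = 5, recovering the original input size.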

Theano code:

o_prime1 = s1 * (output.shape[2] - 1) + a1 + k1 - 2 * p1
o_prime2 = s2 * (output.shape[3] - 1) + a2 + k2 - 2 * p2
input = theano.tensor.nnet.abstract_conv.conv2d_grad_wrt_inputs(
    output, filters, input_shape=(b, c1, o_prime1, o_prime2),
    filter_shape=(c1, c2, k1, k2), border_mode=(p1, p2),
    subsample=(s1, s2))

Other examples

  • TensorFlow:
    tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
  • MXNet-Gluon (output sizes for both are checked below):
    nn.Conv2D(channels=96, kernel_size=11, strides=4, padding=1, activation='relu')
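
As a quick check with the general formula above (the input sizes here are made-up examples, not from the article): with padding='SAME' and stride 1, TensorFlow pads just enough that the output spatial size equals the input size (more generally, o = ⌈i / s⌉); for the Gluon layer, a hypothetical 224 × 224 input gives o = ⌊(224 + 2 × 1 − 11) / 4⌋ + 1 = 54.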
