In U-Net, each encoder block comprises two operations: 2D convolution and max pooling. The 2D convolution is an image-processing operation that generates a feature map, while max pooling reduces the spatial size of that feature map. The 2D convolution is defined as follows:

\begin{equation}
\mathbf{F}(m, n) = \sum_i\sum_j \mathbf{K}(i, j)\,\mathbf{I}(m-i, n-j)
\end{equation}

where 𝑲 is the kernel (or filter) that is passed over the image 𝑰, and m and n are the row and column indices of the resulting feature map. Figure 1 shows the operations in the encoder block, including 2D convolution, max pooling, and related techniques.

Figure 1. Operations in the encoder block

In Figure 1, we add a layer of zeros around the input, called padding, which is used to maintain the size of the original input. Then the filter 𝑲, shaped as a 3 by 3 matrix, is multiplied element-wise with the region of the input image 𝑰 that it covers. The values are summed to generate the featur
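To make the encoder operations concrete, here is a minimal pure-Python sketch of zero-padded 2D convolution followed by max pooling. It is an illustration under stated assumptions, not the U-Net implementation itself: the function names `conv2d_same` and `max_pool_2x2` are hypothetical, the kernel indices are assumed to be centered (i and j run from -1 to 1 for a 3 by 3 kernel), and the pooling window is assumed to be 2 by 2 with stride 2, which the text above does not specify.

```python
def conv2d_same(image, kernel):
    """Compute F(m, n) = sum_i sum_j K(i, j) * I(m - i, n - j).

    Assumption: kernel indices are centered, and the image is treated as
    zero outside its borders (equivalent to zero padding), so the output
    has the same size as the input.
    """
    rows, cols = len(image), len(image[0])
    k = len(kernel) // 2  # kernel radius: 1 for a 3x3 kernel
    out = [[0] * cols for _ in range(rows)]
    for m in range(rows):
        for n in range(cols):
            total = 0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    r, c = m - i, n - j
                    # Positions outside the image contribute zero (padding).
                    if 0 <= r < rows and 0 <= c < cols:
                        total += kernel[i + k][j + k] * image[r][c]
            out[m][n] = total
    return out


def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window (stride 2, assumed)."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ones_kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

feature_map = conv2d_same(image, ones_kernel)  # same 4x4 size as the input
pooled = max_pool_2x2(feature_map)             # halved to 2x2
```

Note that most deep-learning frameworks implement the sliding element-wise multiplication without flipping the kernel (cross-correlation). For a symmetric kernel such as the all-ones example here, the two give the same result.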