Apply a 2D Gaussian filter to an image to smooth it. The filter is defined as:
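As a minimal sketch of the smoothing step, assuming the standard isotropic Gaussian \(G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)\) sampled on a small grid and renormalized to sum to one. The kernel size, the value of \(\sigma\), the random test image, and the helper names `gaussian_kernel` and `smooth` are illustrative choices, not taken from the text:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(size=5, sigma=1.0):
    """Sample a 2D Gaussian on a size x size grid and normalize it to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0  # grid coordinates centered at 0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()


def smooth(image, kernel):
    """Filter a 2D image with a square, odd-sized kernel using reflect padding."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode="reflect")
    windows = sliding_window_view(padded, kernel.shape)  # shape (H, W, k, k)
    # The Gaussian kernel is symmetric, so correlation equals convolution here.
    return (windows * kernel).sum(axis=(-2, -1))


rng = np.random.default_rng(0)
noisy = rng.normal(size=(32, 32))  # synthetic stand-in for a real image
smoothed = smooth(noisy, gaussian_kernel(size=5, sigma=1.0))
print(noisy.std(), smoothed.std())  # smoothing lowers the pixel-to-pixel variation
```

In practice, `scipy.ndimage.gaussian_filter(image, sigma)` provides a ready-made Gaussian smoother; the explicit kernel here is only meant to make the definition concrete.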
from scipy import ndimage, datasets
import numpy as np
# Define the horizontal and vertical edge detection filters
k_v = np.array([[-1, 0, 1]])
k_h = np.array([[-1], [0], [1]])
orig = datasets.ascent().astype('int32')
# Apply the filters to the image
edge_h = ndimage.convolve(orig, k_h)
edge_v = ndimage.convolve(orig, k_v)
# Compute the magnitude of the gradient
magnitude = np.sqrt(edge_h**2 + edge_v**2)
magnitude *= 255.0 / np.max(magnitude)
from scipy import ndimage, datasets
import numpy as np
ascent = datasets.ascent().astype('int32')
sobel_h = ndimage.sobel(ascent, 0) # horizontal gradient
sobel_v = ndimage.sobel(ascent, 1) # vertical gradient
magnitude = np.sqrt(sobel_h**2 + sobel_v**2)
magnitude *= 255.0 / np.max(magnitude) # normalization
There are several reasons to use convolution operations in neural networks:
In PyTorch, the convolutional layer is defined as torch.nn.Conv2d and it takes the following parameters:
in_channels: number of input channels (e.g., 3 for RGB images).
out_channels: number of output channels (i.e., number of filters).
kernel_size: size of the convolutional kernel (e.g., 3 for a \(3 \times 3\) kernel).
stride: step size for the kernel.
padding: number of zeros to add around the input image.
bias: whether to include a bias term.
dilation: spacing between kernel elements (the default value \(1\) means no dilation).
import torch
import torch.nn as nn
m = nn.Conv2d(in_channels=3, out_channels=5, kernel_size=3, stride=1, padding=1)
print("Size of the weight parameter", m.weight.shape)
print("Size of the bias parameter", m.bias.shape)
Size of the weight parameter torch.Size([5, 3, 3, 3])
Size of the bias parameter torch.Size([5])
The number of parameters of a convolutional layer is: \[ \text{\# parameters} = \text{in\_channels} \times \text{out\_channels} \times \text{kernel\_size}^2 + \text{out\_channels}. \]
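The formula above can be checked by hand for the layer defined earlier (3 input channels, 5 output channels, a \(3 \times 3\) kernel). The helper name `conv2d_num_params` is hypothetical and assumes a square kernel:

```python
def conv2d_num_params(in_channels, out_channels, kernel_size, bias=True):
    """Count the weights (and optional biases) of a square-kernel 2D convolution."""
    weights = in_channels * out_channels * kernel_size**2
    return weights + (out_channels if bias else 0)


n_params = conv2d_num_params(in_channels=3, out_channels=5, kernel_size=3)
print(n_params)  # 3 * 5 * 9 + 5 = 140
```

This agrees with the printed shapes: the weight tensor of shape [5, 3, 3, 3] holds 135 values and the bias of shape [5] holds 5 more. For an actual PyTorch module `m`, the same count can be obtained with `sum(p.numel() for p in m.parameters())`.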
m = nn.MaxPool2d(kernel_size = 2, stride = 1)
input = torch.randn(1, 4, 4)
print(input)
output = m(input)
print(output)
tensor([[[-0.2066, 0.7505, -0.6103, 0.4397],
[-0.2792, 0.8013, 0.7298, 0.8730],
[ 0.2436, -0.2776, 1.1351, 0.2402],
[-0.1745, 0.2285, 0.3640, -0.1111]]])
tensor([[[0.8013, 0.8013, 0.8730],
[0.8013, 1.1351, 1.1351],
[0.2436, 1.1351, 1.1351]]])
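To see what the pooling layer computes, here is a NumPy sketch that slides a \(2 \times 2\) window with stride 1 over the same 4×4 values printed above and keeps the maximum of each window. The function `max_pool2d` is a hypothetical helper written for illustration (no padding, single channel), not PyTorch's implementation:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool2d(x, kernel_size, stride):
    """Max pooling over a 2D array without padding."""
    # All kernel_size x kernel_size windows, then keep every `stride`-th one.
    windows = sliding_window_view(x, (kernel_size, kernel_size))
    windows = windows[::stride, ::stride]
    return windows.max(axis=(-2, -1))


x = np.array([[-0.2066, 0.7505, -0.6103, 0.4397],
              [-0.2792, 0.8013, 0.7298, 0.8730],
              [0.2436, -0.2776, 1.1351, 0.2402],
              [-0.1745, 0.2285, 0.3640, -0.1111]])
pooled = max_pool2d(x, kernel_size=2, stride=1)
print(pooled)
```

The result is a 3×3 array whose entries match the second tensor printed above.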
In PyTorch, adaptive pooling (AdaptiveAvgPool or AdaptiveMaxPool) allows you to specify the output size instead of the kernel size.
AdaptiveAvgPool1d
tensor([[1.0000, 1.0000, 1.5000, 2.0000, 2.0000, 2.5000, 3.0000, 3.0000]])
AdaptiveMaxPool1d
tensor([[1., 1., 1., 2., 2., 2., 3., 3., 3., 3.]])
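The two outputs above are consistent with pooling the input \([1, 2, 3]\) to output sizes 8 and 10. The sketch below rests on two assumptions that are not stated in the text: that the input was \([1, 2, 3]\), and that output element \(i\) pools the window from \(\lfloor i L_{\text{in}} / L_{\text{out}} \rfloor\) to \(\lceil (i+1) L_{\text{in}} / L_{\text{out}} \rceil\). The helper `adaptive_pool1d` is hypothetical:

```python
def adaptive_pool1d(values, output_size, reduce):
    """Pool a 1D list to exactly output_size elements with a given reduction."""
    n = len(values)
    out = []
    for i in range(output_size):
        start = (i * n) // output_size                          # floor(i * n / L_out)
        end = ((i + 1) * n + output_size - 1) // output_size    # ceil((i + 1) * n / L_out)
        out.append(reduce(values[start:end]))
    return out


def mean(window):
    return sum(window) / len(window)


avg_out = adaptive_pool1d([1.0, 2.0, 3.0], 8, mean)
max_out = adaptive_pool1d([1.0, 2.0, 3.0], 10, max)
print(avg_out)  # [1.0, 1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0]
print(max_out)  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0]
```

Under these assumptions the windows overlap whenever the output is longer than the input, which is why averaged values such as 1.5 and 2.5 appear.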
The output size for a convolution/pooling layer is: \[ L_{\text{out}} = \left\lfloor \frac{L_{\text{in}} + 2\times\text{padding} - \text{dilation} \times (\text{kernel size} - 1) - 1}{\text{stride}} + 1\right\rfloor. \]
If \(L_{\text{out}} \leq L_{\text{in}}\), define
If \(L_{\text{out}} > L_{\text{in}}\), upsample the input by the factor \(\left\lceil \frac{L_{\text{out}}}{L_{\text{in}}}\right\rceil\), e.g., if \(L_{\text{in}} = 3\) and \(L_{\text{out}} = 5\), upsample the input by a factor of \(2\): \[ [1,2,3] \Rightarrow [1,1,2,2,3,3] \] and then apply the previous case.
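The output-size formula for convolution and pooling layers can be evaluated directly. The function name `output_size` is hypothetical, and the 224-pixel input in the last example is an assumed image size, not one given in the text:

```python
def output_size(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Length of the output along one spatial dimension."""
    numerator = l_in + 2 * padding - dilation * (kernel_size - 1) - 1
    # floor(numerator / stride + 1) equals floor(numerator / stride) + 1
    return numerator // stride + 1


print(output_size(4, kernel_size=2, stride=1))               # 3: the 4x4 -> 3x3 max-pool example
print(output_size(32, kernel_size=3, stride=1, padding=1))   # 32: a 3x3 kernel with padding 1 keeps the size
print(output_size(7, kernel_size=3, dilation=2))             # 3: dilation widens the effective kernel to 5
print(output_size(224, kernel_size=11, stride=4, padding=2)) # 55
```

The formula applies to each spatial dimension separately, so a 2D layer uses it once for the height and once for the width.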
A typical CNN architecture consists of two parts:
class MyConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.feature = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool2d(output_size=(4, 4)),
            nn.Flatten()
        )
        self.classifier = nn.Sequential(
            nn.Linear(32 * 4 * 4, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.feature(x)
        x = self.classifier(x)
        return x
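To see why the first linear layer expects `32 * 4 * 4` inputs, the spatial size can be traced through the feature extractor by hand. This sketch assumes a 32×32 RGB input, which is an illustrative choice and not stated in the text, and uses a hypothetical helper `output_size` for the standard convolution/pooling size rule:

```python
def output_size(l_in, kernel_size, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one dimension."""
    return (l_in + 2 * padding - (kernel_size - 1) - 1) // stride + 1


side = 32                                                      # assumed input height and width
side = output_size(side, kernel_size=3, stride=1, padding=1)   # Conv2d(3, 16): stays 32
side = output_size(side, kernel_size=2, stride=2)              # MaxPool2d(2): stride defaults to the kernel size
side_after_pool = side
side = output_size(side, kernel_size=3, stride=1, padding=1)   # Conv2d(16, 32): size unchanged
side = 4                                                       # AdaptiveMaxPool2d fixes the output to 4x4
flat_features = 32 * side * side                               # channels * height * width after Flatten
print(side_after_pool, flat_features)
```

Because the adaptive pooling layer always produces a 4×4 map, the flattened size stays at 512 for other input resolutions as well, so the classifier does not need to change with the image size.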
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
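The layer sizes in this printout are enough to count the trainable parameters by hand. The sketch below copies the channel, kernel, and feature sizes from the listing and assumes every convolution has a bias term (the PyTorch default, since the printout does not show `bias=False`); the variable names are illustrative:

```python
# (in_channels, out_channels, kernel_size) for each Conv2d in `features`
conv_layers = [(3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3)]
# (in_features, out_features) for each Linear in `classifier`
linear_layers = [(9216, 4096), (4096, 4096), (4096, 1000)]

# Weights plus one bias per output channel / output feature.
conv_params = sum(c_in * c_out * k * k + c_out for c_in, c_out, k in conv_layers)
linear_params = sum(f_in * f_out + f_out for f_in, f_out in linear_layers)
total_params = conv_params + linear_params

# The first Linear layer sees 256 channels of the 6x6 map produced by `avgpool`.
flattened = 256 * 6 * 6

print(conv_params, linear_params, total_params, flattened)
```

The convolutional part accounts for about 2.5 million parameters and the classifier for about 58.6 million, so most of the roughly 61 million parameters sit in the fully connected layers.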