- TensorFlow (2015): Developed by Google Brain Team.
- PyTorch (2016): Developed by Facebook AI Research Lab.
- JAX (2018): Developed by Google Brain Team.
- MXNet (2015): Developed by Apache Software Foundation.
- Keras (2015): A high-level API that can run on top of JAX, TensorFlow, or PyTorch.

- A tensor is created with the `torch.tensor()` function.
- The `shape` attribute is used to get the shape of a tensor.
- The `requires_grad` attribute is used to track the gradient of the tensor. When `requires_grad=True`, the gradient of the tensor will be computed during backpropagation.
- `torch.Tensor.numpy()` converts a tensor to a NumPy array, and `torch.from_numpy()` converts a NumPy array to a tensor.
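A minimal sketch of the conversion in both directions (the variable names here are only illustrative):

import numpy as np
import torch

a = np.array([1.0, 2.0])
t = torch.from_numpy(a)   # NumPy array -> tensor (shares memory with a)
b = t.numpy()             # tensor -> NumPy array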
- Creating tensors with `requires_grad=True` signals to autograd that every operation on them should be tracked.
- Call `.backward()` on the final tensor to compute the gradient.

x = torch.tensor([3.0], requires_grad=True)
y = x**2
y.backward()
print(x.grad.item()) # dy/dx = 2x and when x = 3, dy/dx = 6
6.0
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = torch.trace(torch.matmul(x.t(), x))
y.backward()
print(x.grad)
tensor([[2., 4.],
        [6., 8.]])
\[ y = \operatorname{tr}(X^TX) \Rightarrow \frac{\partial y}{\partial X} = 2X \]
The `person` class has attributes (`name` and `age`) and methods (`get_name`). A subclass can inherit from it:

class student(person):
    def __init__(self, name, age, major):
        super().__init__(name, age)  # call the parent class constructor
        self.major = major

    def get_major(self):
        return self.major

john = student("John", 20, "Stat")
print(f"{john.get_name()} is majoring in {john.get_major()}.")
John is majoring in Stat.
In the data preparation stage, we need to do the following:
- Store the dataset in a `Dataset` class. Inside the `Dataset` class, we can keep the data and define how individual samples are retrieved.
- To feed the dataset to our model, we need the `DataLoader` class, which wraps the dataset in an iterable and handles batching and shuffling.

You need to implement three methods in the `Dataset` class:
- `__init__()`: load or store the data
- `__len__()`: return the number of samples
- `__getitem__()`: return the sample at a given index
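A minimal sketch of such a class, written here to match the `MyDataset` used below (the exact implementation is an assumption), is:

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, x, y):
        # store the data as float32 tensors
        self.x = torch.tensor(x, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        # return the number of samples
        return self.x.shape[0]

    def __getitem__(self, idx):
        # return the (features, target) pair at the given index
        return self.x[idx], self.y[idx]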
import numpy as np
rng = np.random.default_rng(20241029)
n = 30
p = 2
# Generate x_1, x_2, ..., x_30 from a normal distribution
x = rng.normal(loc=0.0, scale=1.0, size=(n, p))
# Generate y from the linear model y = 2 - x_1 + 3*x_2
beta = np.array([-1, 3])
y = 2 + x.dot(beta).reshape(-1,1) + rng.normal(loc=0.0, scale=0.1, size=(n, 1))
training_data = MyDataset(x, y)
print("The number fo samples is", training_data.__len__())
print("The first sample is (x_1, x_2, y) =", training_data.__getitem__(0))
The number of samples is 30
The first sample is (x_1, x_2, y) = (tensor([-0.7578, 1.2519]), tensor([6.6215]))
The `DataLoader` class will:
- shuffle the data (if `shuffle=True`)
- split the data into batches (if `batch_size` is specified)

`DataLoader` is an iterable. After we iterate over all batches, the data will be shuffled again.
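For example, a `DataLoader` for the training data above can be constructed as follows (the batch size is an illustrative choice, not taken from the source):

from torch.utils.data import DataLoader

train_loader = DataLoader(training_data, batch_size=5, shuffle=True)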
`torch.nn.Linear` is a fully connected layer. Its parameters can be inspected with the `state_dict()` method or by directly accessing the attributes. For example, the `state_dict()` of an `nn.Linear(2, 3)` layer looks like:

OrderedDict([('weight', tensor([[ 0.5468, -0.0330],
        [ 0.2978,  0.0922],
        [-0.7043, -0.3674]])), ('bias', tensor([0.3729, 0.6167, 0.1815]))])
See https://pytorch.org/docs/stable/nn.html
- The `torch.nn.Module` class is the base class for all neural network modules.
- The `torch.nn.Sequential` class is a subclass of `torch.nn.Module` that is used to sequentially stack layers.
- Some models cannot be expressed with `torch.nn.Sequential`, for example, ResNet, recurrent networks, etc. These are implemented by subclassing `torch.nn.Module`.

A basic `nn.Module` subclass is as follows:
class model(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the layers
        self.linear1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Define the forward pass, i.e., how to compute the output from the input
        x = self.linear1(x)
        x = self.relu(x)
        y = self.linear2(x)
        return y
- In the `__init__` method, we define the layers that will be used by the model.
- In the `forward` method, we define how the output \(y\) is obtained from the input \(x\).

The following code defines a residual block with three FC layers and ReLU activation functions.
import torch.nn as nn

class MyResBlock(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 2*input_dim),
            nn.ReLU(),
            nn.Linear(2*input_dim, 2*input_dim),
            nn.ReLU(),
            nn.Linear(2*input_dim, input_dim)
        )

    def forward(self, x):
        y = x + self.model(x)
        return y
We can use `MyResBlock` to build a deep neural network.
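For instance, the network below stacks two residual blocks on 2-dimensional inputs followed by a linear output layer; this particular architecture is inferred from the parameter names in the `state_dict` printed later and may differ from the original code:

model = nn.Sequential(
    MyResBlock(2),
    nn.ReLU(),
    MyResBlock(2),
    nn.ReLU(),
    nn.Linear(2, 1)
)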
Common loss functions:
- `torch.nn.MSELoss`: Mean Squared Error
- `torch.nn.CrossEntropyLoss`: Cross Entropy
- `torch.nn.L1Loss`: L1 Loss
- `torch.nn.PoissonNLLLoss`: Poisson Negative Log Likelihood

Common optimizers:
- `torch.optim.SGD`: Stochastic Gradient Descent
- `torch.optim.Adam`: Adam

Two important optimizer methods:
- `zero_grad()`: Clear the gradients stored in the optimizer
- `step()`: Update the parameters

Hence a standard training loop looks like this:
def training_loop(dataloader, model, loss_fn, optimizer, n_epochs):
    for epoch in range(n_epochs):
        for x_batch, y_batch in dataloader:
            # Compute prediction and loss
            y_pred = model(x_batch)
            loss = loss_fn(y_pred, y_batch)
            # Backpropagation
            loss.backward()        # compute the gradient
            optimizer.step()       # update the parameters
            optimizer.zero_grad()  # clear the gradient stored in the optimizer
        # print the training progress
        if epoch % 10 == 0:
            print(f"Epoch {epoch+1}, Loss = {loss.item():.3f}")
We now have all the components (data, model, loss, and optimizer) to train the model.
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
training_loop(train_loader, model, loss_fn, optimizer, 100)
Epoch 1, Loss = 10.774
Epoch 11, Loss = 4.807
Epoch 21, Loss = 9.499
Epoch 31, Loss = 0.259
Epoch 41, Loss = 0.348
Epoch 51, Loss = 1.994
Epoch 61, Loss = 0.421
Epoch 71, Loss = 0.759
Epoch 81, Loss = 0.393
Epoch 91, Loss = 0.723
In PyTorch, we train a model using the following steps:
- Prepare the data (`Dataset` and `DataLoader`)
- Define the model (a `torch.nn.Module`)
- Define the loss function (from `torch.nn` or define your own)
- Define the optimizer (from `torch.optim`)
- Run the training loop

TensorBoard is a visualization tool provided by TensorFlow, but it can also be used with PyTorch.
- Use the `torch.utils.tensorboard.SummaryWriter` class to log the training process.
- The `add_scalar` method is used to log a scalar value.

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs/example')

for epoch in range(100):
    running_loss = 0.0
    for x_batch, y_batch in train_loader:
        # Compute prediction and loss
        y_pred = model(x_batch)
        loss = loss_fn(y_pred, y_batch)
        # Backpropagation
        loss.backward()        # compute the gradient
        optimizer.step()       # update the parameters
        optimizer.zero_grad()  # clear the gradient stored in the optimizer
        running_loss += loss.item()
    avg_loss = running_loss / len(train_loader)
    writer.add_scalar('Loss/Train', avg_loss, epoch + 1)
writer.flush()
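To view the logged curves, start TensorBoard from the command line (e.g., tensorboard --logdir=runs) and open the reported URL in a browser.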
Use `add_graph` to visualize the network architecture.
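For example (assuming `x_sample` is one batch of inputs with the shape the model expects):

writer.add_graph(model, x_sample)  # log the computation graph
writer.close()                     # close the writer when done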
The parameters of the trained model can be inspected via the `state_dict()` method:

OrderedDict([('0.model.0.weight',
tensor([[ 0.3353, 0.5100],
[ 0.1076, 0.4300],
[-0.6342, 0.5900],
[-0.1826, -0.8827]])),
('0.model.0.bias', tensor([ 0.3769, 0.9032, 0.2106, -1.3865])),
('0.model.2.weight',
tensor([[ 0.0631, -0.3304, -0.0105, -0.1099],
[-0.4175, -1.1141, -0.6936, 0.2326],
[ 0.0374, -0.4600, 0.2077, -0.3252],
[-0.3309, 0.0550, 0.0105, -0.0252]])),
('0.model.2.bias', tensor([ 0.0490, -0.6529, 0.1407, -0.6110])),
('0.model.4.weight',
tensor([[-1.0155, 0.3032, -0.0163, 0.2224],
[-1.5355, -0.1701, -0.3817, 0.2588]])),
('0.model.4.bias', tensor([ 0.7159, -0.9517])),
('2.model.0.weight',
tensor([[ 0.0172, -0.0622],
[ 0.0012, 0.1577],
[-0.6201, -0.5586],
[ 0.1470, -0.0994]])),
('2.model.0.bias', tensor([-0.5300, -0.7162, -0.5571, -0.3369])),
('2.model.2.weight',
tensor([[-0.4374, -0.4943, -0.1508, -0.2580],
[-0.4353, -0.2645, 0.3280, -0.4524],
[-0.0281, -0.0727, -0.1621, -0.1945],
[ 0.1966, -0.0372, 0.2333, 0.1759]])),
('2.model.2.bias', tensor([-0.3115, -0.4114, -0.7311, -0.3686])),
('2.model.4.weight',
tensor([[-0.2567, -0.4981, 0.3086, -0.0100],
[ 0.4249, 0.1218, 0.2229, 0.1164]])),
('2.model.4.bias', tensor([ 0.3770, -0.8785])),
('4.weight', tensor([[-2.1846, -0.6424]])),
('4.bias', tensor([2.4523]))])
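The torchsummary package prints a layer-by-layer summary of a model. Here it is applied to a randomly initialized ResNet-18 from torchvision: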
import torchvision.models as models
from torchsummary import summary
resnet = models.resnet18(weights=None) # No weights - random initialization
summary(resnet, (3, 200, 200))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 100, 100] 9,408
BatchNorm2d-2 [-1, 64, 100, 100] 128
ReLU-3 [-1, 64, 100, 100] 0
MaxPool2d-4 [-1, 64, 50, 50] 0
Conv2d-5 [-1, 64, 50, 50] 36,864
BatchNorm2d-6 [-1, 64, 50, 50] 128
ReLU-7 [-1, 64, 50, 50] 0
Conv2d-8 [-1, 64, 50, 50] 36,864
BatchNorm2d-9 [-1, 64, 50, 50] 128
ReLU-10 [-1, 64, 50, 50] 0
BasicBlock-11 [-1, 64, 50, 50] 0
Conv2d-12 [-1, 64, 50, 50] 36,864
BatchNorm2d-13 [-1, 64, 50, 50] 128
ReLU-14 [-1, 64, 50, 50] 0
Conv2d-15 [-1, 64, 50, 50] 36,864
BatchNorm2d-16 [-1, 64, 50, 50] 128
ReLU-17 [-1, 64, 50, 50] 0
BasicBlock-18 [-1, 64, 50, 50] 0
Conv2d-19 [-1, 128, 25, 25] 73,728
BatchNorm2d-20 [-1, 128, 25, 25] 256
ReLU-21 [-1, 128, 25, 25] 0
Conv2d-22 [-1, 128, 25, 25] 147,456
BatchNorm2d-23 [-1, 128, 25, 25] 256
Conv2d-24 [-1, 128, 25, 25] 8,192
BatchNorm2d-25 [-1, 128, 25, 25] 256
ReLU-26 [-1, 128, 25, 25] 0
BasicBlock-27 [-1, 128, 25, 25] 0
Conv2d-28 [-1, 128, 25, 25] 147,456
BatchNorm2d-29 [-1, 128, 25, 25] 256
ReLU-30 [-1, 128, 25, 25] 0
Conv2d-31 [-1, 128, 25, 25] 147,456
BatchNorm2d-32 [-1, 128, 25, 25] 256
ReLU-33 [-1, 128, 25, 25] 0
BasicBlock-34 [-1, 128, 25, 25] 0
Conv2d-35 [-1, 256, 13, 13] 294,912
BatchNorm2d-36 [-1, 256, 13, 13] 512
ReLU-37 [-1, 256, 13, 13] 0
Conv2d-38 [-1, 256, 13, 13] 589,824
BatchNorm2d-39 [-1, 256, 13, 13] 512
Conv2d-40 [-1, 256, 13, 13] 32,768
BatchNorm2d-41 [-1, 256, 13, 13] 512
ReLU-42 [-1, 256, 13, 13] 0
BasicBlock-43 [-1, 256, 13, 13] 0
Conv2d-44 [-1, 256, 13, 13] 589,824
BatchNorm2d-45 [-1, 256, 13, 13] 512
ReLU-46 [-1, 256, 13, 13] 0
Conv2d-47 [-1, 256, 13, 13] 589,824
BatchNorm2d-48 [-1, 256, 13, 13] 512
ReLU-49 [-1, 256, 13, 13] 0
BasicBlock-50 [-1, 256, 13, 13] 0
Conv2d-51 [-1, 512, 7, 7] 1,179,648
BatchNorm2d-52 [-1, 512, 7, 7] 1,024
ReLU-53 [-1, 512, 7, 7] 0
Conv2d-54 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-55 [-1, 512, 7, 7] 1,024
Conv2d-56 [-1, 512, 7, 7] 131,072
BatchNorm2d-57 [-1, 512, 7, 7] 1,024
ReLU-58 [-1, 512, 7, 7] 0
BasicBlock-59 [-1, 512, 7, 7] 0
Conv2d-60 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-61 [-1, 512, 7, 7] 1,024
ReLU-62 [-1, 512, 7, 7] 0
Conv2d-63 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-64 [-1, 512, 7, 7] 1,024
ReLU-65 [-1, 512, 7, 7] 0
BasicBlock-66 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-67 [-1, 512, 1, 1] 0
Linear-68 [-1, 1000] 513,000
================================================================
Total params: 11,689,512
Trainable params: 11,689,512
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.46
Forward/backward pass size (MB): 51.08
Params size (MB): 44.59
Estimated Total Size (MB): 96.13
----------------------------------------------------------------
- `Lightning` is a lightweight PyTorch wrapper.
- The code (organized in a `LightningModule`) is exactly the same as in plain PyTorch, except that the `LightningModule` provides a structure for the research code.
- The two key components are the `LightningModule` and the `LightningDataModule`.

The basic structure of a `LightningDataModule` is as follows:
import torch
import pytorch_lightning as pl
from torch.utils import data

class MyDataModule(pl.LightningDataModule):
    def __init__(self):
        super().__init__()

    def prepare_data(self):
        # download, IO, etc. Useful with shared filesystems
        pass

    def setup(self, stage):
        # make assignments here (val/train/test split)
        # RandomDataset is a placeholder dataset, not defined here
        dataset = RandomDataset(1, 100)
        self.train, self.val, self.test = data.random_split(
            dataset, [80, 10, 10], generator=torch.Generator().manual_seed(42)
        )

    def train_dataloader(self):
        return data.DataLoader(self.train)

    def val_dataloader(self):
        return data.DataLoader(self.val)

    def test_dataloader(self):
        return data.DataLoader(self.test)
The basic structure of a `LightningModule` is as follows:
class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x

    def loss(self, pred, label):
        pass

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        pred = self.forward(x)
        loss = self.loss(pred, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        pred = self.forward(x)
        loss = self.loss(pred, y)
        self.log('val_loss', loss)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
Training a model in `Lightning` is as simple as follows:
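A sketch of the usual call, assuming the `MyDataModule` and `MyModel` classes defined above (the number of epochs is an illustrative choice):

data_module = MyDataModule()
model = MyModel()
trainer = pl.Trainer(max_epochs=100)        # the Trainer runs the training loop
trainer.fit(model, datamodule=data_module)  # fit the model using the data module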
The `Trainer` automates many things, such as:
- the `optimizer.step()`, `loss.backward()`, `optimizer.zero_grad()` calls
- `model.eval()`, enabling/disabling grads during evaluation