- TensorFlow (2015): Developed by the Google Brain Team.
- PyTorch (2016): Developed by the Facebook AI Research Lab.
- JAX (2018): Developed by the Google Brain Team.
- MXNet (2015): Developed by the Apache Software Foundation.
- Keras (2015): A high-level API that can run on top of JAX, TensorFlow, or PyTorch.

A tensor is created with the torch.tensor() function. The shape attribute gives the shape of a tensor, and the requires_grad attribute controls whether the gradient of the tensor is tracked.
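A quick illustration of these basics (the values are arbitrary):

import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])        # create a 2x2 tensor
print(t.shape)                                     # torch.Size([2, 2])
w = torch.tensor([1.0, 2.0], requires_grad=True)   # gradient tracking enabled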
When requires_grad=True, the gradient of the tensor will be computed during backpropagation. A tensor can be converted to a NumPy array with torch.Tensor.numpy(), and a NumPy array can be converted to a tensor with torch.from_numpy(). When tensors are created with requires_grad=True, it signals to autograd that every operation on them should be tracked. We then call .backward() on the final tensor to compute the gradient.

x = torch.tensor([3.0], requires_grad=True)
y = x**2
y.backward()
print(x.grad.item()) # dy/dx = 2x, and when x = 3, dy/dx = 6

6.0
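As an aside, a minimal sketch of the tensor/NumPy conversions mentioned above:

import numpy as np

t = torch.tensor([1.0, 2.0, 3.0])
a = t.numpy()             # tensor -> NumPy array (shares memory on CPU)
t2 = torch.from_numpy(a)  # NumPy array -> tensor (also shares memory)

Returning to autograd, the next example differentiates a matrix-valued expression.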
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = torch.trace(torch.matmul(x.t(), x))
y.backward()
print(x.grad)

tensor([[2., 4.],
[6., 8.]])
\[ y = \operatorname{tr}(X^TX) \Rightarrow \frac{\partial y}{\partial X} = 2X \]
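The next example reviews Python classes and inheritance. The parent person class is defined elsewhere in the notes; here is a minimal sketch consistent with the attributes and method it is said to have:

class person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def get_name(self):
        return self.name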
A person object has attributes (name and age) and methods (get_name). The student class below inherits from person:

class student(person):
    def __init__(self, name, age, major):
        super().__init__(name, age)  # call the parent class constructor
        self.major = major
    def get_major(self):
        return self.major
john = student("John", 20, "Stat")
print(f"{john.get_name()} is majoring in {john.get_major()}.")John is majoring in Stat.
In the data preparation stage, we need to do the following:
- A dataset is stored in a Dataset class. Inside the Dataset class, we define how the data are stored and how individual samples are retrieved.
- To feed the dataset to our model, we need the DataLoader class, which wraps the Dataset in an iterable and handles batching and shuffling.

You need to implement three methods in the Dataset class: __init__, __len__, and __getitem__.
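Below is a minimal sketch of what the MyDataset class used in the following example might look like (the float32 conversion is an assumption):

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, x, y):
        # store features and targets as float tensors
        self.x = torch.tensor(x, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)
    def __len__(self):
        # number of samples in the dataset
        return self.x.shape[0]
    def __getitem__(self, idx):
        # return one (features, target) pair
        return self.x[idx], self.y[idx]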
import numpy as np
rng = np.random.default_rng(20241029)
n = 30
p = 2
# Generate x_1, x_2, ..., x_30 from a normal distribution
x = rng.normal(loc=0.0, scale=1.0, size=(n, p))
# Generate y from the linear model y = 2 - x_1 + 3*x_2
beta = np.array([-1, 3])
y = 2 + x.dot(beta).reshape(-1,1) + rng.normal(loc=0.0, scale=0.1, size=(n, 1))
training_data = MyDataset(x, y)
print("The number fo samples is", training_data.__len__())
print("The first sample is (x_1, x_2, y) =", training_data.__getitem__(0))The number fo samples is 30
The first sample is (x_1, x_2, y) = (tensor([-0.7578, 1.2519]), tensor([6.6215]))
The DataLoader class will:

- Shuffle the data (if shuffle=True)
- Split the data into batches (if batch_size is specified)

The DataLoader is an iterable. After we iterate over all batches, the data will be shuffled again. A sketch of constructing a DataLoader appears after the nn.Linear example below.

torch.nn.Linear is a fully connected layer. We can inspect its parameters using the state_dict() method or by directly accessing the attributes:

OrderedDict([('weight', tensor([[-0.5396, 0.4724],
[ 0.4363, 0.3894],
[-0.6281, -0.3871]])), ('bias', tensor([0.2805, 0.6208, 0.5337]))])
See https://pytorch.org/docs/stable/nn.html
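For reference, a hedged reconstruction of the snippet that likely produced the OrderedDict above, together with a sketch of wrapping the training_data defined earlier in a DataLoader (the printed weights are random, so values will differ; batch_size=5 is an illustrative choice, not taken from the source):

import torch.nn as nn
from torch.utils.data import DataLoader

# A fully connected layer mapping 2 inputs to 3 outputs, consistent with the output shown above
fc = nn.Linear(2, 3)
print(fc.state_dict())     # OrderedDict with 'weight' (3x2) and 'bias' (3,)
print(fc.weight, fc.bias)  # direct attribute access

# Wrap the Dataset in a DataLoader
train_loader = DataLoader(training_data, batch_size=5, shuffle=True)
for x_batch, y_batch in train_loader:
    print(x_batch.shape, y_batch.shape)  # e.g. torch.Size([5, 2]) torch.Size([5, 1])

A loader like this is what the training loop below iterates over.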
The torch.nn.Module class is the base class for all neural network modules. The torch.nn.Sequential class is a subclass of torch.nn.Module that stacks layers sequentially. Some architectures cannot be expressed with torch.nn.Sequential, for example ResNet, recurrent networks, etc. In that case we subclass torch.nn.Module directly. A basic nn.Module subclass is as follows:
import torch.nn as nn

class model(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        # Define the layers
        self.linear1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_dim, output_dim)
    def forward(self, x):
        # Define the forward pass, i.e., how to compute the output from the input
        x = self.linear1(x)
        x = self.relu(x)
        y = self.linear2(x)
        return y

In the __init__ method, we define the layers that will be used by the model. In the forward method, we define how the output \(y\) is obtained from the input \(x\). The following code defines a residual block with three fully connected layers and ReLU activations.
import torch.nn as nn

class MyResBlock(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 2*input_dim),
            nn.ReLU(),
            nn.Linear(2*input_dim, 2*input_dim),
            nn.ReLU(),
            nn.Linear(2*input_dim, input_dim)
        )
    def forward(self, x):
        # residual connection: add the input to the block's output
        y = x + self.model(x)
        return y

We can use MyResBlock to build a deep neural network.
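For instance, the model trained in the next section appears (from the state_dict printed later) to stack two residual blocks followed by a final linear layer; a hedged sketch of such a model:

model = nn.Sequential(
    MyResBlock(2),   # block operating on the 2-dimensional inputs
    nn.ReLU(),
    MyResBlock(2),
    nn.ReLU(),
    nn.Linear(2, 1)  # map to a single output for regression
)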
Commonly used loss functions include:

- torch.nn.MSELoss: Mean Squared Error
- torch.nn.CrossEntropyLoss: Cross Entropy
- torch.nn.L1Loss: L1 Loss
- torch.nn.PoissonNLLLoss: Poisson Negative Log Likelihood

Commonly used optimizers include:

- torch.optim.SGD: Stochastic Gradient Descent
- torch.optim.Adam: Adam

Two important optimizer methods:

- zero_grad(): Clear the gradients stored in the optimizer
- step(): Update the parameters

Hence a standard training loop looks like this:
def training_loop(dataloader, model, loss_fn, optimizer, n_epochs):
    for epoch in range(n_epochs):
        for x_batch, y_batch in dataloader:
            # Compute prediction and loss
            y_pred = model(x_batch)
            loss = loss_fn(y_pred, y_batch)
            # Backpropagation
            loss.backward()        # compute the gradient
            optimizer.step()       # update the parameters
            optimizer.zero_grad()  # clear the gradient stored in the optimizer
        # print the training progress
        if epoch % 10 == 0:
            print(f"Epoch {epoch+1}, Loss = {loss.item():.3f}")

We now have all the components (data, model, loss, and optimizer) to train the model.
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
training_loop(train_loader, model, loss_fn, optimizer, 100)

Epoch 1, Loss = 13.321
Epoch 11, Loss = 1.536
Epoch 21, Loss = 0.365
Epoch 31, Loss = 0.067
Epoch 41, Loss = 0.045
Epoch 51, Loss = 0.025
Epoch 61, Loss = 0.853
Epoch 71, Loss = 0.028
Epoch 81, Loss = 0.066
Epoch 91, Loss = 0.050
In PyTorch, we train a model using the following steps:

- Prepare the data with Dataset and DataLoader
- Define the model as a subclass of torch.nn.Module
- Choose a loss function (from torch.nn or define your own)
- Choose an optimizer (from torch.optim)
- Write the training loop

TensorBoard is a visualization tool provided by TensorFlow, but it can also be used with PyTorch.
We use the torch.utils.tensorboard.SummaryWriter class to log the training process. The add_scalar method is used to log a scalar value.

from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter('runs/example')
for epoch in range(100):
    running_loss = 0.0
    for x_batch, y_batch in train_loader:
        # Compute prediction and loss
        y_pred = model(x_batch)
        loss = loss_fn(y_pred, y_batch)
        # Backpropagation
        loss.backward()        # compute the gradient
        optimizer.step()       # update the parameters
        optimizer.zero_grad()  # clear the gradient stored in the optimizer
        running_loss += loss.item()
    avg_loss = running_loss / len(train_loader)
    writer.add_scalar('Loss/Train', avg_loss, epoch + 1)
writer.flush()

Use add_graph to visualize the network architecture.
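A minimal sketch of add_graph, using one batch from the DataLoader as the example input:

# Log the network architecture to TensorBoard
x_batch, _ = next(iter(train_loader))
writer.add_graph(model, x_batch)
writer.close()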
We can inspect the model parameters using the state_dict() method:

OrderedDict([('0.model.0.weight',
tensor([[-0.4887, -0.8192],
[ 0.6946, 0.5337],
[ 0.1736, 0.0551],
[ 0.4247, -1.0334]])),
('0.model.0.bias', tensor([-0.3327, 0.5050, -0.6019, -0.0285])),
('0.model.2.weight',
tensor([[ 0.6055, -0.6635, 0.4054, 0.5843],
[ 0.2425, -0.1477, -0.2659, 0.7719],
[ 0.1666, -0.4641, -0.0790, -0.1267],
[-0.4862, -0.1035, -0.2386, -0.1672]])),
('0.model.2.bias', tensor([ 0.0108, -0.1911, -0.0543, -0.2758])),
('0.model.4.weight',
tensor([[ 0.9599, 0.8796, -0.0876, 0.0927],
[ 0.2789, 0.1629, 0.4322, 0.0858]])),
('0.model.4.bias', tensor([0.7170, 1.1826])),
('2.model.0.weight',
tensor([[ 0.5355, -0.1691],
[-0.0610, 0.5382],
[-0.7051, -0.3227],
[ 0.4620, -0.4787]])),
('2.model.0.bias', tensor([ 0.0993, 0.0737, -0.2287, -0.5889])),
('2.model.2.weight',
tensor([[-0.5853, 0.3475, 0.2194, -0.2362],
[ 0.0463, 0.2101, -0.2726, 0.0574],
[-0.4221, -0.3367, -0.4742, 0.2873],
[-0.1169, -0.0678, -0.4661, -0.0395]])),
('2.model.2.bias', tensor([-0.0908, -0.4833, 0.5289, 0.1900])),
('2.model.4.weight',
tensor([[-0.0702, -0.2431, -0.2679, -0.3471],
[ 0.2825, 0.1204, -0.2197, -0.1854]])),
('2.model.4.bias', tensor([ 0.0370, -0.1809])),
('4.weight', tensor([[-1.0629, 2.5723]])),
('4.bias', tensor([0.0272]))])
import torchvision.models as models
from torchsummary import summary
resnet = models.resnet18(weights=None) # No weights - random initialization
summary(resnet, (3, 200, 200))

----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 100, 100] 9,408
BatchNorm2d-2 [-1, 64, 100, 100] 128
ReLU-3 [-1, 64, 100, 100] 0
MaxPool2d-4 [-1, 64, 50, 50] 0
Conv2d-5 [-1, 64, 50, 50] 36,864
BatchNorm2d-6 [-1, 64, 50, 50] 128
ReLU-7 [-1, 64, 50, 50] 0
Conv2d-8 [-1, 64, 50, 50] 36,864
BatchNorm2d-9 [-1, 64, 50, 50] 128
ReLU-10 [-1, 64, 50, 50] 0
BasicBlock-11 [-1, 64, 50, 50] 0
Conv2d-12 [-1, 64, 50, 50] 36,864
BatchNorm2d-13 [-1, 64, 50, 50] 128
ReLU-14 [-1, 64, 50, 50] 0
Conv2d-15 [-1, 64, 50, 50] 36,864
BatchNorm2d-16 [-1, 64, 50, 50] 128
ReLU-17 [-1, 64, 50, 50] 0
BasicBlock-18 [-1, 64, 50, 50] 0
Conv2d-19 [-1, 128, 25, 25] 73,728
BatchNorm2d-20 [-1, 128, 25, 25] 256
ReLU-21 [-1, 128, 25, 25] 0
Conv2d-22 [-1, 128, 25, 25] 147,456
BatchNorm2d-23 [-1, 128, 25, 25] 256
Conv2d-24 [-1, 128, 25, 25] 8,192
BatchNorm2d-25 [-1, 128, 25, 25] 256
ReLU-26 [-1, 128, 25, 25] 0
BasicBlock-27 [-1, 128, 25, 25] 0
Conv2d-28 [-1, 128, 25, 25] 147,456
BatchNorm2d-29 [-1, 128, 25, 25] 256
ReLU-30 [-1, 128, 25, 25] 0
Conv2d-31 [-1, 128, 25, 25] 147,456
BatchNorm2d-32 [-1, 128, 25, 25] 256
ReLU-33 [-1, 128, 25, 25] 0
BasicBlock-34 [-1, 128, 25, 25] 0
Conv2d-35 [-1, 256, 13, 13] 294,912
BatchNorm2d-36 [-1, 256, 13, 13] 512
ReLU-37 [-1, 256, 13, 13] 0
Conv2d-38 [-1, 256, 13, 13] 589,824
BatchNorm2d-39 [-1, 256, 13, 13] 512
Conv2d-40 [-1, 256, 13, 13] 32,768
BatchNorm2d-41 [-1, 256, 13, 13] 512
ReLU-42 [-1, 256, 13, 13] 0
BasicBlock-43 [-1, 256, 13, 13] 0
Conv2d-44 [-1, 256, 13, 13] 589,824
BatchNorm2d-45 [-1, 256, 13, 13] 512
ReLU-46 [-1, 256, 13, 13] 0
Conv2d-47 [-1, 256, 13, 13] 589,824
BatchNorm2d-48 [-1, 256, 13, 13] 512
ReLU-49 [-1, 256, 13, 13] 0
BasicBlock-50 [-1, 256, 13, 13] 0
Conv2d-51 [-1, 512, 7, 7] 1,179,648
BatchNorm2d-52 [-1, 512, 7, 7] 1,024
ReLU-53 [-1, 512, 7, 7] 0
Conv2d-54 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-55 [-1, 512, 7, 7] 1,024
Conv2d-56 [-1, 512, 7, 7] 131,072
BatchNorm2d-57 [-1, 512, 7, 7] 1,024
ReLU-58 [-1, 512, 7, 7] 0
BasicBlock-59 [-1, 512, 7, 7] 0
Conv2d-60 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-61 [-1, 512, 7, 7] 1,024
ReLU-62 [-1, 512, 7, 7] 0
Conv2d-63 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-64 [-1, 512, 7, 7] 1,024
ReLU-65 [-1, 512, 7, 7] 0
BasicBlock-66 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-67 [-1, 512, 1, 1] 0
Linear-68 [-1, 1000] 513,000
================================================================
Total params: 11,689,512
Trainable params: 11,689,512
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.46
Forward/backward pass size (MB): 51.08
Params size (MB): 44.59
Estimated Total Size (MB): 96.13
----------------------------------------------------------------
PyTorch Lightning is a lightweight PyTorch wrapper. A model defined in Lightning (a LightningModule) is exactly the same as in plain PyTorch, except that the LightningModule provides a structure for the research code. The two main components are the LightningModule and the LightningDataModule. The basic structure of a LightningDataModule is as follows:
import torch
import pytorch_lightning as pl
from torch.utils import data

class MyDataModule(pl.LightningDataModule):
    def __init__(self):
        super().__init__()
    def prepare_data(self):
        # download, IO, etc. Useful with shared filesystems
        pass
    def setup(self, stage):
        # make assignments here (val/train/test split)
        # RandomDataset is a toy dataset assumed to be defined elsewhere
        dataset = RandomDataset(1, 100)
        self.train, self.val, self.test = data.random_split(
            dataset, [80, 10, 10], generator=torch.Generator().manual_seed(42)
        )
    def train_dataloader(self):
        return data.DataLoader(self.train)
    def val_dataloader(self):
        return data.DataLoader(self.val)
    def test_dataloader(self):
        return data.DataLoader(self.test)

The basic structure of a LightningModule is as follows:
class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
    def forward(self, x):
        return x
    def loss(self, pred, label):
        pass
    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        pred = self.forward(x)
        loss = self.loss(pred, y)
        self.log('train_loss', loss)
        return loss
    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        pred = self.forward(x)
        loss = self.loss(pred, y)
        self.log('val_loss', loss)
    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

Training a model in Lightning is as simple as follows:
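The actual training call is not included in this extract; a minimal sketch of the usual pattern (max_epochs is an arbitrary choice here):

model = MyModel()
datamodule = MyDataModule()
trainer = pl.Trainer(max_epochs=10)
trainer.fit(model, datamodule=datamodule)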
Trainer automates many things, such as:
- The optimizer.step(), loss.backward(), optimizer.zero_grad() calls
- Calling model.eval(), enabling/disabling grads during evaluation