The requires_grad attribute of each parameter should be set to False. Then, either the final layer(s) keep requires_grad as True, or we replace the final layer(s) with new ones; newly created layers have requires_grad set to True by default, so they will be trained automatically.

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False
from collections import OrderedDict
import torch.nn as nn

# Example of a new classifier: two fully connected layers that classify the data into two classes
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))

model.classifier = classifier
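After swapping in the new classifier, whose freshly created parameters have requires_grad=True by default, a quick count can confirm that only those layers will be updated. This check is an illustrative addition, not part of the original snippet:

# Sanity check: compare trainable parameters against the total parameter count
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable_params:,} / {total_params:,}")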
Furthermore, when defining the optimizer, it should receive only the parameters of the classifier (i.e. the layers that will actually be trained).
from torch import optim

# Only train the classifier parameters; the feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
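Since the classifier ends in LogSoftmax, the matching loss function is nn.NLLLoss. The sketch below shows a single training pass, assuming a DataLoader called trainloader already exists (it is not defined in this example):

import torch.nn as nn

criterion = nn.NLLLoss()  # negative log-likelihood pairs with LogSoftmax outputs

for images, labels in trainloader:   # trainloader is an assumed, pre-built DataLoader
    optimizer.zero_grad()             # clear gradients from the previous step
    log_ps = model(images)            # forward pass: frozen features + new classifier
    loss = criterion(log_ps, labels)
    loss.backward()                   # gradients only reach parameters with requires_grad=True
    optimizer.step()                  # update the classifier weights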
Several high-performance models for computer vision are available in torchvision.models. Don't forget that the number in a model's name usually corresponds to its number of layers. Usually, the more layers a network has, the better its accuracy, but also the more computationally heavy it is. To see all the available models, check the documentation: https://pytorch.org/docs/stable/torchvision/models.html?highlight=models
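For example, a DenseNet-121 pre-trained on ImageNet can be loaded as sketched below; its original classifier expects 1024 input features, which is why the replacement classifier above uses nn.Linear(1024, 500). Note that pretrained=True is the older torchvision argument (recent versions use weights= instead):

from torchvision import models

# Load DenseNet-121 with ImageNet weights (older torchvision API shown)
model = models.densenet121(pretrained=True)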