The requires_grad attribute of each parameter should be set to False. Then, either the final layer(s) keep requires_grad as True, or we replace the final layer(s) with a new architecture, whose freshly created parameters have requires_grad set to True by default and are therefore trained automatically.
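The snippets below assume a pretrained backbone has already been loaded. As a concrete, purely illustrative choice, DenseNet-121 from torchvision fits, since its original classifier expects 1024 input features, matching the fc1 layer defined further down:
from torch import nn, optim
from torchvision import models
# Illustrative assumption: DenseNet-121 as the pretrained feature extractor
model = models.densenet121(pretrained=True)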
# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False
from collections import OrderedDict
# Example of a new classifier: two fully connected layers that map the extracted features to two output classes
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
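At this point it is worth verifying that only the new classifier's parameters are still trainable; a quick check with named_parameters() shows them:
# Sanity check: only the parameters of the new classifier should require gradients
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(trainable)  # e.g. ['classifier.fc1.weight', 'classifier.fc1.bias', 'classifier.fc2.weight', 'classifier.fc2.bias']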
Furthermore, when defining the optimizer, pass it only the classifier's parameters (i.e. the layers that will actually be trained).
# Only train the classifier parameters, feature parameters are frozen
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
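Since the classifier ends in LogSoftmax, the matching criterion is NLLLoss. A minimal training step could look like the sketch below, where trainloader is assumed to be a DataLoader that yields batches of images and labels:
criterion = nn.NLLLoss()  # pairs with the LogSoftmax output of the classifier

for images, labels in trainloader:   # trainloader is an assumed torch.utils.data.DataLoader
    optimizer.zero_grad()
    log_ps = model(images)           # forward pass: frozen features + trainable classifier
    loss = criterion(log_ps, labels)
    loss.backward()                  # gradients are only accumulated for the classifier
    optimizer.step()                 # updates only the classifier parameters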
Several high-performance computer-vision models are available in torchvision.models. Keep in mind that the number in a model's name usually corresponds to its number of layers; as a rule of thumb, deeper networks tend to be more accurate but also more computationally expensive. To see all the available models, check the documentation: https://pytorch.org/docs/stable/torchvision/models.html?highlight=models
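For example, ResNet-18 and ResNet-152 belong to the same architecture family at different depths; a short sketch of loading either (both are available in torchvision.models):
from torchvision import models

resnet18 = models.resnet18(pretrained=True)    # 18 layers: lighter and faster
resnet152 = models.resnet152(pretrained=True)  # 152 layers: heavier but typically more accurate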