Setup
import torch
from torch import nn, optim

# Instantiating the model
model = Classifier()
# Defining the loss function to be used (in this case, negative log likelihood loss)
criterion = nn.NLLLoss()
# Setting the optimization algorithm (in this case, Adam)
optimizer = optim.Adam(model.parameters(), lr=0.003)
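For context, here is a minimal sketch of what a compatible Classifier could look like (hypothetical: the layer sizes assume flattened 28x28 grayscale images, as in MNIST, and it reuses the imports above). Note the log-softmax output, since nn.NLLLoss expects log-probabilities:
class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)    # hypothetical hidden layer size
        self.fc2 = nn.Linear(256, 10)     # 10 output classes (assumption)
        self.dropout = nn.Dropout(p=0.2)  # dropout, toggled by train()/eval()

    def forward(self, x):
        x = x.view(x.shape[0], -1)                    # flatten the input images
        x = self.dropout(torch.relu(self.fc1(x)))     # hidden layer with dropout
        return torch.log_softmax(self.fc2(x), dim=1)  # log-probabilities for NLLLoss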
Training Loop
- Generally, artificial neural networks are trained by optimizing their weights according to the gradients of the loss, computed from mini-batches of the training data. Each time all of the training data has been used once, an epoch is complete; a model usually needs multiple epochs to reach optimal performance. The complete loop is shown after this list.
- If dropout is being used, train mode should be activated while training (so that dropout is actually applied), by calling model.train(). When evaluating the model's performance on validation or test data, eval mode should be activated (to disable dropout), by calling model.eval().
- At the beginning of each training iteration, the gradients must be zeroed out, using optimizer.zero_grad(), to prevent gradients accumulated from previous iterations from affecting the current update.
- To get the output of the model (the logits or class probabilities) for a given input, the model is called on that input, which runs its forward() method.
- Given the model's output, the loss is computed by passing the output and the ground-truth labels to the loss function.
- To obtain the gradients, the loss must be backpropagated through the network, by calling the loss tensor's backward() method.
- Finally, the weights are updated by the chosen optimization algorithm, by calling the optimizer's step() method.
- Ideally, the model's loss and performance on the training data are recorded and displayed at the end of each epoch, so that the developer can keep track of the model's progress.
- It's also good practice to evaluate the model on validation data at the end of each epoch, as this estimates the model's performance on unseen data. In the end, one should keep the model that achieved the lowest validation error, even if it occurred in an early epoch, to prevent overfitting.
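Putting these steps together, a full training loop with validation and checkpointing looks like the following (train_loader and valid_loader are assumed to be DataLoader objects defined elsewhere):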
import torch
from torch import nn, optim
import numpy as np

# Instantiating the model
model = Classifier()
# Defining the loss function to be used (in this case, negative log likelihood loss)
criterion = nn.NLLLoss()
# Setting the optimization algorithm (in this case, Adam)
optimizer = optim.Adam(model.parameters(), lr=0.003)

# number of epochs to train the model
n_epochs = 50

# initialize tracker for minimum validation loss
valid_loss_min = np.inf  # set initial "min" to infinity

for e in range(n_epochs):
    # monitor training and validation metrics for this epoch
    train_loss = 0.0
    valid_loss = 0.0
    valid_accuracy = 0.0

    ###################
    # train the model #
    ###################
    model.train()  # prep model for training (activates dropout)
    for images, labels in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(images)
        # calculate the loss
        loss = criterion(output, labels)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()

    ######################
    # validate the model #
    ######################
    model.eval()  # prep model for evaluation (deactivates dropout)
    # turn off gradients for validation: saves memory and computation
    with torch.no_grad():
        for images, labels in valid_loader:
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(images)
            # calculate the loss
            loss = criterion(output, labels)
            # update running validation loss
            valid_loss += loss.item()
            # find the top class (highest output probability)
            top_p, top_class = output.topk(1, dim=1)
            # see which images were correctly classified
            correct_class = top_class == labels.view(*top_class.shape)
            # add the accuracy of the current batch to the running accuracy
            valid_accuracy += torch.mean(correct_class.type(torch.FloatTensor)).item()

    # calculate average losses and accuracy (per batch) over the epoch
    train_loss = train_loss / len(train_loader)
    valid_loss = valid_loss / len(valid_loader)
    valid_accuracy = valid_accuracy / len(valid_loader)

    # print training/validation statistics
    print("Epoch: {}/{}.. ".format(e + 1, n_epochs),
          "Training Loss: {:.3f}.. ".format(train_loss),
          "Validation Loss: {:.3f}.. ".format(valid_loss),
          "Validation Accuracy: {:.3f}".format(valid_accuracy))

    # save model if validation loss has decreased
    if valid_loss <= valid_loss_min:
        print('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(
            valid_loss_min, valid_loss))
        torch.save(model.state_dict(), 'model.pt')
        valid_loss_min = valid_loss
Saving and Loading Models
- In the simplest form, a model can be saved with the torch.save() function, storing just the model's state dictionary, obtained from its state_dict() method.
torch.save(model.state_dict(), 'checkpoint.pth')
- PyTorch checkpoints are conventionally saved with a .pth or .pt extension.
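To restore the model later, create a model with the same architecture and load the saved state dictionary into it. A minimal sketch:
# Re-create a model with the same architecture as the saved one
model = Classifier()
# Load the saved state dictionary and copy the weights into the model
state_dict = torch.load('checkpoint.pth')
model.load_state_dict(state_dict)
# Switch to eval mode before using the model for inference
model.eval()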