The view() method can change the shape of a tensor, either to a specified shape or to match the shape of another tensor. It always returns a new tensor that shares the same underlying data; the operation isn't performed in-place. Similar methods include reshape() and resize_(). See more here: https://pytorch.org/docs/stable/tensors.html?highlight=view#torch.Tensor.view
Change the shape to hardcoded values:
tensor_one.view(5, 1)
Set the length of only some dimensions and let the remaining one be inferred with -1:
tensor_one.view(5, -1)
Match the shape of another tensor:
tensor_one.view(*tensor_two.shape)
# Alternative
tensor_one.view_as(tensor_two)
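A minimal sketch of the calls above (a 5-element tensor_one and a tensor_two of shape [5, 1] are assumptions for illustration):
import torch

tensor_one = torch.arange(5.)                    # shape [5]
tensor_two = torch.zeros(5, 1)                   # shape [5, 1]

print(tensor_one.view(5, 1).shape)               # torch.Size([5, 1])
print(tensor_one.view(5, -1).shape)              # torch.Size([5, 1]), -1 is inferred
print(tensor_one.view_as(tensor_two).shape)      # torch.Size([5, 1])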
The type() method allows changing the tensor's data type. For instance, to convert it to a float tensor:
tensor.type(torch.FloatTensor)
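A minimal sketch of that conversion (the tensor values are illustrative):
import torch

tensor = torch.tensor([1, 2, 3])                 # dtype is torch.int64 by default
float_tensor = tensor.type(torch.FloatTensor)    # converted copy
print(float_tensor.dtype)                        # torch.float32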
The topk() method gives the 𝑘 highest values. It returns a tuple with the top-𝑘 values and the top-𝑘 indices. If the highest value is the fifth element, we'll get back 4 as the index. When we want to know the most likely class in a prediction, we can use ps.topk(1), where ps is a tensor containing the output probabilities. Learn more about this method here: https://pytorch.org/docs/stable/torch.html#torch.topk
top_p, top_class = ps.topk(1, dim=1)
# Look at the most likely classes for the first 10 examples
print(top_class[:10,:])
When debugging, make sure that tensors used together have exactly the right shapes, checking the .shape attribute. For instance, comparing a tensor of shape [N] with a tensor of shape [N, 1] for equality produces an [N, N] tensor, because broadcasting compares each element of the two-dimensional tensor against every element of the one-dimensional tensor.
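A small sketch of that broadcasting pitfall (the labels and predictions are illustrative):
import torch

labels = torch.tensor([0, 1, 2, 1])                 # shape [4]
top_class = torch.tensor([[0], [1], [0], [1]])      # shape [4, 1]

print((top_class == labels).shape)                  # torch.Size([4, 4]) -- broadcast, not intended
equals = top_class == labels.view(*top_class.shape)
print(equals.shape)                                 # torch.Size([4, 1]) -- element-wise, as intended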
torch.nn.Sequential serves as a container that combines multiple layers in a specific order. Afterwards, one can simply pass tensors through the sequential object. An interesting option is to use an OrderedDict to name each layer, which adds clarity to the model's structure. As the examples below show, activation functions can be placed between layers without requiring changes to the forward method.
import torch.nn as nn
from collections import OrderedDict

# Example of using Sequential
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)

# Example of using Sequential with OrderedDict
model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
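As a quick check, a batch can be passed straight through the model defined above (the 1x28x28 input size is an assumption for illustration):
import torch

images = torch.randn(16, 1, 28, 28)        # batch of 16 single-channel 28x28 images
output = model(images)
print(output.shape)                        # torch.Size([16, 64, 20, 20])
With the OrderedDict version, individual layers can also be accessed by their names, e.g. model.conv1.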
When a model needs a variable number of layers, nn.ModuleList is probably the best option. Essentially, it works just like a Python list to which we can append PyTorch modules, except that PyTorch is aware of what's being added and includes each module's parameters in the optimization process. Alternatively, a dictionary version is available with nn.ModuleDict, which can be useful for naming each module or assigning it to a certain key.
# List version
class MLP(nn.Module):
    def __init__(self, h_sizes, out_size):
        ...
        self.hidden = nn.ModuleList()
        for k in range(len(h_sizes)-1):
            self.hidden.append(nn.Linear(h_sizes[k], h_sizes[k+1]))
        ...

# Dictionary version
class MLP(nn.Module):
    def __init__(self, h_sizes, out_size):
        ...
        self.hidden = nn.ModuleDict()
        for k in range(len(h_sizes)-1):
            self.hidden[f'linear_{k}'] = nn.Linear(h_sizes[k], h_sizes[k+1])
        ...
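Completing the list version above into a runnable sketch, with an assumed output layer, F.relu activations and illustrative layer sizes:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, h_sizes, out_size):
        super().__init__()
        self.hidden = nn.ModuleList()
        for k in range(len(h_sizes) - 1):
            self.hidden.append(nn.Linear(h_sizes[k], h_sizes[k + 1]))
        self.out = nn.Linear(h_sizes[-1], out_size)

    def forward(self, x):
        for layer in self.hidden:               # each appended layer is registered by PyTorch
            x = F.relu(layer(x))
        return self.out(x)

model = MLP(h_sizes=[784, 256, 128], out_size=10)
print(sum(p.numel() for p in model.parameters()))   # counts the parameters of every layer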
PyTorch can only perform operations on tensors that are on the same device, so either all on the CPU or all on the GPU.
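A minimal sketch of moving both the model and the data to the same device (the model and input are illustrative):
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = nn.Linear(10, 2).to(device)        # move the model's parameters to the device
inputs = torch.randn(4, 10).to(device)     # move the inputs to the same device
output = model(inputs)                     # works because everything is on one device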
Applying the nn.CrossEntropyLoss() loss function to raw logits is the same as combining F.log_softmax() in the final layer with nn.NLLLoss() as the criterion. It can however be more convenient to go with the second option, as the model then outputs log-probabilities, which can be turned into class probabilities with torch.exp().
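A sketch showing that the two setups produce the same loss (the logits and labels are illustrative):
import torch
import torch.nn.functional as F
from torch import nn

logits = torch.randn(8, 10)                          # raw outputs for 8 examples and 10 classes
labels = torch.randint(0, 10, (8,))

loss_1 = nn.CrossEntropyLoss()(logits, labels)       # option 1: raw logits
log_ps = F.log_softmax(logits, dim=1)                # option 2: log-softmax output...
loss_2 = nn.NLLLoss()(log_ps, labels)                # ...with the negative log likelihood loss

print(torch.allclose(loss_1, loss_2))                # True
ps = torch.exp(log_ps)                               # class probabilities straight from the output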
When using functions like narrow(), view(), expand() and transpose(), PyTorch doesn't create a new tensor in memory; it gives the new variable the same memory as the original tensor, with modified metadata (shape and strides). This means the new variable may not be contiguous in memory, which can be a problem, especially with the outputs of multilayer LSTMs. To prevent these problems, create a copy in a new, contiguous block of memory with the tensor.contiguous() method.
Further explanation: https://stackoverflow.com/questions/48915810/pytorch-contiguous
# stack up LSTM outputs
out = out.contiguous().view(-1, self.n_hidden)
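A small sketch of when contiguous() becomes necessary (the tensor sizes are illustrative):
import torch

x = torch.randn(4, 6)
y = x.transpose(0, 1)                      # shares memory, only the metadata changes
print(y.is_contiguous())                   # False
# y.view(-1) would raise a RuntimeError here
z = y.contiguous().view(-1)                # copy into contiguous memory, then reshape
print(z.shape)                             # torch.Size([24])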
nn.utils.clip_grad_norm_() helps prevent the exploding gradient problem in RNNs / LSTMs.
# use clip_grad_norm_ to help prevent exploding gradients
nn.utils.clip_grad_norm_(net.parameters(), clip)
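A sketch of where gradient clipping fits in a training step; the model, data, optimizer, criterion and clip value below are illustrative assumptions:
import torch
import torch.nn as nn

net = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
criterion = nn.MSELoss()
inputs, targets = torch.randn(4, 10, 8), torch.randn(4, 10, 16)
clip = 5

optimizer.zero_grad()
output, _ = net(inputs)
loss = criterion(output, targets)
loss.backward()
nn.utils.clip_grad_norm_(net.parameters(), clip)   # clip gradient norms in-place before the step
optimizer.step()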