- GPUs are usually much faster than CPUs for deep learning workloads, often by one to two orders of magnitude. This is mainly because GPUs can hold large amounts of data in memory at once and run computations across many parallel threads (https://www.quora.com/Why-are-GPUs-well-suited-to-deep-learning). Recently, a new kind of hardware called the TPU (Tensor Processing Unit) has also been gaining popularity as an even faster alternative to the GPU, since it passes intermediate results directly between compute units instead of repeatedly reading from and writing to memory (https://cloud.google.com/blog/products/ai-machine-learning/what-makes-tpus-fine-tuned-for-deep-learning).
- First off, confirm that the current hardware supports CUDA by calling `torch.cuda.is_available()`.
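A minimal check, runnable on machines with or without a GPU (the printed messages are just illustrative):

```python
import torch

# Query CUDA support before attempting any GPU transfers.
if torch.cuda.is_available():
    print("CUDA device found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; computations will run on the CPU.")
```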
- If CUDA is available, both the model and each batch of input data can be moved from host (CPU) memory to the GPU with the `model.cuda()` and `data = data.cuda()` methods. Note that for tensors, `.cuda()` returns a copy on the GPU rather than moving the tensor in place, so the result must be reassigned.
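This transfer can be sketched as follows; the tiny `nn.Linear` model and random batch are just stand-ins for a real model and dataset:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in model
data = torch.randn(8, 4)     # stand-in batch of input data

if torch.cuda.is_available():
    # For modules, .cuda() moves the parameters in place
    # (it also returns the module, so reassignment is harmless).
    model.cuda()
    # For tensors, .cuda() returns a GPU copy, so reassign.
    data = data.cuda()
```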
- To bring the model and the data back to host memory and use the CPU, just run the `model.cpu()` and `data = data.cpu()` methods.
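One common reason to move data back is NumPy interoperability, which only works on CPU tensors; a small sketch:

```python
import torch

t = torch.ones(3)
# On a CPU tensor, .cpu() simply returns the same tensor;
# on a GPU tensor, it copies the data back to host memory.
t_cpu = t.cpu()
# .numpy() would raise an error on a GPU tensor:
arr = t_cpu.numpy()
print(arr)  # [1. 1. 1.]
```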
- Instead of using the previously mentioned methods, there's also the option to call `model.to('cuda')` and `data = data.to('cuda')`, or `model.to('cpu')` and `data = data.to('cpu')`.
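A short sketch of the `.to()` variant, again with a stand-in model and batch; the target device string is picked at runtime:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in model
data = torch.randn(8, 4)     # stand-in batch of input data

# 'cpu' always works; 'cuda' requires an available GPU.
target = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(target)
data = data.to(target)       # returns a tensor on the target device

print(data.device)  # cpu (or cuda:0 when a GPU is present)
```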
- In order to make code agnostic to the use of either CPU or GPU, one can use `torch.device()` to write code like the following:
```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```
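Putting it together, a complete runnable version of this pattern; `MyModule` here is a hypothetical one-layer module standing in for any `nn.Module`:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    """Hypothetical stand-in for any model."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)

    def forward(self, x):
        return self.fc(x)

# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data = torch.randn(8, 4)
input = data.to(device)              # no copy if already on `device`
model = MyModule(4, 2).to(device)

output = model(input)                # runs on GPU or CPU transparently
print(output.shape)  # torch.Size([8, 2])
```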