- GPUs are usually much faster than CPUs for deep learning workloads, often by one to two orders of magnitude. This is mainly because GPUs can hold large amounts of data in memory at once and run computations across many parallel threads (https://www.quora.com/Why-are-GPUs-well-suited-to-deep-learning). Recently, a new kind of hardware called the TPU (Tensor Processing Unit) has also been gaining popularity as an even faster alternative to the GPU, since it passes intermediate results directly between compute units instead of repeatedly reading from and writing to memory (https://cloud.google.com/blog/products/ai-machine-learning/what-makes-tpus-fine-tuned-for-deep-learning).
- First off, confirm that the current hardware supports CUDA by calling `torch.cuda.is_available()`.
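A minimal check, runnable on machines with or without a GPU (the printed messages are just illustrative):

```python
import torch

# Query CUDA support before attempting any GPU transfers.
if torch.cuda.is_available():
    print("CUDA device found:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; computations will run on the CPU.")
```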
- If CUDA is available, both the model and each batch of input data can be moved from host (CPU) memory to the GPU with the `model.cuda()` and `data = data.cuda()` methods. Note that for tensors, `.cuda()` returns a copy on the GPU rather than moving the tensor in place, so the result must be reassigned.
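This transfer can be sketched as follows; the tiny `nn.Linear` model and random batch are just stand-ins for a real model and dataset:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in model
data = torch.randn(8, 4)     # stand-in batch of input data

if torch.cuda.is_available():
    # For modules, .cuda() moves the parameters in place
    # (it also returns the module, so reassignment is harmless).
    model.cuda()
    # For tensors, .cuda() returns a GPU copy, so reassign.
    data = data.cuda()
```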
- To bring the model and the data back to host memory and use the CPU, just run the `model.cpu()` and `data = data.cpu()` methods.
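One common reason to move data back is NumPy interoperability, which only works on CPU tensors; a small sketch:

```python
import torch

t = torch.ones(3)
# On a CPU tensor, .cpu() simply returns the same tensor;
# on a GPU tensor, it copies the data back to host memory.
t_cpu = t.cpu()
# .numpy() would raise an error on a GPU tensor:
arr = t_cpu.numpy()
print(arr)  # [1. 1. 1.]
```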
- Instead of using the previously mentioned methods, there's also the option to call `model.to('cuda')` and `data = data.to('cuda')`, or `model.to('cpu')` and `data = data.to('cpu')`.
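A short sketch of the `.to()` variant, again with a stand-in model and batch; the target device string is picked at runtime:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)      # stand-in model
data = torch.randn(8, 4)     # stand-in batch of input data

# 'cpu' always works; 'cuda' requires an available GPU.
target = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(target)
data = data.to(target)       # returns a tensor on the target device

print(data.device)  # cpu (or cuda:0 when a GPU is present)
```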
- In order to make code agnostic to the use of either CPU or GPU, one can use `torch.device()` to write code like the following:
```python
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
```
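Putting it together, a complete runnable version of this pattern; `MyModule` here is a hypothetical one-layer module standing in for any `nn.Module`:

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    """Hypothetical stand-in for any model."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.fc = nn.Linear(n_in, n_out)

    def forward(self, x):
        return self.fc(x)

# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data = torch.randn(8, 4)
input = data.to(device)              # no copy if already on `device`
model = MyModule(4, 2).to(device)

output = model(input)                # runs on GPU or CPU transparently
print(output.shape)  # torch.Size([8, 2])
```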