ray submit call or an ssh tunnel using the cloud provider's API. In this example, I'm assuming that:
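the head node is named ray-default-head-071df709
the project is perseids-scholarship
the zone is europe-west2-a
the Ray dashboard listens on port 8265 on the head node (which is usually where it's at)
the dashboard should end up reachable locally at http://localhost:8000/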
Directly in Ray:
ray submit config.yaml script.py --start --stop --port-forward 8265
In ssh in general:
ssh -L 8000:localhost:8265 gw.example.com
In GCP:
gcloud compute ssh ray-default-head-071df709 \
--project perseids-scholarship \
--zone europe-west2-a \
-- -L 8000:localhost:8265
In Azure:
ssh -L 8000:localhost:8265 ubuntu@vm-public-ip
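The plain ssh variants above share one shape: a local port, the dashboard's host and port as seen from the remote machine, and the machine to log in to. As a minimal sketch, that shape can be assembled in Python. The helper name build_tunnel_command and its default values are my own illustration, not part of Ray or of any cloud SDK.

```python
import shlex


def build_tunnel_command(gateway, local_port=8000, remote_port=8265,
                         remote_host="localhost"):
    """Return the argv list for an ssh local port forward (ssh -L).

    Hypothetical helper: the defaults mirror the example values used in
    this post (local port 8000, dashboard port 8265 on the head node).
    """
    # ssh -L takes LOCAL_PORT:REMOTE_HOST:REMOTE_PORT, where REMOTE_HOST
    # is resolved from the machine we log in to.
    forward = f"{local_port}:{remote_host}:{remote_port}"
    return ["ssh", "-L", forward, gateway]


cmd = build_tunnel_command("gw.example.com")
print(shlex.join(cmd))  # ssh -L 8000:localhost:8265 gw.example.com
```

Passing the returned list to subprocess.run would open the tunnel. Building the list rather than a single string avoids shell quoting problems if a host name or user ever contains unusual characters.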
DistributedDataParallel, we lose direct access to the model's custom attributes. In order to get those attributes' values again, we must use model.module, assuming that model is the original model wrapped in DDP.
# Before Ray
model.custom_att # Works
# After Ray
model.custom_att # Fails
model.module.custom_att # Works
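If the same code has to run both with and without the wrapper, one option is a small helper that falls back to the model itself when there is no module attribute. This is only a sketch: unwrap_model is a hypothetical name, and FakeDDP is a stand-in written for this example so the snippet runs without a process group. It only imitates the one thing that matters here, namely that the wrapper keeps the original model under .module.

```python
def unwrap_model(model):
    """Return the wrapped model if there is one, else the model itself."""
    # DDP stores the original model under `.module`; a plain model has no
    # such attribute, so we fall back to the object we were given.
    return getattr(model, "module", model)


class Net:
    """Stand-in for a model with a custom attribute (illustration only)."""
    custom_att = 42


class FakeDDP:
    """Stand-in for the DDP wrapper: it only holds the original model."""

    def __init__(self, module):
        self.module = module


plain = Net()
wrapped = FakeDDP(plain)

print(unwrap_model(plain).custom_att)    # 42
print(unwrap_model(wrapped).custom_att)  # 42
print(hasattr(wrapped, "custom_att"))    # False
```

One caveat with this approach: if the original model defines its own attribute called module, the helper will return that attribute instead of the model, so an explicit isinstance check against the wrapper class may be the safer choice in that case.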
RuntimeError: CUDA out of memory. Tried to allocate 5.54 GiB (GPU 0; 15.90 GiB total capacity; 11.08 GiB already allocated; 4.10 GiB free; 11.09 GiB reserved in total by PyTorch)