conda-pytorch
This examples will show you how to setup and prepare an environment for PyTorch jobs using conda on Trixie:
1. Create a pytorch miniconda environment¶
Either run from the command line or create pytorchconda-environment.sh and run it:
#!/bin/bash
# load the miniconda module
module load conda/3-24.9.0
# create a conda environment with python 3.7 named pytorch
conda create --name pytorch python=3.7
source activate pytorch
# install pytorch dependencies via conda
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
2. Create a test pytorch python script: testtorch.py¶
import torch
print('GPU available:', torch.cuda.is_available())
3. Create a job submission script: testpytorch.sh¶
#!/bin/bash
# Specify the partition of the cluster to run on (Typically TrixieMain)
#SBATCH --partition=TrixieMain
# Add your project account code using -A or --account
#SBATCH --account ai4d
# Specify the time allocated to the job. Max 12 hours on TrixieMain queue.
#SBATCH --time=12:00:00
# Request GPUs for the job. In this case 4 GPUs
#SBATCH --gres=gpu:4
# Print out the hostname that the jobs is running on
hostname
# Run nvidia-smi to ensure that the job sees the GPUs
/usr/bin/nvidia-smi
# Load the miniconda module on the compute node
module load conda/3-24.9.0
# Activate the conda pytorch environment created in step 1
source activate pytorch
# Launch our test pytorch python file
python testtorch.py
4. Submit job for execution¶
sbatch testpytorch.sh
Output will be 'Submitted batch job XXXXX'
5. Confirm execution results¶
Local directory will contain a file 'slurm-XXXXX.out' which is the output of the job (stdout).
Output should be:
cnXXX - <nodename>
<Date>
+--------
| NVIDIA-SMI XXXX...
....
(4 listed V100 GPUs number 0 to 3)
GPU available: True