Trixie

Be a Good Cluster-Citizen

Samuel Larkin

Trixie

  • trixie.res.nrc.gc.ca from the black & orange networks
  • /gpfs/work/onboarding.sh
  • /home/${USER} has a quota of 50GB and has snapshots disabled.
  • /gpfs/work/${USER} is your primary work-space and has a quota of 500GB and 1M inodes with snapshots enabled
  • /gpfs/projects is a common work-space
  • We request that users not create conda, mamba, or venv environments within /gpfs/work; instead, create them in your $HOME and symlink them into your projects (see the sketch below). The GPFS snapshot feature does not perform well when many small files are created and destroyed, as these tools tend to do. We understand that some workflows on Trixie inherently behave this way, but an effort to reduce it is appreciated.
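A minimal sketch of the requested pattern (the project and environment names are hypothetical):

# Create the virtual environment under $HOME.
python3 -m venv ~/venvs/my_project

# Symlink it into the project under /gpfs/work so the workflow can keep
# using ./venv while the many small files stay out of the GPFS snapshots.
ln -s ~/venvs/my_project /gpfs/work/${USER}/my_project/venv

# Activate it as usual, through the symlink or directly.
source /gpfs/work/${USER}/my_project/venv/bin/activate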

Getting some Software

module avail

/usr/share/Modules/modulefiles

  • dot
  • module-git
  • module-info
  • modules
  • null
  • use.own

/usr/share/modulefiles

  • mpi/openmpi-x86_64

/gpfs/share/Modules/modulefiles

  • mathematica/14.0

/gpfs/share/rhel9/opt/spack/share/spack/modules/linux-rhel9-skylake_avx512

  • anaconda3/2023.09-0-gcc-11.3.1-4njg4u3
  • autoconf-archive/2023.02.20-gcc-11.3.1-x3v3e3k
  • autoconf/2.72-gcc-11.3.1-zigeprh
  • automake/1.16.5-gcc-11.3.1-fiei3qm
  • bdftopcf/1.1-gcc-11.3.1-pjei4lg
  • berkeley-db/18.1.40-gcc-11.3.1-imacxi5
  • binutils/2.42-gcc-11.3.1-k2azs5r
  • bison/3.8.2-gcc-11.3.1-5ymaspt
  • bzip2/1.0.8-gcc-11.3.1-jftbk52
  • ca-certificates-mozilla/2023-05-30-gcc-11.3.1-3q7ngqu
  • cmake/3.27.9-gcc-11.3.1-ddiy5al
  • cpio/2.15-gcc-11.3.1-gysadjq
  • curl/8.7.1-gcc-11.3.1-ct7pjm2
  • diffutils/3.10-gcc-11.3.1-i2xxusk
  • expat/2.6.2-gcc-11.3.1-i6fawb2
  • findutils/4.9.0-gcc-11.3.1-vrwwaoy
  • fixesproto/5.0-gcc-11.3.1-rzgygqz
  • flex/2.6.3-gcc-11.3.1-lvm5eg4
  • font-util/1.4.0-gcc-11.3.1-kdzp4kf
  • fontconfig/2.15.0-gcc-11.3.1-oknygy4
  • fontsproto/2.1.3-gcc-11.3.1-hnnw5wr
  • freeglut/3.2.2-gcc-11.3.1-bj2wnkh
  • freetype/2.13.2-gcc-11.3.1-mukuuf4
  • gawk/5.3.0-gcc-11.3.1-uuiqxd2
  • gcc-runtime/11.3.1-gcc-11.3.1-ts54e2r
  • gcc/13.2.0-gcc-11.3.1-5cmgvey
  • gdbm/1.23-gcc-11.3.1-hf42icl
  • gettext/0.22.5-gcc-11.3.1-acdclxk
  • glibc/2.34-gcc-11.3.1-rv4ofgg
  • glproto/1.4.17-gcc-11.3.1-xwh2ngh
  • glx/1.4-gcc-11.3.1-exlgy5j
  • gmake/4.4.1-gcc-11.3.1-jpbz7dw
  • gmp/6.2.1-gcc-11.3.1-j2qp7w7
  • gperf/3.1-gcc-11.3.1-umkvi7d
  • htop/3.2.2-gcc-11.3.1-oa7ttvf
  • hwloc/2.9.1-gcc-11.3.1-weybm2e
  • inputproto/2.3.2-gcc-11.3.1-bo7qinx
  • intel-mpi/2019.7.217-gcc-11.3.1-o2z3fjd
  • intel-oneapi-compilers-classic/2021.10.0-gcc-11.3.1-nocpr5m
  • intel-oneapi-compilers/2023.2.4-gcc-11.3.1-ddcm4wi
  • intel-oneapi-mpi/2021.12.1-gcc-11.3.1-4kgjxio
  • kbproto/1.0.7-gcc-11.3.1-6uzvk2c
  • libbsd/0.12.1-gcc-11.3.1-tca36ex
  • libedit/3.1-20230828-gcc-11.3.1-ysouu3l
  • libffi/3.4.6-gcc-11.3.1-clmfd4j
  • libfontenc/1.1.8-gcc-11.3.1-w44jsoi
  • libice/1.1.1-gcc-11.3.1-76rcd55
  • libiconv/1.17-gcc-11.3.1-sqkuf4t
  • libmd/1.0.4-gcc-11.3.1-2yooftr
  • libpciaccess/0.17-gcc-11.3.1-niczctf
  • libpng/1.2.57-gcc-11.3.1-oqex7om
  • libpthread-stubs/0.5-gcc-11.3.1-xfbnr2b
  • libsigsegv/2.14-gcc-11.3.1-5kireuc
  • libsm/1.2.4-gcc-11.3.1-sgdfk4e
  • libtool/2.4.7-gcc-11.3.1-gtphuga
  • libunwind/1.6.2-gcc-11.3.1-6mtzyno
  • libx11/1.8.7-gcc-11.3.1-d5xeazl
  • libxau/1.0.11-gcc-11.3.1-dkkx74b
  • libxcb/1.16-gcc-11.3.1-fulboi2
  • libxcrypt/4.4.35-gcc-11.3.1-7kd52bn
  • libxdmcp/1.1.4-gcc-11.3.1-7fe4723
  • libxext/1.3.5-gcc-11.3.1-7d2ci6d
  • libxfixes/5.0.3-gcc-11.3.1-k53yjzd
  • libxfont/1.5.4-gcc-11.3.1-etmwjjy
  • libxft/2.3.8-gcc-11.3.1-vwyuhmz
  • libxi/1.7.10-gcc-11.3.1-rgytoua
  • libxml2/2.10.3-gcc-11.3.1-e5zt4m2
  • libxrandr/1.5.4-gcc-11.3.1-qxmrns4
  • libxrender/0.9.11-gcc-11.3.1-ao2tic2
  • libxscrnsaver/1.2.4-gcc-11.3.1-zs6aqfi
  • libxt/1.3.0-gcc-11.3.1-lwme3qs
  • libxxf86vm/1.1.5-gcc-11.3.1-6pze4ei
  • llvm/17.0.6-gcc-11.3.1-xlwz53w
  • lua/5.3.6-gcc-11.3.1-hn2ac7j
  • lumerical/2019b-r2-gcc-11.3.1-ejm6mo6
  • lumerical/2021-R1.1-2599-gcc-11.3.1-vvyw6z6
  • m4/1.4.19-gcc-11.3.1-tncckyq
  • mesa-glu/9.0.1-gcc-11.3.1-glwxfkj
  • mesa-glu/9.0.2-gcc-11.3.1-bd2lm7d
  • mesa/23.3.6-gcc-11.3.1-mx6glm4
  • meson/1.3.2-gcc-11.3.1-w7cv5bi
  • mkfontdir/1.0.7-gcc-11.3.1-3xvst7e
  • mkfontscale/1.2.3-gcc-11.3.1-4vmnuo2
  • mpc/1.3.1-gcc-11.3.1-ifuk5gu
  • mpfr/4.2.1-gcc-11.3.1-i6gtxh6
  • ncurses/6.5-gcc-11.3.1-z54b6d4
  • nghttp2/1.57.0-gcc-11.3.1-7c6bk73
  • ninja/1.11.1-gcc-11.3.1-wcew5xr
  • openssl/3.3.0-gcc-11.3.1-y2icle5
  • parallel/20220522-gcc-11.3.1-juzq7oy
  • patchelf/0.17.2-gcc-11.3.1-t6bdsvg
  • pcre2/10.43-gcc-11.3.1-3y4lcyt
  • perl-data-dumper/2.173-gcc-11.3.1-xwk47wz
  • perl/5.38.0-gcc-11.3.1-4wtgagw
  • pigz/2.8-gcc-11.3.1-bswu4yx
  • pkgconf/2.2.0-gcc-11.3.1-reshpid
  • py-mako/1.2.4-gcc-11.3.1-qzty4ic
  • py-markupsafe/2.1.3-gcc-11.3.1-xwc652n
  • py-pip/23.1.2-gcc-11.3.1-bmelf66
  • py-setuptools/69.2.0-gcc-11.3.1-vjtezkl
  • py-wheel/0.41.2-gcc-11.3.1-k5uxoxo
  • python-venv/1.0-gcc-11.3.1-jtnjye4
  • python/3.10.13-gcc-11.3.1-73qnjae
  • python/3.11.7-gcc-11.3.1-hsfwjwr
  • python/3.12.1-gcc-11.3.1-7tuhjhr
  • python/3.9.18-gcc-11.3.1-tc7dyca
  • randrproto/1.5.0-gcc-11.3.1-b6ssnsk
  • re2c/2.2-gcc-11.3.1-vxkdcjq
  • readline/8.2-gcc-11.3.1-qu4mv62
  • renderproto/0.11.1-gcc-11.3.1-uhoybj4
  • scrnsaverproto/1.2.2-gcc-11.3.1-cw3khfj
  • sqlite/3.43.2-gcc-11.3.1-bodsqyx
  • swig/4.1.1-gcc-11.3.1-zaf4wit
  • tar/1.34-gcc-11.3.1-wcuczap
  • tcl/8.6.12-gcc-11.3.1-o3atwx3
  • texinfo/7.0.3-gcc-11.3.1-bhmkpv4
  • tk/8.6.11-gcc-11.3.1-wz3kgh7
  • unzip/6.0-gcc-11.3.1-kptkztv
  • util-linux-uuid/2.38.1-gcc-11.3.1-ekl7sit
  • util-macros/1.19.3-gcc-11.3.1-3ocj656
  • xcb-proto/1.16.0-gcc-11.3.1-scf7ljj
  • xextproto/7.3.0-gcc-11.3.1-6mytvtp
  • xf86vidmodeproto/2.3.1-gcc-11.3.1-gc624cq
  • xproto/7.0.31-gcc-11.3.1-hhddgfu
  • xrandr/1.5.2-gcc-11.3.1-nbhkevq
  • xtrans/1.5.0-gcc-11.3.1-67qr5rx
  • xz/5.4.6-gcc-11.3.1-pswkf4y
  • zlib-ng/2.1.6-gcc-11.3.1-snhalql
  • zlib/1.3.1-gcc-11.3.1-fk7flnb
  • zstd/1.5.6-gcc-11.3.1-mldt4gi

Key:

loaded auto-loaded modulepath

module load MODULE
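For example, to load one of the Python modules listed above:

module load python/3.11.7-gcc-11.3.1-hsfwjwr
python3 --version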

Partitions, JobTesting

sinfo

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
TrixieMain* up 12:00:00 4 drain cn[131-134]
TrixieMain* up 12:00:00 2 mix cn[108-109]
TrixieMain* up 12:00:00 22 idle cn[107,110-130]
TrixieLong up 2-00:00:00 1 drain cn131
TrixieLong up 2-00:00:00 2 mix cn[108-109]
TrixieLong up 2-00:00:00 22 idle cn[107,110-130]
JobTesting up 6:00:00 2 idle cn[135-136]

sbatch --partition=JobTesting ...
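To check the state of a single partition before submitting:

sinfo --partition=JobTesting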

Account Name

See 📑 Account-Codes for a list of codes

DT Digital Technologies / Technologies Numériques

  • dt-dac
  • dt-dscs
  • dt-mtp
  • dt-ta

sbatch --account=account_code ...
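Combining the account with a partition on the command line (dt-mtp is just the example account used throughout this deck):

sbatch --partition=TrixieMain --account=dt-mtp my_wonderful.sh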

Node's Resources

sinfo --Node --responding --long
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
cn106 1 DevTest idle 64 2:16:2 192777 0 1 (null) none
cn107 1 TrixieLong idle 64 2:16:2 192777 0 1 (null) none
cn107 1 TrixieMain* idle 64 2:16:2 192777 0 1 (null) none
cn108 1 TrixieLong idle 64 2:16:2 192777 0 1 (null) none
...

What Do Nodes Have to Offer?

scontrol show nodes

NodeName=cn136 Arch=x86_64 CoresPerSocket=16
  CPUAlloc=0 CPUTot=64 CPULoad=0.01
  AvailableFeatures=(null)
  ActiveFeatures=(null)
  Gres=gpu:4
  NodeAddr=cn136 NodeHostName=cn136
  OS=Linux 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022
  RealMemory=192777 AllocMem=0 FreeMem=183181 Sockets=2 Boards=1
  State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
  Partitions=JobTesting
  BootTime=2024-05-29T14:23:15 SlurmdStartTime=2024-05-29T14:23:36
  CfgTRES=cpu=64,mem=192777M,billing=64,gres/gpu=4
  AllocTRES=
  CapWatts=n/a
  CurrentWatts=0 AveWatts=0
  ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
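To inspect a single node, or just the fields you care about, scontrol also accepts a node name and its output can be filtered (cn136 is used as an example):

scontrol show node cn136 | grep -E 'Gres|RealMemory|CfgTRES'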

The GPUs

  • Tesla V100-SXM2-32GB
  • May 10, 2017
  • NVIDIA-SMI 565.57.01
  • Driver Version: 565.57.01
  • CUDA Version: 12.7
  • CUDA Compute Capabilities 7.0
  • Single Precision: 15 TFLOPS
  • Tensor Performance (Deep Learning): 120 TFLOPS

CPUs

  • processor_type = Intel Xeon Gold 6130 CPU @ 2.1 GHz, 16 cores per CPU
  • processors_per_node = 2
  • cores_per_socket = 16
  • threads_per_core = 2 (hyper-threading on)
  • RAM = 192 GB memory
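These figures are consistent with the Slurm output above: 2 sockets × 16 cores/socket × 2 threads/core = 64 logical CPUs per node (CPUS=64, S:C:T=2:16:2), and RealMemory=192777 MB is roughly the 192 GB of RAM.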

Slurm Header Example


#!/bin/bash
# vim:nowrap:

#SBATCH --job-name=My_Wonderful
#SBATCH --comment="My Wonderful Script"

# On Trixie
#SBATCH --partition=TrixieMain
#SBATCH --account=dt-mtp

#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=6
#SBATCH --mem=96G
# To reserve a whole node for yourself
##SBATCH --exclusive
#SBATCH --open-mode=append
#SBATCH --requeue
#SBATCH --signal=B:USR1@30
#SBATCH --output=%x-%j.out

# Requeueing on Trixie
function _requeue {
  echo "requeueing $SLURM_JOBID"
  date
  scontrol requeue $SLURM_JOBID
}

if [[ -n "$SLURM_JOBID" ]]; then
  SACC_FORMAT="JobID,Submit,Start,End,Elapsed,ExitCode"
  SACC_FORMAT+=",State,CPUTime,MaxRSS,MaxVMSize"
  SACC_FORMAT+=",MaxDiskRead,MaxDiskWrite,AllocCPUs"
  SACC_FORMAT+=",AllocGRES,AllocTRES%-50,NodeList"
  SACC_FORMAT+=",JobName%-30,Comment%-80"
  trap "sacct --jobs $SLURM_JOBID --format=$SACC_FORMAT" 0
  trap _requeue USR1
fi
sbatch my_wonderful.sh

Do I Really Need a Script?

Explicitly specifying all options at each invocation:
sbatch \
    --job-name=My_Wonderful \
    --comment="My Wonderful Script" \
    --partition=TrixieMain \
    --account=dt-mtp \
    --gres=gpu:4 \
    --time=12:00:00 \
    --nodes=1 \
    --ntasks-per-node=4 \
    --cpus-per-task=6 \
    --mem=96G \
    --open-mode=append \
    --requeue \
    --signal=B:USR1@30 \
    --output=%x-%j.out \
    my_wonderful.sh args ...

😨

Overriding what is different

sbatch --job-name=OtherName my_wonderful.sh 😏

Set the Expected RAM Limit

#SBATCH --mem=96G

Otherwise the scheduler assumes that you want all of the node's memory, which implies exclusive access to that node and prevents other jobs from using the remainder of its resources.
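To choose a sensible value, check the peak memory of a similar finished job with sacct (JOBID is a placeholder):

sacct --jobs JOBID --format=JobID,JobName,Elapsed,MaxRSS,State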

Make your Job Resumable

#SBATCH --requeue
#SBATCH --signal=B:USR1@30

# Requeueing on Trixie
# source: https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/
# source: https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html
function _requeue {
  echo "BASH - trapping signal 10 - requeueing $SLURM_JOBID"
  date
  scontrol requeue $SLURM_JOBID
}

if [[ -n "$SLURM_JOBID" ]]; then
  # Only if the job was submitted to SLURM.
  trap _requeue USR1
fi
  • Your code needs to be able to resume from a previous checkpoint (see the sketch after this list).
  • --signal=B:USR1@30: ask the scheduler to send a USR1 signal to the batch shell 30 seconds before the time limit.
  • trap _requeue USR1: act on the USR1 signal by calling _requeue().
  • 📑 Automatically Requeueing to Resume
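The trap only fires promptly if the batch shell is interruptible: bash runs a trap handler immediately while blocked in the wait builtin, but waits for a foreground command to finish first. A common pattern is therefore to launch the payload in the background and wait on it (a minimal sketch; train.sh and its --resume flag are hypothetical placeholders):

# Run the real workload in the background so the batch shell can
# receive USR1 and call _requeue() before the time limit is reached.
bash train.sh --resume checkpoints/latest &
wait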

Submit my Job

queue your job

sbatch my_wonderful.sh

check that your job is running

squeue

JOBID NAME USER ST TIME NODES NODELIST(REASON) SUBMIT_TIME COMMENT
733 My_Wonderful larkins R 7:43:44 1 trixie-cn101 2024-07-17T02:26:0 My Wonderful Script
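To list only your own jobs:

squeue --user=$USER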

Check nvidia-smi -l for Good GPU Usage

ssh -t cn101 nvidia-smi -l
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-SXM2-32GB           On  |   00000000:89:00.0 Off |                    0 |
| N/A   54C    P0            139W /  300W |   28888MiB /  32768MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  |   00000000:8A:00.0 Off |                    0 |
| N/A   68C    P0            282W /  300W |   28846MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla V100-SXM2-32GB           On  |   00000000:B2:00.0 Off |                    0 |
| N/A   58C    P0            289W /  300W |   28918MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla V100-SXM2-32GB           On  |   00000000:B3:00.0 Off |                    0 |
| N/A   68C    P0            284W /  300W |   28918MiB /  32768MiB |     98%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3077932      C   ...C-Senate/nmt/tools/venv/bin/python3      28884MiB |
|    1   N/A  N/A   3077933      C   ...C-Senate/nmt/tools/venv/bin/python3      28842MiB |
|    2   N/A  N/A   3077934      C   ...C-Senate/nmt/tools/venv/bin/python3      28914MiB |
|    3   N/A  N/A   3077935      C   ...C-Senate/nmt/tools/venv/bin/python3      28914MiB |
+-----------------------------------------------------------------------------------------+
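If you don't remember which node your job landed on, squeue can print it, so the two steps can be combined (a small convenience sketch; 733 is the example job above):

NODE=$(squeue --jobs 733 --noheader --format=%N)
ssh -t "$NODE" nvidia-smi -l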

Asking For Too Much

sbatch --mem=400G my_wonderful.sh

sbatch: error: Batch job submission failed: Memory required by task is not available
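Before asking, you can check how much memory the nodes actually have (%n prints the hostname and %m the configured memory in MB):

sinfo --Node --format="%n %m" | sort -u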

Jupyter Notebook

Please refer to 📑 Jobs Conda JupyterLab, as it is a bit more involved.

WARNING: Don't leave your worker node running if you are not using it.

Conclusion

  • We want to maximize everyone's enjoyment of the cluster
  • We want to maximize the cluster's usage
  • This good-citizen principle applies not only to Trixie but to all clusters

Links