Trixie

Be a Good Cluster-Citizen

Samuel Larkin

Trixie

  • trixie.res.nrc.gc.ca from the black & orange networks
  • /gpfs/work/onboarding.sh
  • /home/${USER} has a quota of 50GB and has snapshots disabled.
  • /gpfs/work/${USER} is your primary work-space and has a quota of 500GB and 1M inodes with snapshots enabled
  • /gpfs/projects is a common work-space
  • We request that users not create conda, mamba, or venv environments within /gpfs/work; instead, create them in your $HOME and symlink them into your projects (see the sketch below). The GPFS snapshot feature does not perform well when many small files are created and destroyed, as these tools tend to do. We understand that some workflows on Trixie inherently behave this way, but an effort to reduce it is appreciated.
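A minimal sketch of the requested pattern (the project and environment names are hypothetical):

# Create the virtual environment under $HOME.
python3 -m venv ~/venvs/my_project

# Symlink it into the project under /gpfs/work so the workflow can keep
# using ./venv while the many small files stay out of the GPFS snapshots.
ln -s ~/venvs/my_project /gpfs/work/${USER}/my_project/venv

# Activate it as usual, through the symlink or directly.
source /gpfs/work/${USER}/my_project/venv/bin/activate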

Getting some Software

module avail

/usr/share/Modules/modulefiles

  • dot
  • module-git
  • module-info
  • modules
  • null
  • use.own

/usr/share/modulefiles

  • mpi/openmpi-x86_64

/gpfs/share/Modules/modulefiles

  • mathematica/14.0

/gpfs/share/rhel9/opt/spack/share/spack/modules/linux-rhel9-skylake_avx512

  • anaconda3/2023.09-0-gcc-11.3.1-4njg4u3
  • autoconf-archive/2023.02.20-gcc-11.3.1-x3v3e3k
  • autoconf/2.72-gcc-11.3.1-zigeprh
  • automake/1.16.5-gcc-11.3.1-fiei3qm
  • bdftopcf/1.1-gcc-11.3.1-pjei4lg
  • berkeley-db/18.1.40-gcc-11.3.1-imacxi5
  • binutils/2.42-gcc-11.3.1-k2azs5r
  • bison/3.8.2-gcc-11.3.1-5ymaspt
  • bzip2/1.0.8-gcc-11.3.1-jftbk52
  • ca-certificates-mozilla/2023-05-30-gcc-11.3.1-3q7ngqu
  • cmake/3.27.9-gcc-11.3.1-ddiy5al
  • cpio/2.15-gcc-11.3.1-gysadjq
  • curl/8.7.1-gcc-11.3.1-ct7pjm2
  • diffutils/3.10-gcc-11.3.1-i2xxusk
  • expat/2.6.2-gcc-11.3.1-i6fawb2
  • findutils/4.9.0-gcc-11.3.1-vrwwaoy
  • fixesproto/5.0-gcc-11.3.1-rzgygqz
  • flex/2.6.3-gcc-11.3.1-lvm5eg4
  • font-util/1.4.0-gcc-11.3.1-kdzp4kf
  • fontconfig/2.15.0-gcc-11.3.1-oknygy4
  • fontsproto/2.1.3-gcc-11.3.1-hnnw5wr
  • freeglut/3.2.2-gcc-11.3.1-bj2wnkh
  • freetype/2.13.2-gcc-11.3.1-mukuuf4
  • gawk/5.3.0-gcc-11.3.1-uuiqxd2
  • gcc-runtime/11.3.1-gcc-11.3.1-ts54e2r
  • gcc/13.2.0-gcc-11.3.1-5cmgvey
  • gdbm/1.23-gcc-11.3.1-hf42icl
  • gettext/0.22.5-gcc-11.3.1-acdclxk
  • glibc/2.34-gcc-11.3.1-rv4ofgg
  • glproto/1.4.17-gcc-11.3.1-xwh2ngh
  • glx/1.4-gcc-11.3.1-exlgy5j
  • gmake/4.4.1-gcc-11.3.1-jpbz7dw
  • gmp/6.2.1-gcc-11.3.1-j2qp7w7
  • gperf/3.1-gcc-11.3.1-umkvi7d
  • htop/3.2.2-gcc-11.3.1-oa7ttvf
  • hwloc/2.9.1-gcc-11.3.1-weybm2e
  • inputproto/2.3.2-gcc-11.3.1-bo7qinx
  • intel-mpi/2019.7.217-gcc-11.3.1-o2z3fjd
  • intel-oneapi-compilers-classic/2021.10.0-gcc-11.3.1-nocpr5m
  • intel-oneapi-compilers/2023.2.4-gcc-11.3.1-ddcm4wi
  • intel-oneapi-mpi/2021.12.1-gcc-11.3.1-4kgjxio
  • kbproto/1.0.7-gcc-11.3.1-6uzvk2c
  • libbsd/0.12.1-gcc-11.3.1-tca36ex
  • libedit/3.1-20230828-gcc-11.3.1-ysouu3l
  • libffi/3.4.6-gcc-11.3.1-clmfd4j
  • libfontenc/1.1.8-gcc-11.3.1-w44jsoi
  • libice/1.1.1-gcc-11.3.1-76rcd55
  • libiconv/1.17-gcc-11.3.1-sqkuf4t
  • libmd/1.0.4-gcc-11.3.1-2yooftr
  • libpciaccess/0.17-gcc-11.3.1-niczctf
  • libpng/1.2.57-gcc-11.3.1-oqex7om
  • libpthread-stubs/0.5-gcc-11.3.1-xfbnr2b
  • libsigsegv/2.14-gcc-11.3.1-5kireuc
  • libsm/1.2.4-gcc-11.3.1-sgdfk4e
  • libtool/2.4.7-gcc-11.3.1-gtphuga
  • libunwind/1.6.2-gcc-11.3.1-6mtzyno
  • libx11/1.8.7-gcc-11.3.1-d5xeazl
  • libxau/1.0.11-gcc-11.3.1-dkkx74b
  • libxcb/1.16-gcc-11.3.1-fulboi2
  • libxcrypt/4.4.35-gcc-11.3.1-7kd52bn
  • libxdmcp/1.1.4-gcc-11.3.1-7fe4723
  • libxext/1.3.5-gcc-11.3.1-7d2ci6d
  • libxfixes/5.0.3-gcc-11.3.1-k53yjzd
  • libxfont/1.5.4-gcc-11.3.1-etmwjjy
  • libxft/2.3.8-gcc-11.3.1-vwyuhmz
  • libxi/1.7.10-gcc-11.3.1-rgytoua
  • libxml2/2.10.3-gcc-11.3.1-e5zt4m2
  • libxrandr/1.5.4-gcc-11.3.1-qxmrns4
  • libxrender/0.9.11-gcc-11.3.1-ao2tic2
  • libxscrnsaver/1.2.4-gcc-11.3.1-zs6aqfi
  • libxt/1.3.0-gcc-11.3.1-lwme3qs
  • libxxf86vm/1.1.5-gcc-11.3.1-6pze4ei
  • llvm/17.0.6-gcc-11.3.1-xlwz53w
  • lua/5.3.6-gcc-11.3.1-hn2ac7j
  • lumerical/2019b-r2-gcc-11.3.1-ejm6mo6
  • lumerical/2021-R1.1-2599-gcc-11.3.1-vvyw6z6
  • m4/1.4.19-gcc-11.3.1-tncckyq
  • mesa-glu/9.0.1-gcc-11.3.1-glwxfkj
  • mesa-glu/9.0.2-gcc-11.3.1-bd2lm7d
  • mesa/23.3.6-gcc-11.3.1-mx6glm4
  • meson/1.3.2-gcc-11.3.1-w7cv5bi
  • mkfontdir/1.0.7-gcc-11.3.1-3xvst7e
  • mkfontscale/1.2.3-gcc-11.3.1-4vmnuo2
  • mpc/1.3.1-gcc-11.3.1-ifuk5gu
  • mpfr/4.2.1-gcc-11.3.1-i6gtxh6
  • ncurses/6.5-gcc-11.3.1-z54b6d4
  • nghttp2/1.57.0-gcc-11.3.1-7c6bk73
  • ninja/1.11.1-gcc-11.3.1-wcew5xr
  • openssl/3.3.0-gcc-11.3.1-y2icle5
  • parallel/20220522-gcc-11.3.1-juzq7oy
  • patchelf/0.17.2-gcc-11.3.1-t6bdsvg
  • pcre2/10.43-gcc-11.3.1-3y4lcyt
  • perl-data-dumper/2.173-gcc-11.3.1-xwk47wz
  • perl/5.38.0-gcc-11.3.1-4wtgagw
  • pigz/2.8-gcc-11.3.1-bswu4yx
  • pkgconf/2.2.0-gcc-11.3.1-reshpid
  • py-mako/1.2.4-gcc-11.3.1-qzty4ic
  • py-markupsafe/2.1.3-gcc-11.3.1-xwc652n
  • py-pip/23.1.2-gcc-11.3.1-bmelf66
  • py-setuptools/69.2.0-gcc-11.3.1-vjtezkl
  • py-wheel/0.41.2-gcc-11.3.1-k5uxoxo
  • python-venv/1.0-gcc-11.3.1-jtnjye4
  • python/3.10.13-gcc-11.3.1-73qnjae
  • python/3.11.7-gcc-11.3.1-hsfwjwr
  • python/3.12.1-gcc-11.3.1-7tuhjhr
  • python/3.9.18-gcc-11.3.1-tc7dyca
  • randrproto/1.5.0-gcc-11.3.1-b6ssnsk
  • re2c/2.2-gcc-11.3.1-vxkdcjq
  • readline/8.2-gcc-11.3.1-qu4mv62
  • renderproto/0.11.1-gcc-11.3.1-uhoybj4
  • scrnsaverproto/1.2.2-gcc-11.3.1-cw3khfj
  • sqlite/3.43.2-gcc-11.3.1-bodsqyx
  • swig/4.1.1-gcc-11.3.1-zaf4wit
  • tar/1.34-gcc-11.3.1-wcuczap
  • tcl/8.6.12-gcc-11.3.1-o3atwx3
  • texinfo/7.0.3-gcc-11.3.1-bhmkpv4
  • tk/8.6.11-gcc-11.3.1-wz3kgh7
  • unzip/6.0-gcc-11.3.1-kptkztv
  • util-linux-uuid/2.38.1-gcc-11.3.1-ekl7sit
  • util-macros/1.19.3-gcc-11.3.1-3ocj656
  • xcb-proto/1.16.0-gcc-11.3.1-scf7ljj
  • xextproto/7.3.0-gcc-11.3.1-6mytvtp
  • xf86vidmodeproto/2.3.1-gcc-11.3.1-gc624cq
  • xproto/7.0.31-gcc-11.3.1-hhddgfu
  • xrandr/1.5.2-gcc-11.3.1-nbhkevq
  • xtrans/1.5.0-gcc-11.3.1-67qr5rx
  • xz/5.4.6-gcc-11.3.1-pswkf4y
  • zlib-ng/2.1.6-gcc-11.3.1-snhalql
  • zlib/1.3.1-gcc-11.3.1-fk7flnb
  • zstd/1.5.6-gcc-11.3.1-mldt4gi

Key:

loaded auto-loaded modulepath

module load MODULE
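For example, to load one of the Python modules listed above:

module load python/3.11.7-gcc-11.3.1-hsfwjwr
python3 --version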

Partitions, JobTesting

sinfo

PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
TrixieMain* up 12:00:00 4 drain cn[131-134]
TrixieMain* up 12:00:00 2 mix cn[108-109]
TrixieMain* up 12:00:00 22 idle cn[107,110-130]
TrixieLong up 2-00:00:00 1 drain cn131
TrixieLong up 2-00:00:00 2 mix cn[108-109]
TrixieLong up 2-00:00:00 22 idle cn[107,110-130]
JobTesting up 6:00:00 2 idle cn[135-136]

sbatch --partition=JobTesting ...
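To check the state of a single partition before submitting:

sinfo --partition=JobTesting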

Account Name

See 📑 Account-Codes for a list of codes

DT Digital Technologies / Technologies Numériques

  • dt-dac
  • dt-dscs
  • dt-mtp
  • dt-ta

sbatch --account=account_code ...
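Combining the account with a partition on the command line (dt-mtp is just the example account used throughout this deck):

sbatch --partition=TrixieMain --account=dt-mtp my_wonderful.sh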

Node's Resources

sinfo --Node --responding --long
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
cn106 1 DevTest idle 64 2:16:2 192777 0 1 (null) none
cn107 1 TrixieLong idle 64 2:16:2 192777 0 1 (null) none
cn107 1 TrixieMain* idle 64 2:16:2 192777 0 1 (null) none
cn108 1 TrixieLong idle 64 2:16:2 192777 0 1 (null) none
...

What Do Nodes Have to Offer?

scontrol show nodes

NodeName=cn136 Arch=x86_64 CoresPerSocket=16
  CPUAlloc=0 CPUTot=64 CPULoad=0.01
  AvailableFeatures=(null)
  ActiveFeatures=(null)
  Gres=gpu:4
  NodeAddr=cn136 NodeHostName=cn136
  OS=Linux 3.10.0-1160.62.1.el7.x86_64 #1 SMP Tue Apr 5 16:57:59 UTC 2022
  RealMemory=192777 AllocMem=0 FreeMem=183181 Sockets=2 Boards=1
  State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
  Partitions=JobTesting
  BootTime=2024-05-29T14:23:15 SlurmdStartTime=2024-05-29T14:23:36
  CfgTRES=cpu=64,mem=192777M,billing=64,gres/gpu=4
  AllocTRES=
  CapWatts=n/a
  CurrentWatts=0 AveWatts=0
  ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
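To inspect a single node, or just the fields you care about, scontrol also accepts a node name and its output can be filtered (cn136 is used as an example):

scontrol show node cn136 | grep -E 'Gres|RealMemory|CfgTRES'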

The GPUs

  • Tesla V100-SXM2-32GB
  • May 10, 2017
  • NVIDIA-SMI 565.57.01
  • Driver Version: 565.57.01
  • CUDA Version: 12.7
  • CUDA Compute Capabilities 7.0
  • Single Precision: 15 TFLOPS
  • Tensor Performance (Deep Learning): 120 TFLOPS

CPUs

  • processor_type = Intel Xeon Gold 6130 CPU @ 2.1 GHz, 16 cores per CPU
  • processors_per_node = 2
  • cores_per_socket = 16
  • threads_per_core = 2 (hyper-threading on)
  • RAM = 192 GB memory
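These figures are consistent with the Slurm output above: 2 sockets × 16 cores/socket × 2 threads/core = 64 logical CPUs per node (CPUS=64, S:C:T=2:16:2), and RealMemory=192777 MB is roughly the 192 GB of RAM.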

Slurm Header Example


#!/bin/bash
# vim:nowrap:

#SBATCH --job-name=My_Wonderful
#SBATCH --comment="My Wonderful Script"

# On Trixie
#SBATCH --partition=TrixieMain
#SBATCH --account=dt-mtp

#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=6
#SBATCH --mem=96G
# To reserve a whole node for yourself
##SBATCH --exclusive
#SBATCH --open-mode=append
#SBATCH --requeue
#SBATCH --signal=B:USR1@30
#SBATCH --output=%x-%j.out

# Requeueing on Trixie
function _requeue {
  echo "requeueing $SLURM_JOBID"
  date
  scontrol requeue $SLURM_JOBID
}

if [[ -n "$SLURM_JOBID" ]]; then
  SACC_FORMAT="JobID,Submit,Start,End,Elapsed,ExitCode"
  SACC_FORMAT+=",State,CPUTime,MaxRSS,MaxVMSize"
  SACC_FORMAT+=",MaxDiskRead,MaxDiskWrite,AllocCPUs"
  SACC_FORMAT+=",AllocGRES,AllocTRES%-50,NodeList"
  SACC_FORMAT+=",JobName%-30,Comment%-80"
  trap "sacct --jobs $SLURM_JOBID --format=$SACC_FORMAT" 0
  trap _requeue USR1
fi
sbatch my_wonderful.sh

Do I Really Need a Script?

Explicitly specifying all options at each invocation:
sbatch \
    --job-name=My_Wonderful \
    --comment="My Wonderful Script" \
    --partition=TrixieMain \
    --account=dt-mtp \
    --gres=gpu:4 \
    --time=12:00:00 \
    --nodes=1 \
    --ntasks-per-node=4 \
    --cpus-per-task=6 \
    --mem=96G \
    --open-mode=append \
    --requeue \
    --signal=B:USR1@30 \
    --output=%x-%j.out \
    my_wonderful.sh args ...

😨

Overriding what is different

sbatch --job-name=OtherName my_wonderful.sh 😏

Set the Expected RAM Limit

#SBATCH --mem=96G

Otherwise the scheduler assumes that you want all of the node's memory, which implies exclusive access to that node and prevents other jobs from using the remainder of its resources.
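To choose a sensible value, check the peak memory of a similar finished job with sacct (JOBID is a placeholder):

sacct --jobs JOBID --format=JobID,JobName,Elapsed,MaxRSS,State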

Make your Job Resumable

#SBATCH --requeue
#SBATCH --signal=B:USR1@30

# Requeueing on Trixie
# source: https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/
# source: https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html
function _requeue {
  echo "BASH - trapping signal 10 - requeueing $SLURM_JOBID"
  date
  scontrol requeue $SLURM_JOBID
}

if [[ -n "$SLURM_JOBID" ]]; then
  # Only if the job was submitted to SLURM.
  trap _requeue USR1
fi
  • Your code needs to be able to resume from a previous checkpoint (see the sketch after this list).
  • --signal=B:USR1@30: ask the scheduler to send a USR1 signal to the batch shell 30 seconds before the time limit.
  • trap _requeue USR1: act on the USR1 signal by calling _requeue().
  • 📑 Automatically Requeueing to Resume
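The trap only fires promptly if the batch shell is interruptible: bash runs a trap handler immediately while blocked in the wait builtin, but waits for a foreground command to finish first. A common pattern is therefore to launch the payload in the background and wait on it (a minimal sketch; train.sh and its --resume flag are hypothetical placeholders):

# Run the real workload in the background so the batch shell can
# receive USR1 and call _requeue() before the time limit is reached.
bash train.sh --resume checkpoints/latest &
wait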

Submit my Job

queue your job

sbatch my_wonderful.sh

check that your job is running

squeue

JOBID NAME USER ST TIME NODES NODELIST(REASON) SUBMIT_TIME COMMENT
733 My_Wonderful larkins R 7:43:44 1 trixie-cn101 2024-07-17T02:26:0 My Wonderful Script
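To list only your own jobs:

squeue --user=$USER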

Check nvidia-smi -l for Good GPU Usage

ssh -t cn101 nvidia-smi -l
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-SXM2-32GB           On  |   00000000:89:00.0 Off |                    0 |
| N/A   54C    P0            139W /  300W |   28888MiB /  32768MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  |   00000000:8A:00.0 Off |                    0 |
| N/A   68C    P0            282W /  300W |   28846MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  Tesla V100-SXM2-32GB           On  |   00000000:B2:00.0 Off |                    0 |
| N/A   58C    P0            289W /  300W |   28918MiB /  32768MiB |     99%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  Tesla V100-SXM2-32GB           On  |   00000000:B3:00.0 Off |                    0 |
| N/A   68C    P0            284W /  300W |   28918MiB /  32768MiB |     98%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   3077932      C   ...C-Senate/nmt/tools/venv/bin/python3      28884MiB |
|    1   N/A  N/A   3077933      C   ...C-Senate/nmt/tools/venv/bin/python3      28842MiB |
|    2   N/A  N/A   3077934      C   ...C-Senate/nmt/tools/venv/bin/python3      28914MiB |
|    3   N/A  N/A   3077935      C   ...C-Senate/nmt/tools/venv/bin/python3      28914MiB |
+-----------------------------------------------------------------------------------------+
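If you don't remember which node your job landed on, squeue can print it, so the two steps can be combined (a small convenience sketch; 733 is the example job above):

NODE=$(squeue --jobs 733 --noheader --format=%N)
ssh -t "$NODE" nvidia-smi -l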

Asking For Too Much

sbatch --mem=400G my_wonderful.sh

sbatch: error: Batch job submission failed: Memory required by task is not available
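Before asking, you can check how much memory the nodes actually have (%n prints the hostname and %m the configured memory in MB):

sinfo --Node --format="%n %m" | sort -u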

Jupyter Notebook

Please refer to 📑 Jobs Conda JupyterLab, as it is a bit more involved.

WARNING: Don't leave your worker node running if you are not using it.

Conclusion

  • We want to maximize everyone's enjoyment of the cluster
  • We want to maximize the cluster's usage
  • This good-citizen principle applies not only to Trixie but to all clusters

Links