Please watch the Introduction to Slurm Tutorials - https://slurm.schedmd.com/tutorials.html
The system scheduler is currently configured to run one job per node.
Running Your Jobs
(Quick start) Starting an interactive session using srun
To get started quickly, from a login node:
[username@l001 ~]$ srun --pty bash -i
[username@c001 ~]$
You will notice that the host has changed from l001 to c001, indicating that you are now on a compute node and ready to do HPC work. Unless you specify a longer run time, your job will expire after one hour. Keep in mind that interactive time is charged against your account, and that if INCLINE is running a lot of jobs, you may have to wait a long time before your interactive job becomes available.
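If the defaults are not enough, you can request resources and time explicitly on the srun command line. A minimal sketch, assuming you want one node with 16 tasks for four hours on the compute partition (these values are placeholders; adjust them to your actual needs):
[username@l001 ~]$ srun -p compute -N 1 -n 16 -t 4:00:00 --pty bash -i
[username@c001 ~]$
Here -p selects the partition, -N the number of nodes, -n the number of tasks, and -t the wall time.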
For more options and examples on how to use srun to run an interactive job, see https://slurm.schedmd.com/srun.html
Starting an independent interactive session using salloc
This command allocates a node, or collection of nodes, for your use. Basic usage:
[username@l001 ~]$ salloc
salloc: Granted job allocation 888
[username@l001 ~]$
Notice that you are still on the login node, even though the job is now running. Use squeue to determine what node your job is running on:
[username@l001 ~]$ squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
    888   compute interact username  R   1:20     1 c022
Under NODELIST it appears that your job is running on compute node 22. You should now have permission to ssh into this node directly:
[username@l001 ~]$ ssh c022
You should receive the usual INCLINE welcome message, and then the prompt
[username@c022 ~]$
indicating you are now on c022. You can now run jobs as you normally would. So why use salloc? The ssh connection allows you to perform port forwarding, which is useful if you want to use Jupyter notebooks.
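As a sketch of how that port forwarding might look, assuming Jupyter is available in your environment and its server uses port 8888 (the node name c022 and the port are placeholders):
[username@l001 ~]$ ssh -L 8888:localhost:8888 c022
[username@c022 ~]$ jupyter notebook --no-browser --port=8888
To reach the notebook from your local browser, you would also forward the same port from your workstation to the login node in a similar way.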
(Recommended) Starting a batch job using sbatch
Interactive jobs are good for testing and development, but production jobs should be submitted with the sbatch command, which takes a run script. This puts your job in the scheduler queue, and it will start automatically as soon as resources are available.
Suppose you have a code stored in /home/username/my-big-code that you need to run on four nodes with 512 processors. You expect it to take about 8 hours to complete. Begin by creating the following script:
#!/bin/bash
# The following are bash comments, but will be interpreted by SLURM as parameters.
#SBATCH -J MyBigJob                                 # job name
#SBATCH -o /mmfs1/home/username/output_%j_stdout    # print the job output to this file, where %j will be the job ID
#SBATCH -N 4                                        # run on 4 nodes
#SBATCH -n 512                                      # run with 512 MPI tasks
#SBATCH -t 8:00:00                                  # run for 8 hours

# Make sure to load/unload any modules that are needed during runtime.
# For instance, if you need mvapich instead of openmpi:
module swap openmpi4 mvapich2/2.3.4

# Now perform the actual run. Recommend using mpiexec with no -np specification -
# it will automatically use all of the available processors.
mpiexec /home/username/my-big-code /home/username/inputfile --output-file=/mmfs1/home/username/output_${SLURM_JOB_ID}
Note that the mpiexec command above is illustrative - invoke your code however you normally would execute it.
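Before submitting, it is worth confirming that any modules your script loads or swaps actually exist on INCLINE. For example (the module name here simply mirrors the script above and may differ on your system):
[username@l001 ~]$ module avail mvapich2
[username@l001 ~]$ module list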
Once your script is ready, you can submit your job to slurm:
[username@l001 ~]$ sbatch my_run_script.sh
Submitted batch job 881
This submits your job and assigns it an ID; in this case, job number 881. You can monitor the status of the job:
[username@l001 ~]$ squeue
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
    881   compute MyBigJob username  R   1:20     4 c[004-007]
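When the cluster is busy, squeue can return a long list. You can restrict it to your own jobs or inspect a single job in more detail; a short sketch, substituting your username and job ID:
[username@l001 ~]$ squeue -u username
[username@l001 ~]$ scontrol show job 881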
You can also check the output of your job by looking at the output file you specified in your script. To follow the output in real time as your code runs, use tail -f on that file:
[username@l001 ~]$ tail -f /mmfs1/home/username/output_881_stdout
INCLINE Partitions
Status | Partition Name | Access | Resources | Max # nodes | Max time | Current Priority Job Factor (higher number = higher priority) | Description |
---|---|---|---|---|---|---|---|
Enabled | compute | All | Compute nodes | 26 | 24h | 2 | This is the standard workhorse category of partitions for most HPC codes. Jobs submitted to these queues are reasonably high priority, but have a 24 hour time limit. |
Enabled | gpu | All | GPU nodes | 2 | 24h | 2 | |
Enabled | bigmem | All | High memory nodes | 2 | 24h | 2 | |
Disabled | compute-quick | All | Compute nodes | 2 | 1h | 3 | These partitions are for testing or debugging. Submitting to these queues gets your code running quickly. |
Disabled | gpu-quick | All | GPU nodes | 1 | 1h | 3 | |
Disabled | bigmem-quick | All | High memory nodes | 1 | 1h | 3 | |
Disabled | compute-long | All | Compute nodes | 13 | 720h | 1 | Use these partitions for long-time jobs that are expected to take multiple days or even weeks. These partitions are low priority but have a long runtime. |
Disabled | gpu-long | All | GPU nodes | 1 | 720h | 1 | |
Disabled | bigmem-long | All | High memory nodes | 1 | 720h | 1 | |
Disabled | compute-unlimited | Privileged | Compute nodes only | 26 | Unlimited | 100 | These partitions are high-priority, unlimited queues accessible to privileged users only. Use of these queues is available by special request only. The unlimited queues are used for ultra-large-scale production runs, benchmarking tests, etc. |
Disabled | gpu-unlimited | Privileged | GPU nodes only | 2 | Unlimited | 100 | |
Disabled | bigmem-unlimited | Privileged | High memory nodes only | 2 | Unlimited | 100 | |
Disabled | compute-USER | USER | Compute nodes | N | Unlimited | 100 | These partitions are for users who are the owners of individual nodes on INCLINE. For instance, if bobsmith is a PI who has paid to purchase a compute node, then compute-bobsmith is a special high-priority queue accessible to him and his designated users only. |
Disabled | gpu-USER | USER | GPU nodes | N | Unlimited | 100 | |
Disabled | bigmem-USER | USER | High memory nodes | N | Unlimited | 100 | |
For a detailed discussion of the SLURM prioritization and fairshare algorithm, see this presentation.
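To target one of these partitions explicitly, pass the partition name to sbatch or srun, either in your batch script or on the command line. A brief sketch using partition names from the table above (only partitions marked Enabled will currently accept jobs):
#SBATCH -p gpu                                        # inside a batch script
[username@l001 ~]$ sbatch -p bigmem my_run_script.sh  # or on the sbatch command line
[username@l001 ~]$ srun -p compute --pty bash -i      # or for an interactive session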