Deploy Slurm on a Single Node
This post summarizes a general method for installing and configuring Slurm on a single node (Debian-based systems).
Installation
First, update the package index, upgrade installed packages, and install Slurm and its dependencies:
sudo apt update
sudo apt upgrade -y
# if necessary
sudo reboot
sudo apt install slurm-wlm slurmd slurmctld -y
Configuration
Get the hostname; this value replaces <hostname> in the configuration below:
hostname
Create the main configuration file at /etc/slurm/slurm.conf.
We start from the packaged example configuration file and revise it slightly:
sudo cp /usr/share/doc/slurmctld/examples/slurm.conf.simple.gz /tmp/
sudo gunzip /tmp/slurm.conf.simple.gz
sudo mv /tmp/slurm.conf.simple /etc/slurm/slurm.conf
Edit /etc/slurm/slurm.conf with your settings. For a single-node setup, these are the minimal settings to add or revise:
ClusterName=localcluster
SlurmctldHost=<hostname>
ProctrackType=proctrack/cgroup
# The default is `/var/run/slurmctld.pid`,
# but `/var/run` is typically a symlink to `/run`
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
# It is recommended to stack task/cgroup,task/affinity
# and to set ConstrainCores=yes in cgroup.conf
TaskPlugin=task/cgroup,task/affinity
# Cores and memory are consumable resources.
SelectTypeParameters=CR_Core_Memory
# Set CPUs and RealMemory (in MB) to match your machine
NodeName=<hostname> CPUs=44 RealMemory=141000
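Rather than typing the CPU and memory numbers by hand, the NodeName line can be derived from what the machine itself reports. The following is a minimal sketch, assuming a Linux host where nproc and /proc/meminfo are available. The 95% headroom factor is an assumption of this sketch, not a Slurm requirement.

```shell
# Build a NodeName line for slurm.conf from the values the kernel reports.
node=$(hostname | cut -d. -f1)          # short hostname
cpus=$(nproc)                           # logical CPUs visible to this shell

# MemTotal is reported in kB; RealMemory expects MB.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# Keep about 5% headroom for the OS (assumption, adjust as needed).
real_mem=$(( mem_kb / 1024 * 95 / 100 ))

node_line="NodeName=${node} CPUs=${cpus} RealMemory=${real_mem}"
echo "$node_line"
```

Once Slurm is installed, `slurmd -C` prints the hardware configuration that slurmd detects, which is a useful cross-check against the line generated here.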
Create necessary directories
# on Debian systems
sudo mkdir -p /var/lib/slurm/slurmd
sudo mkdir -p /var/lib/slurm/slurmctld
sudo chown slurm:slurm /var/lib/slurm/slurmd
sudo chown slurm:slurm /var/lib/slurm/slurmctld
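These directories only help if they match what slurm.conf points at: StateSaveLocation is where slurmctld keeps its state, and SlurmdSpoolDir is slurmd's spool directory. Below is a small sketch for reading those two values back from the file. The helper name conf_value is hypothetical, and it matches keys case-sensitively, which is stricter than Slurm itself.

```shell
# conf_value FILE KEY: print the value of the last uncommented KEY=value line.
conf_value() {
  awk -F= -v key="$2" '
    $1 == key { val = substr($0, length(key) + 2) }
    END { if (val != "") print val }
  ' "$1"
}

# Report whether each configured directory exists (skipped if no config yet).
conf=${SLURM_CONF:-/etc/slurm/slurm.conf}
if [ -r "$conf" ]; then
  for key in StateSaveLocation SlurmdSpoolDir; do
    dir=$(conf_value "$conf" "$key")
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      ls -ld "$dir"
    else
      echo "$key: '${dir}' is unset or missing"
    fi
  done
fi
```

If a path printed here differs from the directories created above, either change slurm.conf or create the directory it names and chown it to slurm:slurm in the same way.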
Create cgroup configuration
Create /etc/slurm/cgroup.conf and add:
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
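Since both proctrack/cgroup and task/cgroup rely on the kernel's cgroup support, it can be worth checking which cgroup hierarchy the host uses before starting the daemons. This is a small sketch, assuming the usual mount point /sys/fs/cgroup: on a pure cgroup v2 system stat reports cgroup2fs, while v1 and hybrid layouts report tmpfs.

```shell
# Detect the cgroup hierarchy from the filesystem type of /sys/fs/cgroup.
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)

case "$fs_type" in
  cgroup2fs) cgroup_mode="v2" ;;
  tmpfs)     cgroup_mode="v1 or hybrid" ;;
  *)         cgroup_mode="unknown" ;;
esac

echo "cgroup hierarchy: $cgroup_mode"
```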
Start services and verify installation
sudo systemctl enable slurmctld
sudo systemctl enable slurmd
sudo systemctl start slurmctld
sudo systemctl start slurmd
Check that services are running:
sudo systemctl status slurmctld
sudo systemctl status slurmd
Check cluster status:
sinfo
squeue
You should see your partition and node listed. If the node shows as “down” or “drain”, you may need to set it to idle, using the same node name as in slurm.conf:
sudo scontrol update nodename=<hostname> state=idle
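Before forcing the state, it helps to see why Slurm marked the node down or drained. The sketch below is only a suggestion: needs_reset is a hypothetical helper that classifies the state string printed by sinfo, and the Slurm part runs only if sinfo is installed. It prints the command instead of running it, so nothing changes without you seeing the reason first.

```shell
# needs_reset STATE: succeed if the state string means the node is out of service.
needs_reset() {
  case "$1" in
    down*|drain*) return 0 ;;   # e.g. down, down*, drained, draining
    *)            return 1 ;;
  esac
}

if command -v sinfo >/dev/null 2>&1; then
  node=$(hostname | cut -d. -f1)
  state=$(sinfo -h -N -n "$node" -o '%T' | head -n 1)
  if needs_reset "$state"; then
    sinfo -R    # list the reason recorded for down/drained nodes
    echo "To return it to service: sudo scontrol update nodename=$node state=idle"
  else
    echo "Node $node is in state: ${state:-unknown}"
  fi
fi
```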
Restart services
If you change a configuration file such as slurm.conf or cgroup.conf, restart both daemons so the change takes effect:
sudo systemctl restart slurmctld
sudo systemctl restart slurmd
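If the daemons do not come back after a restart, one thing worth ruling out is a mismatch between the machine's hostname and the names used in slurm.conf. The following sketch uses a hypothetical helper, check_conf_host, which only looks for SlurmctldHost and NodeName lines that start with the given name. It is a plain text check, not a full parse of the file.

```shell
# check_conf_host FILE HOST: succeed if FILE names HOST as both the
# controller (SlurmctldHost) and a compute node (NodeName).
check_conf_host() {
  grep -Eq "^SlurmctldHost=$2([[:space:](]|\$)" "$1" &&
  grep -Eq "^NodeName=$2([[:space:]]|\$)" "$1"
}

conf=${SLURM_CONF:-/etc/slurm/slurm.conf}
if [ -r "$conf" ]; then
  host=$(hostname | cut -d. -f1)
  if check_conf_host "$conf" "$host"; then
    echo "slurm.conf matches hostname $host"
  else
    echo "slurm.conf does not name $host as both controller and node"
  fi
fi
```

If the names match and a daemon still fails, `sudo journalctl -u slurmctld` and `sudo journalctl -u slurmd` show the service logs.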
Operation
When we install Slurm on a single node, that one node acts as both the controller and the compute node simultaneously.
Submit jobs as you would on any Slurm cluster:
# Batch job
sbatch myjob.sh
# Interactive allocation: 1 task with 32 CPUs
salloc -n 1 -c 32
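The myjob.sh passed to sbatch is not shown in this post. A minimal sketch of what such a script might look like is below. The job name, resource requests, and output file name are illustrative values, so size them to the CPUs and RealMemory you configured for the node.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

# Slurm sets SLURM_CPUS_PER_TASK inside the job when --cpus-per-task is given;
# fall back to 1 so the script also runs outside Slurm.
threads=${SLURM_CPUS_PER_TASK:-1}

msg="Running on $(hostname) with ${threads} CPU(s)"
echo "$msg"
```

The #SBATCH lines are ordinary comments to bash, so the script can be tried directly with `bash myjob.sh` before submitting it. After `sbatch myjob.sh`, the output ends up in hello-<jobid>.out in the submission directory.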