This post summarizes a general method for installing and configuring Slurm on a single node (Debian-based systems).

Installation

First, update the package index, upgrade installed packages, and install Slurm and its dependencies:

sudo apt update
sudo apt upgrade -y
# reboot only if the upgrade requires it (e.g. a new kernel)
sudo reboot

sudo apt install slurm-wlm slurmd slurmctld -y

Configuration

Get the hostname (it replaces <hostname> in the configuration below):

hostname

Create the main configuration file at /etc/slurm/slurm.conf. We start from the example configuration file shipped with the package and revise it slightly:

sudo cp /usr/share/doc/slurmctld/examples/slurm.conf.simple.gz /tmp/
sudo gunzip /tmp/slurm.conf.simple.gz
sudo mv /tmp/slurm.conf.simple /etc/slurm/slurm.conf

Edit /etc/slurm/slurm.conf with your settings. For a single-node setup, these are the minimal settings to add or change (replace <hostname> with the output of the hostname command):

ClusterName=localcluster
SlurmctldHost=<hostname>

ProctrackType=proctrack/cgroup

# The default is `/var/run/slurmctld.pid`,
# but `/var/run` is typically a symlink to `/run`
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid

# It is recommended to stack task/cgroup,task/affinity
# and to set ConstrainCores=yes in cgroup.conf
TaskPlugin=task/cgroup,task/affinity

# Cores and memory are consumable resources.
SelectTypeParameters=CR_Core_Memory

# Set CPUs and RealMemory (in MB) to match your machine;
# `slurmd -C` prints the values Slurm detects
NodeName=<hostname> CPUs=44 RealMemory=141000
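
If you would rather not type the NodeName line by hand, the values can be read from the machine. This is a minimal sketch, assuming a Linux system with /proc/meminfo; the helper names mem_total_mb and node_line are made up for illustration and are not part of Slurm. The output of `slurmd -C` remains the more authoritative source, and you may want to set RealMemory a little below the physical total to leave room for the operating system.

```shell
#!/usr/bin/env bash
# Convert the MemTotal line of a meminfo-style file (values in kB) to whole MB.
# Argument: path to the file (defaults to /proc/meminfo).
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Build a NodeName line. Arguments: node name, CPU count, memory in MB.
node_line() {
    printf 'NodeName=%s CPUs=%s RealMemory=%s\n' "$1" "$2" "$3"
}

# Print a line for this machine (skipped where /proc/meminfo is unavailable).
if [ -r /proc/meminfo ]; then
    node_line "$(hostname -s 2>/dev/null || uname -n)" "$(nproc)" "$(mem_total_mb)"
fi
```

Paste the printed line into /etc/slurm/slurm.conf in place of the NodeName line above.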

Create necessary directories

# on Debian systems
sudo mkdir -p /var/lib/slurm/slurmd
sudo mkdir -p /var/lib/slurm/slurmctld
sudo chown slurm:slurm /var/lib/slurm/slurmd
sudo chown slurm:slurm /var/lib/slurm/slurmctld
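
These directories only help if they match the StateSaveLocation and SlurmdSpoolDir settings in slurm.conf. The sketch below reads both keys from the file and lists the directories they point to, so you can compare paths and owners. It assumes simple Key=Value lines without trailing comments; slurm_conf_value is a hypothetical helper name, not a Slurm tool.

```shell
#!/usr/bin/env bash
# Print the value of a Key=Value setting from a slurm.conf-style file.
# Arguments: key, config file (defaults to /etc/slurm/slurm.conf).
slurm_conf_value() {
    awk -F= -v key="$1" '$1 == key { print $2; exit }' "${2:-/etc/slurm/slurm.conf}"
}

conf=/etc/slurm/slurm.conf
if [ -r "$conf" ]; then
    for key in StateSaveLocation SlurmdSpoolDir; do
        dir="$(slurm_conf_value "$key" "$conf")"
        if [ -n "$dir" ]; then
            # ls -ld shows the owner and permissions of the directory itself.
            ls -ld "$dir" || echo "missing: $dir"
        else
            echo "$key is not set in $conf"
        fi
    done
fi
```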

Create cgroup configuration

Create /etc/slurm/cgroup.conf and add:

CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes

Start services and verify installation

sudo systemctl enable slurmctld
sudo systemctl enable slurmd
sudo systemctl start slurmctld
sudo systemctl start slurmd

Check that services are running:

sudo systemctl status slurmctld
sudo systemctl status slurmd

Check cluster status:

sinfo
squeue

You should see your partition and node listed. If the node shows as “down” or “drain”, you may need to set it to idle:

# use the same node name as in the NodeName line of slurm.conf
sudo scontrol update nodename=<hostname> state=idle
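
Before forcing the state, it can help to see why Slurm marked the node down or drained. A small sketch using `sinfo -R`, which lists the recorded reason per node; the wrapper function show_node_reasons is only illustrative and fails cleanly where the Slurm client tools are not installed.

```shell
#!/usr/bin/env bash
# List the reason recorded for each down or drained node.
show_node_reasons() {
    if ! command -v sinfo >/dev/null 2>&1; then
        echo "sinfo not found; are the Slurm client tools installed?" >&2
        return 127
    fi
    sinfo -R
}

show_node_reasons || true
```

A reason that mentions memory or CPU counts usually points back to the NodeName line in slurm.conf.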

Restart services

If you change configuration files such as slurm.conf or cgroup.conf, restart both services:

sudo systemctl restart slurmctld
sudo systemctl restart slurmd

Operation

When we install Slurm on a single node, that one node acts as both the controller and the compute node simultaneously.

Submit jobs as you would on any Slurm cluster:

# Batch job
sbatch myjob.sh

# Interactive allocation: 1 task with 32 CPUs
salloc -n 1 -c 32
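
The myjob.sh passed to sbatch is not shown in this post. A minimal sketch could look like the following; the job name, resource values, and output file name are placeholders to adapt, and the fallback defaults are there only so the script also runs outside Slurm for a quick check.

```shell
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

# SLURM_JOB_ID and SLURM_CPUS_PER_TASK are set by Slurm inside a job;
# the defaults below apply only when the script is run by hand.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

message="job ${job_id} running on $(uname -n) with ${cpus} CPU(s)"
echo "$message"
```

Since memory is a consumable resource in this setup (CR_Core_Memory), it is worth requesting it explicitly with --mem rather than relying on the default.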