Slurm Tips
This blog records some tips about using Slurm.
1. How to set cpus-per-task and OMP_NUM_THREADS?
The cpus-per-task option in SLURM (--cpus-per-task or -c) specifies how many CPU cores each task in your job should be allocated. Here's how to understand it:
- A “task” in SLURM typically corresponds to one process
- cpus-per-task tells SLURM how many CPU cores that single process needs
- This is primarily used for multithreaded applications (like OpenMP programs)
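To make the distinction concrete, here is a minimal batch script sketch (the job name and ./my_program are placeholders): it asks for 2 tasks with 4 cores each, so Slurm reserves 8 cores in total and srun starts 2 copies of the program. On some Slurm versions you may also need to pass --cpus-per-task to srun explicitly.

#!/bin/bash
#SBATCH --job-name=omp_demo     # hypothetical job name
#SBATCH --ntasks=2              # 2 tasks = 2 processes
#SBATCH --cpus-per-task=4       # 4 CPU cores reserved for each task
srun ./my_program               # launches 2 copies, each with 4 cores available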
In SLURM, cpus-per-task typically refers to logical cores (hyperthreads), not physical cores. Here’s the breakdown:
- SLURM counts logical cores by default
- On a hyperthreaded system, each physical core appears as 2 logical cores
- So --cpus-per-task=8 usually means 8 logical cores (which could be 4 physical cores with hyperthreading)
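If you are not sure how your nodes count cores, you can check from inside an allocation. The commands below use standard tools (lscpu, and the SLURM_CPUS_PER_TASK variable, which is only set when you request --cpus-per-task), but the exact output depends on your system:

# Run inside a job, e.g. in the batch script or an interactive shell
lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'
echo "Slurm allocated $SLURM_CPUS_PER_TASK logical CPUs to this task"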
For hyperthreaded systems, the optimal settings depend on your workload type and performance goals. Here are the common approaches:
Approach 1: Use Physical Cores Only (Recommended for CPU-intensive work)
sbatch --cpus-per-task=8
export OMP_NUM_THREADS=4 # Half of cpus-per-task
This is because --cpus-per-task=8 reserves 8 logical cores, which could be 4 physical cores. CPU-intensive programs often perform better with one thread per physical core, rather than having two hyperthreads compete for the same core's execution resources.
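If you prefer not to hard-code the thread count, you can derive it from the allocation inside the job script. This sketch assumes 2 hardware threads per physical core, which you should verify for your nodes:

#SBATCH --cpus-per-task=8
export OMP_NUM_THREADS=$(( SLURM_CPUS_PER_TASK / 2 ))  # one thread per physical core, assuming 2-way hyperthreading
./my_program                                           # placeholder program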
Approach 2: Use All Logical Cores (Good for I/O or memory-bound work)
sbatch --cpus-per-task=8
export OMP_NUM_THREADS=8 # Matches cpus-per-task
This is because memory-bound or I/O-intensive work can benefit from hyperthreading since threads may be waiting anyway.
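For this approach you can simply let the thread count follow the allocation, so the script stays correct if you change --cpus-per-task later:

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}  # one thread per allocated logical core (falls back to 1 if unset)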
2. How to set OMP_PROC_BIND and OMP_PLACES?
These are OpenMP environment variables that control thread affinity: how your parallel threads are assigned to specific CPU cores or processing units.
OMP_PROC_BIND controls whether and how threads are bound to processors:
- false (default): Threads can migrate between processors freely
- true: Threads are bound to processors, but the specific binding pattern depends on the implementation
- master/primary: All threads are bound to the same processor as the primary thread
- close: Threads are bound to processors close to the primary thread
- spread: Threads are bound and are distributed as evenly as possible across available processors
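A simple way to see what each setting actually does on your machine is to ask the runtime to report it. OMP_DISPLAY_ENV (OpenMP 4.0) and OMP_DISPLAY_AFFINITY (OpenMP 5.0) are standard variables, though support depends on your compiler's OpenMP runtime; ./my_program is a placeholder:

export OMP_PROC_BIND=spread
export OMP_DISPLAY_ENV=true        # print the effective OpenMP settings at startup
export OMP_DISPLAY_AFFINITY=true   # print where each thread was placed (OpenMP 5.0+)
./my_program                       # placeholder program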
OMP_PLACES defines the set of processors available for thread placement:
- threads: Each place corresponds to a single hardware thread (logical thread)
- cores: Each place corresponds to a single core (all threads on that core)
- sockets: Each place corresponds to a single processor socket
- {0,1,2,3}: Explicit list of processor IDs
- {0:4}: Range notation (processors 0 through 3)
- {0:4:2}: Range with stride (4 processors starting at 0 with stride 2, i.e. processors 0, 2, 4, and 6)
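The explicit forms are mainly useful when you want full manual control, for example pinning a few threads to specific logical CPUs. The IDs below are purely illustrative; how logical CPUs map to physical cores differs between machines:

export OMP_NUM_THREADS=4
export OMP_PROC_BIND=true
export OMP_PLACES="{0},{2},{4},{6}"   # hypothetical IDs: one place per chosen logical CPU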
If you care about performance, try the following combined configurations:
# Often a good default: lets the OpenMP runtime pick the binding pattern using its knowledge of your hardware
OMP_PROC_BIND=true
OMP_PLACES=threads
# Generally good: each thread owns a full physical core, so there is less contention between hyperthreads
OMP_PROC_BIND=spread
OMP_PLACES=cores
# Good when hyperthreading is beneficial: threads are spread across all logical cores
OMP_PROC_BIND=spread
OMP_PLACES=threads
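Putting both sections together, a complete job script for a CPU-bound OpenMP program might look like this sketch (the program name, core counts, and the one-thread-per-physical-core choice are assumptions to adapt to your case):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8                              # 8 logical cores for the single task
export OMP_NUM_THREADS=$(( SLURM_CPUS_PER_TASK / 2 ))  # one thread per physical core (2-way hyperthreading assumed)
export OMP_PROC_BIND=spread                            # distribute threads evenly over the places
export OMP_PLACES=cores                                # one place per physical core
./my_program                                           # placeholder program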