⚠️ I won’t introduce Slurm here as I already have written several notes about it (see the Slurm tag).
Today, I needed to send 171 jobs to the server. All jobs were similar, only the species ID changed: exactly the kind of computation array jobs were designed for. As there are 32 CPUs per node, I figured I needed 6 nodes, and so I wrote the following bash script:
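A minimal sketch of what that script looked like (the job name, time limit and run command are placeholders of mine; the real bits are the `--array=1-171` and `--nodes=6` lines):

```bash
#!/bin/bash
#SBATCH --job-name=species_array   # placeholder name
#SBATCH --array=1-171              # one task per species ID
#SBATCH --nodes=6                  # my mistake: this requests 6 nodes for EACH task
#SBATCH --time=02:00:00            # placeholder time limit

# SLURM_ARRAY_TASK_ID takes one value between 1 and 171 per task
./run_species.sh ${SLURM_ARRAY_TASK_ID}   # placeholder command
```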
But this was wrong: with this request SLURM allocates 6 nodes for every task, whereas I needed 6 nodes total. My mistake was not realizing that the tasks of an array job share all the #SBATCH options (except --array), including the number of nodes. What I now bear in mind is that an array job is a bunch of serial jobs with the same #SBATCH options; SLURM works out the CPUs/nodes needed and dispatches the tasks anywhere it can! This is way more flexible than requesting nodes, and it makes array jobs quite powerful for the kind of computation I do!
So all I needed was to remove the #SBATCH --nodes=6 line:
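Again a sketch with the same placeholders, only without the --nodes line:

```bash
#!/bin/bash
#SBATCH --job-name=species_array   # placeholder name
#SBATCH --array=1-171              # one serial task per species ID
#SBATCH --time=02:00:00            # placeholder time limit

# SLURM decides where each of the 171 tasks runs
./run_species.sh ${SLURM_ARRAY_TASK_ID}   # placeholder command
```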
And checking the queue confirmed it: one array job, 171 tasks, dispatched wherever SLURM found room!
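For reference, squeue has a standard -r/--array flag that lists each array task on its own line (the -u filter is just my user):

```bash
# list my jobs, expanding the array into one line per task
squeue -u $USER -r
```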
A final note: one can limit the number of tasks running simultaneously with a % separator. In my case, #SBATCH --array=1-171%20 limits the number of tasks running at once to 20, which can help manage the workload; and with %1 the tasks will be run in order!
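In other words, the directive above would simply become:

```bash
#SBATCH --array=1-171%20   # at most 20 of the 171 tasks run at the same time
```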
IMHO, the documentation available online is very clear about array jobs; I just needed to make this mistake to understand it 💯% 😆!