Spencer,

Thank you for your response!

It does appear that the memory allocation was the issue. When I specify 
--mem=1, I am able to run multiple jobs on a single node.

That being said, I was under the impression that DefMemPerCPU, 
DefMemPerNode (which sbatch claims to default to), etc. default to 0, 
which is interpreted as unlimited. I understood this to mean that a 
job/task that does not explicitly request memory has unlimited access 
to the node's memory. I'm assuming that's incorrect? Could this be 
related to the scheduler configuration I have defined (making cores AND 
memory consumable resources):

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
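For reference, if I did want an explicit per-CPU default rather than 
relying on 0/unlimited, I assume it would look something like this in 
slurm.conf (the 2048 MB figure is only an illustrative value, not what 
we run):

SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,CR_CORE_DEFAULT_DIST_BLOCK
DefMemPerCPU=2048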


Thank you again for the help!

Jason Dana
JHUAPL
REDD/RA2
Lead Systems Administrator/Software Engineer
jason.d...@jhuapl.edu
240-564-1045 (w)

Need Support from REDD?  You can enter a ticket using the new REDD Help Desk 
Portal (https://help.rcs.jhuapl.edu) if you have an active account or e-mail 
redd-h...@outermail.jhuapl.edu.

From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of Spencer 
Bliven <spencer.bli...@psi.ch>
Reply-To: Slurm User Community List <slurm-users@lists.schedmd.com>
Date: Tuesday, September 1, 2020 at 5:11 PM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [EXT] Re: [slurm-users] Question/Clarification: Batch array multiple 
tasks on nodes


Jason,

The array jobs are designed to behave like independent jobs (but are stored 
more efficiently internally to avoid straining the controller). So in principle 
slurm could schedule them one per node or multiple per node. The --nodes and 
--ntasks parameters apply to individual jobs in the array; setting --nodes=1 
constrains each job to one node, but would not force jobs onto different nodes.

The fact that they queue when forced to a single node is suspicious. Maybe you 
set up the partition as --exclusive? Or maybe jobs are requesting some other 
limited resource (e.g. if DefMemPerCPU is set to all the memory) preventing 
slurm from scheduling them simultaneously. If you're struggling with the array 
syntax, try just submitting two jobs to the same node and checking that you can 
get them to run simultaneously.
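For example, something along these lines (node name and memory amount are 
just placeholders):

sbatch --mem=100 --nodelist=node01 --wrap 'sleep 60'
sbatch --mem=100 --nodelist=node01 --wrap 'sleep 60'
squeue -u $USER

If both jobs reach the running state on node01 at the same time, the node 
itself allows sharing and the problem is in the resource requests.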

Best of luck,
-Spencer



On 1 September 2020 at 18:50:30, Dana, Jason T. 
(jason.d...@jhuapl.edu) wrote:
Hello,

I am new to Slurm and I am working on setting up a cluster. I am testing out 
running a batch execution using an array and am seeing only one task executed 
in the array per node. Even if I specify in the sbatch command that only one 
node should be used, it executes a single task on each of the available nodes 
in the partition. I was under the impression that it would continue to execute 
tasks until the resources on the node or for the user were at their limit. Am I 
missing something or have I misinterpreted how sbatch and/or the job scheduling 
should work?

Here is one of the commands I have run:

sbatch --array=0-15 --partition=htc-amd --wrap 'python3 -c "import time; 
print(\"working\"); time.sleep(5)"'

The htc-amd partition has 8 nodes and the results of this command are a single 
task being run on each node while the others are queued waiting for them to 
finish. As I mentioned before, if I specify --nodes=1 it will still execute a 
single task on every node in the partition. The only way I have gotten it to 
run on a single node was to use --nodelist, which worked but executed only a 
single task and queued the rest. I have also tried specifying --ntasks and 
--ntasks-per-node. It appears to reserve resources, as I can cause it to hit 
the QOS core/cpu limit, but it does not affect the number of tasks executed on 
each node.

Thank you for any help you can offer!

Jason
