On Thu, 20 Mar 2008 16:40:41 -0600 Ralph Castain <r...@lanl.gov> wrote:
> I am no slurm expert. However, it is our understanding that
> SLURM_TASKS_PER_NODE means the number of slots allocated to the job,
> not the number of tasks to be executed on each node. So the 4(x2)
> tells us that we have 4 slots on each of two nodes to work with. You
> got 4 slots on each node because you used the -N option, which told
> slurm to assign all slots on that node to this job - I assume you
> have 4 processors on your nodes. OpenMPI parses that string to get
> the allocation, then maps the number of specified processes against
> it.

That was also my interpretation, and I was absolutely sure I had read
it a couple of days ago in the srun man-page. In the meantime I have
changed my opinion, because now it says "Number of tasks to be
initiated on each node", as Tim has quoted. I've no idea how Tim
managed to change the man-page on my computer ;-) There is also
another variable documented:

    SLURM_CPUS_ON_NODE
        Count of processors available to the job on this node. Note
        the select/linear plugin allocates entire nodes to jobs, so
        the value indicates the total count of CPUs on the node. The
        select/cons_res plugin allocates individual processors to
        jobs, so this number indicates the number of processors on
        this node allocated to the job.

Anyway, back to reality: I made some further tests, and the only way
to change the value of SLURM_TASKS_PER_NODE was to tell slurm in
slurm.conf that node x has only y cpus. The variable
SLURM_CPUS_ON_NODE, although documented in both 1.0.15 and 1.2.22,
doesn't seem to exist in either version. In 1.2.22 there is
SLURM_JOB_CPUS_PER_NODE, which has the same value as
SLURM_TASKS_PER_NODE.

In a couple of days I'll try the other allocator plugin, which
allocates on a per-cpu instead of a per-node basis. After that it
would probably be a good idea for somebody (me?) to sum up our thread
and ask the slurm guys for their opinion.

> It is possible that the interpretation of SLURM_TASKS_PER_NODE is
> different when used to allocate as opposed to directly launch
> processes. Our typical usage is for someone to do:
>
>     srun -N 2 -A
>     mpirun -np 2 helloworld
>
> In other words, we use srun to create an allocation, and then run
> mpirun separately within it.
>
> I am therefore unsure what the "-n 2" will do here. If I believe the
> documentation, it would seem to imply that srun will attempt to
> launch two copies of "mpirun -np 2 helloworld", yet your output
> doesn't seem to support that interpretation. It would appear that
> the "-n 2" is being ignored and only one copy of mpirun is being
> launched. I'm no slurm expert, so perhaps that interpretation is
> incorrect.

That is indeed what happens when you call
"srun -N 2 mpirun -np 2 helloworld", but
"srun -N 2 -b mpirun -np 2 helloworld" submits it as a batch job,
i.e. "mpirun -np 2 helloworld" is executed only once, on one of the
allocated nodes, and the environment variables are set appropriately
-- or at least should be set appropriately -- so that a subsequent
srun, or an mpirun inside the command, does the right thing.

Werner
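
P.S. In case anyone wants to reproduce the check for which of these
variables a given SLURM version actually sets, a minimal sketch of
what I mean (the grep pattern and the output file name are just my
assumptions; -b is the same old batch flag as above):

    srun -N 2 -b env
    # the batch job's stdout should land in something like slurm-<jobid>.out
    grep -E 'SLURM_(TASKS_PER_NODE|CPUS_ON_NODE|JOB_CPUS_PER_NODE)' slurm-*.out

On nodes like ours I would expect something like
SLURM_TASKS_PER_NODE=4(x2) and, in 1.2.22,
SLURM_JOB_CPUS_PER_NODE=4(x2), with no SLURM_CPUS_ON_NODE at all.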
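P.P.S. For the record, SLURM_TASKS_PER_NODE is documented as a
comma-separated list of COUNT or COUNT(xREPS) entries. A minimal
sketch of how such a string expands into per-node slot counts (plain
awk for illustration, not the actual OpenMPI parser):

    echo '4(x2),2' | awk -F, '{
      for (i = 1; i <= NF; i++) {
        reps = 1; slots = $i
        if (match($i, /\(x[0-9]+\)/)) {   # "4(x2)" -> slots=4, reps=2
          reps  = substr($i, RSTART + 2, RLENGTH - 3)
          slots = substr($i, 1, RSTART - 1)
        }
        for (r = 0; r < reps; r++)
          print "node with " slots " slots"
      }
    }'

which prints three nodes, two with 4 slots and one with 2. That
matches Ralph's reading of 4(x2) as "4 slots on each of two nodes".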