> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
> Sent: Thursday, February 24, 2011 10:20 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] SLURM environment variables at runtime
> 
> On Feb 24, 2011, at 11:15 AM, Henderson, Brent wrote:
> 
> > Note that the parent of the sleep processes is orted and that orted
> was started by slurmstepd.  Unless orted is updating the slurm
> variables for the children (which is doubtful) then they will not
> contain the specific settings that I see when I run srun directly.
> 
> I'm not sure what you mean by that statement.  The orted passes its
> environment to its children; so whatever the slurm stepd set in the
> environment for the orted, the children should be getting.
> 

While you are correct that the environment is inherited by the children, 
sometimes that does not make sense.  Take SLURM_PROCID, for example.  If 
slurmstepd starts the orted and sets its SLURM_PROCID, then the child sleep 
processes (of orted) inherit exactly the same value that orted has.  That is 
misleading at best.  For example:

[brent@node2 mpi]$ mpirun -np 4 --bynode sleep 300

Then looking at the remote node:

[brent@node1 mpi]$ ps -fu brent
UID        PID  PPID  C STIME TTY          TIME CMD
brent     2853  2850  0 13:23 ?        00:00:00 /mnt/node1/home/brent/bin/openmpi143/bin/orted -mca
brent     2856  2853  0 13:23 ?        00:00:00 sleep 300
brent     2857  2853  0 13:23 ?        00:00:00 sleep 300
(snip)

And the SLURM_PROCID from each process:

[brent@node1 mpi]$ perl -p -e 's/\0/\n/g' /proc/2853/environ | egrep ^SLURM_ | grep PROCID
SLURM_PROCID=0
[brent@node1 mpi]$ perl -p -e 's/\0/\n/g' /proc/2856/environ | egrep ^SLURM_ | grep PROCID
SLURM_PROCID=0
[brent@node1 mpi]$ perl -p -e 's/\0/\n/g' /proc/2857/environ | egrep ^SLURM_ | grep PROCID
SLURM_PROCID=0
[brent@node1 mpi]$

They can't really all be SLURM_PROCID=0 - that is supposed to be unique within 
the job, right?  It appears that SLURM_PROCID is inherited from the orted 
parent, which makes a fair amount of sense given how things are launched.  If 
I use HP-MPI, slurmstepd starts each of the sleep processes directly and sets 
SLURM_PROCID uniquely for each child.  This is the crux of my issue.
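
The per-process check above can be wrapped in a small helper.  This is only a 
minimal sketch: the function name environ_var is hypothetical, and tr is used 
in place of the perl one-liner to split the NUL-separated environ file.

```shell
#!/bin/sh
# Hypothetical helper: print NAME=value from a NUL-separated environ file.
#   $1 = path to the file, e.g. /proc/<pid>/environ
#   $2 = variable name, e.g. SLURM_PROCID
environ_var() {
    # Turn NUL separators into newlines, then keep the one matching entry.
    tr '\0' '\n' < "$1" | grep "^$2="
}

# Example use with the PIDs from the ps listing (orted and its two children):
# for pid in 2853 2856 2857; do
#     printf '%s: ' "$pid"
#     environ_var "/proc/$pid/environ" SLURM_PROCID
# done
```

If all three PIDs report the same value, the variable was inherited from orted 
rather than set per process.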

I did find that there are OMPI_* variables that I can map internally back to 
what I think the SLURM variables should be:

[brent@node1 mpi]$ perl -p -e 's/\0/\n/g' /proc/2853/environ | egrep ^OMPI | grep WORLD
[brent@node1 mpi]$ perl -p -e 's/\0/\n/g' /proc/2856/environ | egrep ^OMPI | grep WORLD
OMPI_COMM_WORLD_SIZE=4
OMPI_COMM_WORLD_LOCAL_SIZE=2
OMPI_COMM_WORLD_RANK=1
OMPI_COMM_WORLD_LOCAL_RANK=0
[brent@node1 mpi]$ perl -p -e 's/\0/\n/g' /proc/2857/environ | egrep ^OMPI | grep WORLD
OMPI_COMM_WORLD_SIZE=4
OMPI_COMM_WORLD_LOCAL_SIZE=2
OMPI_COMM_WORLD_RANK=3
OMPI_COMM_WORLD_LOCAL_RANK=1
[brent@node1 mpi]$

So I think that if I combine some of the OMPI_* variables with the SLURM_* 
variables, I should be o.k. for what I need.
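
One way to sketch that combination in a wrapper script is to prefer the 
per-process OMPI_* value and fall back to the SLURM_* one only when the OMPI_* 
variable is unset (i.e. when srun launched the process directly).  The function 
names below are hypothetical, and pairing OMPI_COMM_WORLD_SIZE with 
SLURM_NTASKS and OMPI_COMM_WORLD_LOCAL_RANK with SLURM_LOCALID is an assumption 
about which SLURM variables are the intended counterparts.

```shell
#!/bin/sh
# Hypothetical helpers: each prints the Open MPI value when it is set and
# otherwise falls back to the assumed SLURM equivalent (empty if neither is set).

task_rank() {
    # Global rank of this process within the job.
    printf '%s\n' "${OMPI_COMM_WORLD_RANK:-${SLURM_PROCID:-}}"
}

task_count() {
    # Total number of processes in the job (SLURM_NTASKS is an assumed match).
    printf '%s\n' "${OMPI_COMM_WORLD_SIZE:-${SLURM_NTASKS:-}}"
}

task_local_rank() {
    # Rank of this process on its node (SLURM_LOCALID is an assumed match).
    printf '%s\n' "${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-}}"
}
```

Under mpirun the OMPI_* values win, so the SLURM_PROCID inherited from orted is 
never consulted.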

Now to answer the other question - why are some variables missing?  It appears 
that when the orted processes are launched via srun, but only one per node, 
that launch is a subset of the main allocation, and thus some of the 
environment variables differ (or are missing entirely) compared to launching 
the processes directly with srun on the full allocation.  This also makes 
sense to me at some level, so I'm at peace with it now.  :)
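
To see exactly which SLURM_* names go missing, the environ files of a process 
started directly by srun and one started under orted can be compared.  This is 
a rough sketch only; the function names are hypothetical and the two file 
arguments stand for whatever /proc/<pid>/environ paths are being compared.

```shell
#!/bin/sh
# Hypothetical helper: list the sorted SLURM_* variable names found in a
# NUL-separated environ file ($1).
slurm_names() {
    tr '\0' '\n' < "$1" | grep '^SLURM_' | cut -d= -f1 | sort
}

# Hypothetical helper: print SLURM_* names present in $1 but absent from $2.
missing_slurm_vars() {
    a=$(mktemp)
    b=$(mktemp)
    slurm_names "$1" > "$a"
    slurm_names "$2" > "$b"
    # comm -23 keeps only the lines unique to the first (sorted) file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example: missing_slurm_vars /proc/<srun_pid>/environ /proc/<orted_child_pid>/environ
```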

> Clearly, something is different here -- maybe we do have a bug -- but
> as you stated below, why does it work for me?  Is SLURM 2.2.x the
> difference?  I don't know.
> 
I'm tempted to try the older version of slurm as this might be the cause of the 
missing environment variables, but that is an experiment for another day.  I'll 
see if I can make do with what I see currently.

> > Now, the question still is, why does this work for Jeff?  :)  Is
> there a way to get orted out of the way so the sleep processes are
> launched directly by srun?
> 
> Yes; see Ralph's prior mail about direct srun support in Open MPI
> 1.5.x.  You lose some functionality / features that way, though.
> 
Maybe that will be an answer, but I'll see if I can make things work with 1.4.3 
for now.

Last thing before I go.  Please let me apologize for not being clear about what 
I disagreed with Ralph on in my last note.  He nailed the orted launching 
process and spelled it out very well, but I don't believe that HP-MPI is doing 
anything special to copy/fix up the SLURM environment variables.  Hopefully 
that came across in the body of that message.

I think we are done here as I think I can make something work with the various 
environment variables now.  Many thanks to Jeff and Ralph for their suggestions 
and insight on this issue!

Brent

