Dear Open MPI experts,

I have a problem related to the integration of Open MPI, SLURM and the PMI 
interface. I spent some time today with a colleague trying to figure out why 
we were not able to obtain all the H5 profile files (generated by 
acct_gather_profile) when using Open MPI. By "all" I mean that if I run on 8 
nodes (e.g. tesla[121-128]), I systematically miss the file for the first node 
in the allocation list (in this case tesla121).
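
For context, the profiling itself works fine on the other nodes. Our setup is 
roughly along these lines (the directory and the exact defaults below are 
illustrative, not copied verbatim from our configuration):

# slurm.conf
AcctGatherProfileType=acct_gather_profile/hdf5

# acct_gather.conf
ProfileHDF5Dir=/path/to/profile/dir
ProfileHDF5Default=Task

# in the batch script
#SBATCH --profile=task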

By comparing which processes are spawned on the compute nodes, I discovered 
that mpirun running on tesla121 calls srun only to launch the MPI processes 
remotely on the other 7 nodes (maybe this is obvious, but it was not to me)...

fs395      617  0.0  0.0 106200  1504 ?        S    22:41   0:00 /bin/bash /var/spool/slurm-test/slurmd/job390044/slurm_script
fs395      629  0.1  0.0 194552  5288 ?        Sl   22:41   0:00  \_ mpirun -bind-to socket --map-by ppr:1:socket --host tesla121,tesla122,tesla123,tesla124,tesla125,tesla126,tes
fs395      632  0.0  0.0 659740  9148 ?        Sl   22:41   0:00  |   \_ srun --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=7 --nodelist=tesla122,tesla123,tesla1
fs395      633  0.0  0.0  55544  1072 ?        S    22:41   0:00  |   |   \_ srun --ntasks-per-node=1 --kill-on-bad-exit --cpu_bind=none --nodes=7 --nodelist=tesla122,tesla123,te
fs395      651  0.0  0.0 106072  1392 ?        S    22:41   0:00  |   \_ /bin/bash ./run_linpack ./xhpl
fs395      654  295 35.5 120113412 23289280 ?  RLl  22:41   3:12  |   |   \_ ./xhpl
fs395      652  0.0  0.0 106072  1396 ?        S    22:41   0:00  |   \_ /bin/bash ./run_linpack ./xhpl
fs395      656  307 35.5 120070332 23267728 ?  RLl  22:41   3:19  |       \_ ./xhpl
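
For completeness, this is roughly how we compared the process trees across 
nodes (the exact ps flags and the grep pattern here are a sketch, not the 
literal command that produced the listing above):

for n in tesla121 tesla122; do
  echo "== $n =="
  ssh $n "ps aux --forest" | grep -E "mpirun|srun|orted|run_linpack|xhpl"
done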


The "xhpl" processes allocated on the first node of a job are not called by 
srun and because of this the SLURM profile plugin is not activated on the 
node!!! As result I always miss the first node profile information. Intel MPI 
does not have this behavior, mpiexec.hydra uses srun on the first node. 
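
A quick way to see the difference is to check the ancestry of one xhpl process 
on the first node versus any other node, e.g. something like:

# on tesla121 the parent chain of xhpl (PID 654 in the listing above) goes back to mpirun
pstree -ps 654
# on tesla122 the same check should end up under slurmstepd/srun instead
ssh tesla122 'pstree -ps $(pgrep -n -u fs395 xhpl)'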

I came to the conclusion that SLURM is configured properly and that something 
is wrong in the way I launch Open MPI with mpirun. If I disable SLURM support 
and fall back to rsh (--mca plm rsh), everything works, but there is no 
profiling at all because the SLURM plug-in is never activated. During the 
configure step, Open MPI 1.8.1 detects SLURM and libpmi/libpmi2 correctly. 
Honestly, I would prefer to avoid using srun as the job launcher if possible...
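
For reference, these are the launch variants I am comparing inside the batch 
script (host list trimmed, only the launcher part matters; the srun line is a 
sketch assuming the PMI2 support detected at configure time is actually built 
in):

# 1) current: mpirun under the SLURM allocation -> profile missing on the first node
mpirun -bind-to socket --map-by ppr:1:socket ./run_linpack ./xhpl

# 2) rsh fallback: runs fine, but no profiling anywhere because SLURM never launches the tasks
mpirun --mca plm rsh -bind-to socket --map-by ppr:1:socket ./run_linpack ./xhpl

# 3) direct srun launch with PMI2 (2 tasks/node to match ppr:1:socket on these two-socket nodes), which I would prefer to avoid
srun --mpi=pmi2 --ntasks-per-node=2 ./run_linpack ./xhpl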

Any suggestion to get this sorted out is really appreciated!

Best Regards,
Filippo

--
Mr. Filippo SPIGA, M.Sc.
http://filippospiga.info ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert


