Sorry to hijack the thread, but I have a question regarding the failed PSM initialization.
Some of our users oversubscribe a node with multiple mpiruns in order to run
their regression tests. Recently, a user reported the same "Could not detect
network connectivity" error. My question: is there a way to allow this type of
behavior, i.e. oversubscribing a node with multiple mpiruns? For example, say I
have a node with 16 processing elements and I want to run 8 instances of
"mpirun -n 3 mpi_foo" on that single node simultaneously, and I don't care
about performance. Please note that oversubscription within one node and a
**single** mpirun works as expected. The error only shows up when another
mpirun wants to join the party.

Thanks,
Lost in Los Alamos

On Apr 2, 2012, at 9:40 AM, Ralph Castain wrote:

> I'm not sure the 1.4 series can support that behavior. Each mpirun only knows
> about itself - it has no idea something else is going on.
>
> If you attempted to bind, all procs of the same rank from each run would bind
> on the same CPU.
>
> All you can really do is use -host to tell the fourth run not to use the
> first node. Or use the devel trunk, which has more ability to separate runs.
>
> Sent from my iPad
>
> On Apr 2, 2012, at 6:53 AM, Rémi Palancher <r...@rezib.org> wrote:
>
>> Hi there,
>>
>> I'm encountering a problem when trying to run multiple mpiruns in parallel
>> inside one SLURM allocation on multiple nodes, using a QLogic interconnect
>> network with PSM.
>>
>> I'm using Open MPI version 1.4.5 compiled with GCC 4.4.5 on Debian Lenny.
>>
>> My cluster is composed of 12-core nodes.
>>
>> Here is how I'm able to reproduce the problem:
>>
>> Allocate 20 CPUs on 2 nodes:
>>
>> frontend $ salloc -N 2 -n 20
>> frontend $ srun hostname | sort | uniq -c
>> 12 cn1381
>> 8 cn1382
>>
>> My job allocates 12 CPUs on node cn1381 and 8 CPUs on cn1382.
>>
>> My test MPI program parses, for each task, the value of Cpus_allowed_list
>> in /proc/$PID/status and prints it.
>>
>> If I run it on all 20 allocated CPUs, it works well:
>>
>> frontend $ mpirun get-allowed-cpu-ompi 1
>> Launch 1 Task 00 of 20 (cn1381): 0
>> Launch 1 Task 01 of 20 (cn1381): 1
>> Launch 1 Task 02 of 20 (cn1381): 2
>> Launch 1 Task 03 of 20 (cn1381): 3
>> Launch 1 Task 04 of 20 (cn1381): 4
>> Launch 1 Task 05 of 20 (cn1381): 7
>> Launch 1 Task 06 of 20 (cn1381): 5
>> Launch 1 Task 07 of 20 (cn1381): 9
>> Launch 1 Task 08 of 20 (cn1381): 8
>> Launch 1 Task 09 of 20 (cn1381): 10
>> Launch 1 Task 10 of 20 (cn1381): 6
>> Launch 1 Task 11 of 20 (cn1381): 11
>> Launch 1 Task 12 of 20 (cn1382): 4
>> Launch 1 Task 13 of 20 (cn1382): 5
>> Launch 1 Task 14 of 20 (cn1382): 6
>> Launch 1 Task 15 of 20 (cn1382): 7
>> Launch 1 Task 16 of 20 (cn1382): 8
>> Launch 1 Task 17 of 20 (cn1382): 10
>> Launch 1 Task 18 of 20 (cn1382): 9
>> Launch 1 Task 19 of 20 (cn1382): 11
>>
>> Here you can see that Slurm gave me CPUs 0-11 on cn1381 and 4-11 on cn1382.
>>
>> Now I'd like to run multiple MPI runs in parallel, 4 tasks each, inside my
>> job.
>>
>> frontend $ cat params.txt
>> 1
>> 2
>> 3
>> 4
>> 5
>>
>> It works well when I launch 3 runs in parallel, where they only use the
>> 12 CPUs of the first node (3 runs x 4 tasks = 12 CPUs):
>>
>> frontend $ xargs -P 3 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
>> Launch 2 Task 00 of 04 (cn1381): 1
>> Launch 2 Task 01 of 04 (cn1381): 2
>> Launch 2 Task 02 of 04 (cn1381): 4
>> Launch 2 Task 03 of 04 (cn1381): 7
>> Launch 1 Task 00 of 04 (cn1381): 0
>> Launch 1 Task 01 of 04 (cn1381): 3
>> Launch 1 Task 02 of 04 (cn1381): 5
>> Launch 1 Task 03 of 04 (cn1381): 6
>> Launch 3 Task 00 of 04 (cn1381): 9
>> Launch 3 Task 01 of 04 (cn1381): 8
>> Launch 3 Task 02 of 04 (cn1381): 10
>> Launch 3 Task 03 of 04 (cn1381): 11
>> Launch 4 Task 00 of 04 (cn1381): 0
>> Launch 4 Task 01 of 04 (cn1381): 3
>> Launch 4 Task 02 of 04 (cn1381): 1
>> Launch 4 Task 03 of 04 (cn1381): 5
>> Launch 5 Task 00 of 04 (cn1381): 2
>> Launch 5 Task 01 of 04 (cn1381): 4
>> Launch 5 Task 02 of 04 (cn1381): 7
>> Launch 5 Task 03 of 04 (cn1381): 6
>>
>> But when I try to launch 4 runs or more in parallel, where they need to use
>> the CPUs of the other node as well, it fails:
>>
>> frontend $ xargs -P 4 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
>> cn1381.23245ipath_userinit: assign_context command failed: Network is down
>> cn1381.23245can't open /dev/ipath, network down (err=26)
>> --------------------------------------------------------------------------
>> PSM was unable to open an endpoint. Please make sure that the network link is
>> active on the node and the hardware is functioning.
>>
>> Error: Could not detect network connectivity
>> --------------------------------------------------------------------------
>> cn1381.23248ipath_userinit: assign_context command failed: Network is down
>> cn1381.23248can't open /dev/ipath, network down (err=26)
>> --------------------------------------------------------------------------
>> PSM was unable to open an endpoint. Please make sure that the network link is
>> active on the node and the hardware is functioning.
>>
>> Error: Could not detect network connectivity
>> --------------------------------------------------------------------------
>> cn1381.23247ipath_userinit: assign_context command failed: Network is down
>> cn1381.23247can't open /dev/ipath, network down (err=26)
>> cn1381.23249ipath_userinit: assign_context command failed: Network is down
>> cn1381.23249can't open /dev/ipath, network down (err=26)
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23245] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23247] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> [cn1381:23242] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> [cn1381:23243] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 2 with PID 23245 on
>> node cn1381 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23246] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23248] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23249] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> [cn1381:23244] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 2 with PID 23248 on
>> node cn1381 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [ivanoe1:24981] 3 more processes have sent help message help-mtl-psm.txt /
>> unable to open endpoint
>> [ivanoe1:24981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
>> help / error messages
>> [ivanoe1:24981] 3 more processes have sent help message help-mpi-runtime /
>> mpi_init:startup:internal-failure
>> [ivanoe1:24983] 3 more processes have sent help message help-mtl-psm.txt /
>> unable to open endpoint
>> [ivanoe1:24983] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
>> help / error messages
>> [ivanoe1:24983] 3 more processes have sent help message help-mpi-runtime /
>> mpi_init:startup:internal-failure
>> Launch 3 Task 00 of 04 (cn1381): 0
>> Launch 3 Task 01 of 04 (cn1381): 1
>> Launch 3 Task 02 of 04 (cn1381): 2
>> Launch 3 Task 03 of 04 (cn1381): 3
>> Launch 1 Task 00 of 04 (cn1381): 4
>> Launch 1 Task 01 of 04 (cn1381): 5
>> Launch 1 Task 02 of 04 (cn1381): 6
>> Launch 1 Task 03 of 04 (cn1381): 8
>> Launch 5 Task 00 of 04 (cn1381): 7
>> Launch 5 Task 01 of 04 (cn1381): 9
>> Launch 5 Task 02 of 04 (cn1381): 10
>> Launch 5 Task 03 of 04 (cn1381): 11
>>
>> As far as I can understand, Open MPI tries to launch all runs on the same
>> node (cn1381 in my case) and forgets about the other node (cn1382). Am I
>> right? How can I avoid this behaviour?
>>
>> Here are the Open MPI variables set in my environment:
>> $ env | grep OMPI
>> OMPI_MCA_mtl=psm
>> OMPI_MCA_pml=cm
>>
>> You can find attached to this email the config.log and the output of the
>> following commands:
>> frontend $ ompi_info --all > ompi_info_all.txt
>> frontend $ mpirun --bynode --npernode 1 --tag-output ompi_info -v ompi full \
>>   --parsable > ompi_nodes.txt
>>
>> Thanks in advance for any kind of help!
>>
>> Best regards,
>> --
>> Rémi Palancher
>> http://rezib.org
>> <config.log.gz>
>> <ompi_info_all.txt.gz>
>> <ompi_nodes.txt>
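
To make Ralph's -host suggestion concrete for the allocation described above,
here is one way the five 4-task runs could be split explicitly across the two
nodes, so the later runs are told to use cn1382 instead of all piling onto
cn1381. This is only an untested sketch based on the thread: the node names,
the get-allowed-cpu-ompi test program, and the 3+2 split (12 slots on cn1381,
8 on cn1382) are taken from the messages above. Note that -host only fixes the
placement; a PSM context limit on a node could still bite if too many
processes open endpoints there.

# Run from inside the salloc shell on the frontend.
# Pin three 4-task runs to cn1381 (12 slots) and two to cn1382 (8 slots),
# instead of letting every mpirun resolve to the first node on its own.
for i in 1 2 3; do mpirun -n 4 -host cn1381 get-allowed-cpu-ompi $i & done
for i in 4 5;   do mpirun -n 4 -host cn1382 get-allowed-cpu-ompi $i & done
wait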
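Since both posters say performance does not matter for these small
oversubscribed runs, and the failure happens while PSM tries to open its
endpoint, another thing that might be worth trying - my own assumption, not
something suggested in the thread - is to steer just these launches away from
the PSM MTL and onto the ob1 PML with shared-memory/TCP BTLs, overriding the
OMPI_MCA_pml=cm / OMPI_MCA_mtl=psm settings shown above. That should keep
/dev/ipath out of the picture entirely, at the cost of slower inter-node
communication.

# Override the exported cm/psm MCA settings for these runs only:
# ob1 + sm/tcp/self avoids opening a PSM endpoint per process.
xargs -P 4 -n 1 mpirun --mca pml ob1 --mca btl sm,tcp,self -n 4 \
    get-allowed-cpu-ompi < params.txt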
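Finally, for anyone who wants to check rank placement without building the
custom test program, something along these lines should print the host and
Cpus_allowed_list for every launched process; it relies only on mpirun's
ability to launch ordinary executables and on the --tag-output option already
used above. Again, an untested convenience sketch, not part of the original
thread.

# Each rank runs a tiny shell snippet that reports its host and CPU mask.
mpirun -n 20 --tag-output sh -c 'echo "$(hostname): $(grep Cpus_allowed_list /proc/self/status)"'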