Hi there,

I'm encountering a problem when trying to run multiple mpirun commands in
parallel inside one SLURM allocation spanning multiple nodes, on a QLogic
interconnect with PSM.

I'm using Open MPI version 1.4.5 compiled with GCC 4.4.5 on Debian Lenny.

My cluster is composed of 12-core nodes.

Here is how I'm able to reproduce the problem:

Allocate 20 CPUs on 2 nodes:

frontend $ salloc -N 2 -n 20
frontend $ srun hostname | sort | uniq -c
     12 cn1381
      8 cn1382

My job allocates 12 CPUs on node cn1381 and 8 CPUs on cn1382.

For each task, my test MPI program parses the value of Cpus_allowed_list in
/proc/$PID/status and prints it.
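
For reference, a minimal sketch of such a test program could look like the
following (this is an illustration, not the exact get-allowed-cpu-ompi
source; it assumes the launch number is passed as the first command-line
argument, as in the xargs invocations below):

/* Hypothetical sketch of the test program described above: each rank
 * reads Cpus_allowed_list from /proc/self/status and prints it together
 * with the launch number (argv[1]), its rank, and its hostname. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, hostlen;
    char host[MPI_MAX_PROCESSOR_NAME];
    char line[256], cpus[256] = "?";
    FILE *f;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &hostlen);

    /* /proc/self/status is the calling process's /proc/$PID/status;
     * look for the "Cpus_allowed_list:" field. */
    f = fopen("/proc/self/status", "r");
    if (f != NULL) {
        while (fgets(line, sizeof(line), f) != NULL) {
            if (sscanf(line, "Cpus_allowed_list: %255s", cpus) == 1)
                break;
        }
        fclose(f);
    }

    printf("Launch %s Task %02d of %02d (%s): %s\n",
           argc > 1 ? argv[1] : "?", rank, size, host, cpus);

    MPI_Finalize();
    return 0;
}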

If I run it on all 20 allocated CPUs, it works well:

frontend $ mpirun get-allowed-cpu-ompi 1
Launch 1 Task 00 of 20 (cn1381): 0
Launch 1 Task 01 of 20 (cn1381): 1
Launch 1 Task 02 of 20 (cn1381): 2
Launch 1 Task 03 of 20 (cn1381): 3
Launch 1 Task 04 of 20 (cn1381): 4
Launch 1 Task 05 of 20 (cn1381): 7
Launch 1 Task 06 of 20 (cn1381): 5
Launch 1 Task 07 of 20 (cn1381): 9
Launch 1 Task 08 of 20 (cn1381): 8
Launch 1 Task 09 of 20 (cn1381): 10
Launch 1 Task 10 of 20 (cn1381): 6
Launch 1 Task 11 of 20 (cn1381): 11
Launch 1 Task 12 of 20 (cn1382): 4
Launch 1 Task 13 of 20 (cn1382): 5
Launch 1 Task 14 of 20 (cn1382): 6
Launch 1 Task 15 of 20 (cn1382): 7
Launch 1 Task 16 of 20 (cn1382): 8
Launch 1 Task 17 of 20 (cn1382): 10
Launch 1 Task 18 of 20 (cn1382): 9
Launch 1 Task 19 of 20 (cn1382): 11

Here you can see that Slurm gave me CPUs 0-11 on cn1381 and CPUs 4-11 on cn1382.

Now I'd like to run multiple mpirun instances in parallel inside my job, with
4 tasks each. I use xargs with a parameter file whose lines are passed, one
per run, as the launch-number argument:

frontend $ cat params.txt
1
2
3
4
5

It works well when I launch 3 runs in parallel, since they only use the 12
CPUs of the first node (3 runs x 4 tasks = 12 CPUs):

frontend $ xargs -P 3 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
Launch 2 Task 00 of 04 (cn1381): 1
Launch 2 Task 01 of 04 (cn1381): 2
Launch 2 Task 02 of 04 (cn1381): 4
Launch 2 Task 03 of 04 (cn1381): 7
Launch 1 Task 00 of 04 (cn1381): 0
Launch 1 Task 01 of 04 (cn1381): 3
Launch 1 Task 02 of 04 (cn1381): 5
Launch 1 Task 03 of 04 (cn1381): 6
Launch 3 Task 00 of 04 (cn1381): 9
Launch 3 Task 01 of 04 (cn1381): 8
Launch 3 Task 02 of 04 (cn1381): 10
Launch 3 Task 03 of 04 (cn1381): 11
Launch 4 Task 00 of 04 (cn1381): 0
Launch 4 Task 01 of 04 (cn1381): 3
Launch 4 Task 02 of 04 (cn1381): 1
Launch 4 Task 03 of 04 (cn1381): 5
Launch 5 Task 00 of 04 (cn1381): 2
Launch 5 Task 01 of 04 (cn1381): 4
Launch 5 Task 02 of 04 (cn1381): 7
Launch 5 Task 03 of 04 (cn1381): 6

But when I try to launch 4 or more runs in parallel, which requires using the
CPUs of the other node as well, it fails:

frontend $ xargs -P 4 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
cn1381.23245ipath_userinit: assign_context command failed: Network is down
cn1381.23245can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Could not detect network connectivity
--------------------------------------------------------------------------
cn1381.23248ipath_userinit: assign_context command failed: Network is down
cn1381.23248can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
PSM was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

  Error: Could not detect network connectivity
--------------------------------------------------------------------------
cn1381.23247ipath_userinit: assign_context command failed: Network is down
cn1381.23247can't open /dev/ipath, network down (err=26)
cn1381.23249ipath_userinit: assign_context command failed: Network is down
cn1381.23249can't open /dev/ipath, network down (err=26)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[cn1381:23245] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[cn1381:23247] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[cn1381:23242] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[cn1381:23243] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 23245 on
node cn1381 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[cn1381:23246] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[cn1381:23248] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** The MPI_Init() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[cn1381:23249] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[cn1381:23244] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 23248 on
node cn1381 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[ivanoe1:24981] 3 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
[ivanoe1:24981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[ivanoe1:24981] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
[ivanoe1:24983] 3 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
[ivanoe1:24983] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[ivanoe1:24983] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
Launch 3 Task 00 of 04 (cn1381): 0
Launch 3 Task 01 of 04 (cn1381): 1
Launch 3 Task 02 of 04 (cn1381): 2
Launch 3 Task 03 of 04 (cn1381): 3
Launch 1 Task 00 of 04 (cn1381): 4
Launch 1 Task 01 of 04 (cn1381): 5
Launch 1 Task 02 of 04 (cn1381): 6
Launch 1 Task 03 of 04 (cn1381): 8
Launch 5 Task 00 of 04 (cn1381): 7
Launch 5 Task 01 of 04 (cn1381): 9
Launch 5 Task 02 of 04 (cn1381): 10
Launch 5 Task 03 of 04 (cn1381): 11

As far as I can understand, Open MPI tries to launch all runs on the same node
(cn1381 in my case) and forgets about the other node. Am I right? How can I
avoid this behaviour?

Here are the Open MPI variables set in my environment:
$ env | grep OMPI
OMPI_MCA_mtl=psm
OMPI_MCA_pml=cm

You can find attached to this email the config.log and the output of the
following commands:
frontend $ ompi_info --all > ompi_info_all.txt
frontend $ mpirun --bynode --npernode 1 --tag-output ompi_info -v ompi full \
           --parsable > ompi_nodes.txt

Thanks in advance for any kind of help!

Best regards,
--
Rémi Palancher
http://rezib.org

Attachment: config.log.gz
Attachment: ompi_info_all.txt.gz

Contents of ompi_nodes.txt:
[1,0]<stdout>:package:Open MPI root@ivanoe1 Distribution
[1,0]<stdout>:ompi:version:full:1.4.5
[1,0]<stdout>:ompi:version:svn:r25905
[1,0]<stdout>:ompi:version:release_date:Feb 10, 2012
[1,0]<stdout>:orte:version:full:1.4.5
[1,0]<stdout>:orte:version:svn:r25905
[1,0]<stdout>:orte:version:release_date:Feb 10, 2012
[1,0]<stdout>:opal:version:full:1.4.5
[1,0]<stdout>:opal:version:svn:r25905
[1,0]<stdout>:opal:version:release_date:Feb 10, 2012
[1,0]<stdout>:ident:1.4.5
[1,1]<stdout>:package:Open MPI root@ivanoe1 Distribution
[1,1]<stdout>:ompi:version:full:1.4.5
[1,1]<stdout>:ompi:version:svn:r25905
[1,1]<stdout>:ompi:version:release_date:Feb 10, 2012
[1,1]<stdout>:orte:version:full:1.4.5
[1,1]<stdout>:orte:version:svn:r25905
[1,1]<stdout>:orte:version:release_date:Feb 10, 2012
[1,1]<stdout>:opal:version:full:1.4.5
[1,1]<stdout>:opal:version:svn:r25905
[1,1]<stdout>:opal:version:release_date:Feb 10, 2012
[1,1]<stdout>:ident:1.4.5
