Sorry to hijack the thread, but I have a question regarding the failed PSM initialization.
Some of our users oversubscribe a node with multiple mpiruns in order to run
their regression tests. Recently, a user reported the same "Could not detect
network connectivity" error. My question: is there a way to allow this type of
behavior, i.e. oversubscribing a node with multiple mpiruns? For example, say I
have a node with 16 processing elements and I want to run 8 instances of
"mpirun -n 3 mpi_foo" on that single node simultaneously, and I don't care
about performance. Please note that oversubscription within one node and a
**single** mpirun works as expected. The error only shows up when another
mpirun wants to join the party.

Thanks,
Lost in Los Alamos

On Apr 2, 2012, at 9:40 AM, Ralph Castain wrote:

> I'm not sure the 1.4 series can support that behavior. Each mpirun only knows
> about itself - it has no idea something else is going on.
>
> If you attempted to bind, all procs of the same rank from each run would bind
> on the same CPU.
>
> All you can really do is use -host to tell the fourth run not to use the
> first node. Or use the devel trunk, which has more ability to separate runs.
>
> Sent from my iPad
>
> On Apr 2, 2012, at 6:53 AM, Rémi Palancher <r...@rezib.org> wrote:
>
>> Hi there,
>>
>> I'm encountering a problem when trying to run multiple mpiruns in parallel
>> inside one SLURM allocation on multiple nodes, using a QLogic interconnect
>> network with PSM.
>>
>> I'm using Open MPI version 1.4.5 compiled with GCC 4.4.5 on Debian Lenny.
>>
>> My cluster is composed of 12-core nodes.
>>
>> Here is how I'm able to reproduce the problem:
>>
>> Allocate 20 CPUs on 2 nodes:
>>
>> frontend $ salloc -N 2 -n 20
>> frontend $ srun hostname | sort | uniq -c
>> 12 cn1381
>> 8 cn1382
>>
>> My job allocates 12 CPUs on node cn1381 and 8 CPUs on cn1382.
>>
>> My test MPI program parses, for each task, the value of Cpus_allowed_list
>> in /proc/$PID/status and prints it.
>>
>> If I run it on all 20 allocated CPUs, it works well:
>>
>> frontend $ mpirun get-allowed-cpu-ompi 1
>> Launch 1 Task 00 of 20 (cn1381): 0
>> Launch 1 Task 01 of 20 (cn1381): 1
>> Launch 1 Task 02 of 20 (cn1381): 2
>> Launch 1 Task 03 of 20 (cn1381): 3
>> Launch 1 Task 04 of 20 (cn1381): 4
>> Launch 1 Task 05 of 20 (cn1381): 7
>> Launch 1 Task 06 of 20 (cn1381): 5
>> Launch 1 Task 07 of 20 (cn1381): 9
>> Launch 1 Task 08 of 20 (cn1381): 8
>> Launch 1 Task 09 of 20 (cn1381): 10
>> Launch 1 Task 10 of 20 (cn1381): 6
>> Launch 1 Task 11 of 20 (cn1381): 11
>> Launch 1 Task 12 of 20 (cn1382): 4
>> Launch 1 Task 13 of 20 (cn1382): 5
>> Launch 1 Task 14 of 20 (cn1382): 6
>> Launch 1 Task 15 of 20 (cn1382): 7
>> Launch 1 Task 16 of 20 (cn1382): 8
>> Launch 1 Task 17 of 20 (cn1382): 10
>> Launch 1 Task 18 of 20 (cn1382): 9
>> Launch 1 Task 19 of 20 (cn1382): 11
>>
>> Here you can see that Slurm gave me CPUs 0-11 on cn1381 and 4-11 on cn1382.
>>
>> Now I'd like to run multiple MPI runs in parallel, 4 tasks each, inside my
>> job.
>>
>> frontend $ cat params.txt
>> 1
>> 2
>> 3
>> 4
>> 5
>>
>> It works well when I launch 3 runs in parallel, where they only use the
>> 12 CPUs of the first node (3 runs x 4 tasks = 12 CPUs):
>>
>> frontend $ xargs -P 3 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
>> Launch 2 Task 00 of 04 (cn1381): 1
>> Launch 2 Task 01 of 04 (cn1381): 2
>> Launch 2 Task 02 of 04 (cn1381): 4
>> Launch 2 Task 03 of 04 (cn1381): 7
>> Launch 1 Task 00 of 04 (cn1381): 0
>> Launch 1 Task 01 of 04 (cn1381): 3
>> Launch 1 Task 02 of 04 (cn1381): 5
>> Launch 1 Task 03 of 04 (cn1381): 6
>> Launch 3 Task 00 of 04 (cn1381): 9
>> Launch 3 Task 01 of 04 (cn1381): 8
>> Launch 3 Task 02 of 04 (cn1381): 10
>> Launch 3 Task 03 of 04 (cn1381): 11
>> Launch 4 Task 00 of 04 (cn1381): 0
>> Launch 4 Task 01 of 04 (cn1381): 3
>> Launch 4 Task 02 of 04 (cn1381): 1
>> Launch 4 Task 03 of 04 (cn1381): 5
>> Launch 5 Task 00 of 04 (cn1381): 2
>> Launch 5 Task 01 of 04 (cn1381): 4
>> Launch 5 Task 02 of 04 (cn1381): 7
>> Launch 5 Task 03 of 04 (cn1381): 6
>>
>> But when I try to launch 4 runs or more in parallel, where they need to use
>> the CPUs of the other node as well, it fails:
>>
>> frontend $ xargs -P 4 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
>> cn1381.23245ipath_userinit: assign_context command failed: Network is down
>> cn1381.23245can't open /dev/ipath, network down (err=26)
>> --------------------------------------------------------------------------
>> PSM was unable to open an endpoint. Please make sure that the network link is
>> active on the node and the hardware is functioning.
>>
>> Error: Could not detect network connectivity
>> --------------------------------------------------------------------------
>> cn1381.23248ipath_userinit: assign_context command failed: Network is down
>> cn1381.23248can't open /dev/ipath, network down (err=26)
>> --------------------------------------------------------------------------
>> PSM was unable to open an endpoint. Please make sure that the network link is
>> active on the node and the hardware is functioning.
>>
>> Error: Could not detect network connectivity
>> --------------------------------------------------------------------------
>> cn1381.23247ipath_userinit: assign_context command failed: Network is down
>> cn1381.23247can't open /dev/ipath, network down (err=26)
>> cn1381.23249ipath_userinit: assign_context command failed: Network is down
>> cn1381.23249can't open /dev/ipath, network down (err=26)
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23245] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23247] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> [cn1381:23242] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> [cn1381:23243] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 2 with PID 23245 on
>> node cn1381 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> PML add procs failed
>> --> Returned "Error" (-1) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23246] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23248] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>> [cn1381:23249] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> [cn1381:23244] Abort before MPI_INIT completed successfully; not able to
>> guarantee that all other processes were killed!
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 2 with PID 23248 on
>> node cn1381 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [ivanoe1:24981] 3 more processes have sent help message help-mtl-psm.txt /
>> unable to open endpoint
>> [ivanoe1:24981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
>> help / error messages
>> [ivanoe1:24981] 3 more processes have sent help message help-mpi-runtime /
>> mpi_init:startup:internal-failure
>> [ivanoe1:24983] 3 more processes have sent help message help-mtl-psm.txt /
>> unable to open endpoint
>> [ivanoe1:24983] Set MCA parameter "orte_base_help_aggregate" to 0 to see all
>> help / error messages
>> [ivanoe1:24983] 3 more processes have sent help message help-mpi-runtime /
>> mpi_init:startup:internal-failure
>> Launch 3 Task 00 of 04 (cn1381): 0
>> Launch 3 Task 01 of 04 (cn1381): 1
>> Launch 3 Task 02 of 04 (cn1381): 2
>> Launch 3 Task 03 of 04 (cn1381): 3
>> Launch 1 Task 00 of 04 (cn1381): 4
>> Launch 1 Task 01 of 04 (cn1381): 5
>> Launch 1 Task 02 of 04 (cn1381): 6
>> Launch 1 Task 03 of 04 (cn1381): 8
>> Launch 5 Task 00 of 04 (cn1381): 7
>> Launch 5 Task 01 of 04 (cn1381): 9
>> Launch 5 Task 02 of 04 (cn1381): 10
>> Launch 5 Task 03 of 04 (cn1381): 11
>>
>> As far as I can understand, Open MPI tries to launch all runs on the same
>> node (cn1381 in my case) and forgets about the other node (cn1382). Am I
>> right? How can I avoid this behaviour?
>>
>> Here are the Open MPI variables set in my environment:
>> $ env | grep OMPI
>> OMPI_MCA_mtl=psm
>> OMPI_MCA_pml=cm
>>
>> You can find attached to this email the config.log and the output of the
>> following commands:
>> frontend $ ompi_info --all > ompi_info_all.txt
>> frontend $ mpirun --bynode --npernode 1 --tag-output ompi_info -v ompi full \
>>   --parsable > ompi_nodes.txt
>>
>> Thanks in advance for any kind of help!
>>
>> Best regards,
>> --
>> Rémi Palancher
>> http://rezib.org
>> <config.log.gz>
>> <ompi_info_all.txt.gz>
>> <ompi_nodes.txt>
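
To make Ralph's -host suggestion concrete for the allocation described above,
here is one way the five 4-task runs could be split explicitly across the two
nodes, so the later runs are told to use cn1382 instead of all piling onto
cn1381. This is only an untested sketch based on the thread: the node names,
the get-allowed-cpu-ompi test program, and the 3+2 split (12 slots on cn1381,
8 on cn1382) are taken from the messages above. Note that -host only fixes the
placement; a PSM context limit on a node could still bite if too many
processes open endpoints there.

# Run from inside the salloc shell on the frontend.
# Pin three 4-task runs to cn1381 (12 slots) and two to cn1382 (8 slots),
# instead of letting every mpirun resolve to the first node on its own.
for i in 1 2 3; do mpirun -n 4 -host cn1381 get-allowed-cpu-ompi $i & done
for i in 4 5;   do mpirun -n 4 -host cn1382 get-allowed-cpu-ompi $i & done
wait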
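Since both posters say performance does not matter for these small
oversubscribed runs, and the failure happens while PSM tries to open its
endpoint, another thing that might be worth trying - my own assumption, not
something suggested in the thread - is to steer just these launches away from
the PSM MTL and onto the ob1 PML with shared-memory/TCP BTLs, overriding the
OMPI_MCA_pml=cm / OMPI_MCA_mtl=psm settings shown above. That should keep
/dev/ipath out of the picture entirely, at the cost of slower inter-node
communication.

# Override the exported cm/psm MCA settings for these runs only:
# ob1 + sm/tcp/self avoids opening a PSM endpoint per process.
xargs -P 4 -n 1 mpirun --mca pml ob1 --mca btl sm,tcp,self -n 4 \
    get-allowed-cpu-ompi < params.txt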
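Finally, for anyone who wants to check rank placement without building the
custom test program, something along these lines should print the host and
Cpus_allowed_list for every launched process; it relies only on mpirun's
ability to launch ordinary executables and on the --tag-output option already
used above. Again, an untested convenience sketch, not part of the original
thread.

# Each rank runs a tiny shell snippet that reports its host and CPU mask.
mpirun -n 20 --tag-output sh -c 'echo "$(hostname): $(grep Cpus_allowed_list /proc/self/status)"'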