Here it is:

$ LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so mpirun -x LD_PRELOAD --mca plm_base_verbose 10 --debug-daemons -np 1 hello_c

[access1:29064] mca: base: components_register: registering plm components
[access1:29064] mca: base: components_register: found loaded component isolated
[access1:29064] mca: base: components_register: component isolated has no register or open function
[access1:29064] mca: base: components_register: found loaded component rsh
[access1:29064] mca: base: components_register: component rsh register function successful
[access1:29064] mca: base: components_register: found loaded component slurm
[access1:29064] mca: base: components_register: component slurm register function successful
[access1:29064] mca: base: components_open: opening plm components
[access1:29064] mca: base: components_open: found loaded component isolated
[access1:29064] mca: base: components_open: component isolated open function successful
[access1:29064] mca: base: components_open: found loaded component rsh
[access1:29064] mca: base: components_open: component rsh open function successful
[access1:29064] mca: base: components_open: found loaded component slurm
[access1:29064] mca: base: components_open: component slurm open function successful
[access1:29064] mca:base:select: Auto-selecting plm components
[access1:29064] mca:base:select:(  plm) Querying component [isolated]
[access1:29064] mca:base:select:(  plm) Query of component [isolated] set priority to 0
[access1:29064] mca:base:select:(  plm) Querying component [rsh]
[access1:29064] mca:base:select:(  plm) Query of component [rsh] set priority to 10
[access1:29064] mca:base:select:(  plm) Querying component [slurm]
[access1:29064] mca:base:select:(  plm) Query of component [slurm] set priority to 75
[access1:29064] mca:base:select:(  plm) Selected component [slurm]
[access1:29064] mca: base: close: component isolated closed
[access1:29064] mca: base: close: unloading component isolated
[access1:29064] mca: base: close: component rsh closed
[access1:29064] mca: base: close: unloading component rsh
Daemon was launched on node1-128-17 - beginning to initialize
Daemon was launched on node1-128-18 - beginning to initialize
Daemon [[63607,0],2] checking in as pid 24538 on host node1-128-18
[node1-128-18:24538] [[63607,0],2] orted: up and running - waiting for commands!
Daemon [[63607,0],1] checking in as pid 17192 on host node1-128-17
[node1-128-17:17192] [[63607,0],1] orted: up and running - waiting for commands!
srun: error: node1-128-18: task 1: Exited with exit code 1
srun: Terminating job step 645191.1
srun: error: node1-128-17: task 0: Exited with exit code 1
--------------------------------------------------------------------------
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpirun. This could be caused by a number
of factors, including an inability to create a connection back
to mpirun due to a lack of common network interfaces and/or no
route found between them. Please check network connectivity
(including firewalls and network routing requirements).
--------------------------------------------------------------------------
[access1:29064] [[63607,0],0] orted_cmd: received halt_vm cmd
[access1:29064] mca: base: close: component slurm closed
[access1:29064] mca: base: close: unloading component slurm

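Given the hint in the error above about the daemons having no route back to mpirun, here are a couple of follow-up checks I could run next. The interface name "eth0" is only a placeholder until I confirm which interface the compute nodes can actually reach access1 on, and the rsh variant assumes passwordless ssh to the nodes works:

# restrict the daemons' callback (OOB) traffic to one known-good interface
$ LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so \
    mpirun -x LD_PRELOAD --mca oob_tcp_if_include eth0 --mca oob_base_verbose 10 \
    -np 1 hello_c

# bypass the slurm launcher and have mpirun start its daemons over ssh instead
$ LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so \
    mpirun -x LD_PRELOAD --mca plm rsh -np 1 hello_c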

Wed, 16 Jul 2014 14:20:33 +0300 from Mike Dubman <mi...@dev.mellanox.co.il>:
>please add the following flags to mpirun "--mca plm_base_verbose 10 --debug-daemons" and attach the output.
>Thx
>
>
>On Wed, Jul 16, 2014 at 11:12 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>Hello!
>>I have Open MPI v1.9a1r32142 and Slurm 2.5.6.
>>
>>I cannot use mpirun after salloc:
>>
>>$salloc -N2 --exclusive -p test -J ompi
>>$LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so mpirun -np 1 hello_c
>>-----------------------------------------------------------------------------------------------------
>>An ORTE daemon has unexpectedly failed after launch and before
>>communicating back to mpirun. This could be caused by a number
>>of factors, including an inability to create a connection back
>>to mpirun due to a lack of common network interfaces and/or no
>>route found between them. Please check network connectivity
>>(including firewalls and network routing requirements).
>>------------------------------------------------------------------------------------------------------
>>But if I use mpirun in an sbatch script, it works correctly:
>>$cat ompi_mxm3.0
>>#!/bin/sh
>>LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --map-by slot:pe=8 "$@"
>>
>>$sbatch -N2  --exclusive -p test -J ompi  ompi_mxm3.0 ./hello_c
>>Submitted batch job 645039
>>$cat slurm-645039.out 
>>[warn] Epoll ADD(1) on fd 0 failed.  Old events were 0; read change was 1 (add); write change was 0 (none): Operation not permitted
>>[warn] Epoll ADD(4) on fd 1 failed.  Old events were 0; read change was 0 (none); write change was 1 (add): Operation not permitted
>>Hello, world, I am 0 of 2, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32142, repo rev: r32142, Jul 04, 2014 (nightly snapshot tarball), 146)
>>Hello, world, I am 1 of 2, (Open MPI v1.9a1, package: Open MPI semenov@compiler-2 Distribution, ident: 1.9a1r32142, repo rev: r32142, Jul 04, 2014 (nightly snapshot tarball), 146)
>>
>>Regards,
>>Timur