Okay, I see what's going on here. The problem stems from a combination of two 
things:

1. your setup of the hostfile guarantees we will think there is only one slot 
on each host, even though Slurm will have assigned more. Is there some reason 
you are doing that? OMPI knows how to read the Slurm envars to get the 
allocation, so there is no reason to be creating a hostfile

2. you didn't specify how many processes to run. Looks like we have a bug for 
that case when you also specify pe=N and N > 1.

I can deal with the bug, but I suspect this will run just fine if you (a) let 
us just read the Slurm allocation ourselves, and (b) add "-np 1" to your cmd 
line - something like the sketch below.
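
For example (just an untested sketch using the paths from your script - adjust 
as needed), your wrapper could shrink to:

#!/bin/sh
# No hostfile: mpirun reads the Slurm allocation directly from the envars.
# One rank (-np 1), with 8 cpus reserved for its threads (pe=8).
LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so \
  mpirun -np 1 -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --map-by slot:pe=8 "$@"

FWIW, judging by the "slots 1" in your verbose output, the hostfile your script 
generated for this run apparently contained just "node1-128-29 slots=1", which 
is why we only see one slot.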


On Jul 4, 2014, at 8:19 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:

> 1. Intel MPI is located here: /opt/intel/impi/4.1.0/intel64/lib. I have 
> added the OMPI path at the start and got the same output.
> 2. Here is my cmd line:
> 
> export OMP_NUM_THREADS=8; export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib;
> sbatch -p test -t 5 --exclusive -N 1 -o ./results/hybrid-hello_omp$i.out -e ./results/hybrid-hello_omp$i.err ompi_mxm3.0 ./hybrid-hello; done
> 
> $ cat ompi_mxm3.0
> 
> #!/bin/sh
> 
> #srun --resv-ports "$@"
> #exit $?
> 
> [ x"$TMPDIR" == x"" ] && TMPDIR=/tmp
> HOSTFILE=${TMPDIR}/hostfile.${SLURM_JOB_ID}
> srun hostname -s|sort|uniq -c|awk '{print $2" slots="$1}' > $HOSTFILE || { rm -f $HOSTFILE; exit 255; }
> LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --map-by slot:pe=8 --mca rmaps_base_verbose 20 --hostfile $HOSTFILE "$@"
> rc=$?
> rm -f $HOSTFILE
> 
> exit $rc
> 
> 
> 
> Fri, 4 Jul 2014 07:06:34 -0700 from Ralph Castain <r...@open-mpi.org>:
> 
> Hmmm...couple of things here:
> 
> 1. Intel packages Intel MPI with their compiler, so there are in fact an 
> mpiexec and MPI libraries in your path ahead of ours. I would advise always 
> putting the OMPI path at the start of your path envars to avoid potential 
> conflicts
> 
> 2. I'm having trouble understanding your command line because of all the 
> variable definitions. Could you please tell me what the mpirun cmd line is? I 
> suspect I know the problem, but need to see the actual cmd line to confirm it
> 
> Thanks
> Ralph
> 
> On Jul 4, 2014, at 1:38 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
> 
>> There is only one path to the MPI lib.
>> echo $LD_LIBRARY_PATH 
>> /opt/intel/composer_xe_2013.2.146/mkl/lib/intel64:/opt/intel/composer_xe_2013.2.146/compiler/lib/intel64:/home/users/semenov/BFD/lib:/home/users/semenov/local/lib:/usr/lib64/:/mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib
>> 
>> This one also looks correct:
>> $ ldd hybrid-hello
>> 
>> linux-vdso.so.1 => (0x00007fff8b983000)
>> libmpi.so.0 => 
>> /mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib/libmpi.so.0
>>  (0x00007f58c95cb000)
>> libm.so.6 => /lib64/libm.so.6 (0x000000338ac00000)
>> libiomp5.so => 
>> /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libiomp5.so 
>> (0x00007f58c92a2000)
>> libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000338d400000)
>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000338cc00000)
>> libpthread.so.0 => /lib64/libpthread.so.0 (0x000000338b800000)
>> libc.so.6 => /lib64/libc.so.6 (0x000000338b000000)
>> libdl.so.2 => /lib64/libdl.so.2 (0x000000338b400000)
>> libopen-rte.so.0 => 
>> /mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib/libopen-rte.so.0
>>  (0x00007f58c9009000)
>> libopen-pal.so.0 => 
>> /mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib/libopen-pal.so.0
>>  (0x00007f58c8d05000)
>> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f58c8afb000)
>> librt.so.1 => /lib64/librt.so.1 (0x000000338c000000)
>> libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003393800000)
>> libutil.so.1 => /lib64/libutil.so.1 (0x000000339b600000)
>> libimf.so => 
>> /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libimf.so 
>> (0x00007f58c863e000)
>> libsvml.so => 
>> /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libsvml.so 
>> (0x00007f58c7c73000)
>> libirng.so => 
>> /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libirng.so 
>> (0x00007f58c7a6b000)
>> libintlc.so.5 => 
>> /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libintlc.so.5 
>> (0x00007f58c781d000)
>> /lib64/ld-linux-x86-64.so.2 (0x000000338a800000)
>> 
>> open mpi 1.5.5 was preinstalled in "/opt/mpi/openmpi-1.5.5-icc/".
>> 
>> Here is the output after adding "--mca rmaps_base_verbose 20" and 
>> "--map-by slot:pe=8".
>> outfile:
>> --------------------------------------------------------------------------
>> Your job failed to map. Either no mapper was available, or none
>> of the available mappers was able to perform the requested
>> mapping operation. This can happen if you request a map type
>> (e.g., loadbalance) and the corresponding mapper was not built.
>> --------------------------------------------------------------------------
>> 
>> 
>> errfile:
>> [node1-128-29:21477] mca: base: components_register: registering rmaps 
>> components
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> lama
>> [node1-128-29:21477] mca:rmaps:lama: Priority 0
>> [node1-128-29:21477] mca:rmaps:lama: Map : NULL
>> [node1-128-29:21477] mca:rmaps:lama: Bind : NULL
>> [node1-128-29:21477] mca:rmaps:lama: MPPR : NULL
>> [node1-128-29:21477] mca:rmaps:lama: Order : NULL
>> [node1-128-29:21477] mca: base: components_register: component lama register 
>> function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> mindist
>> [node1-128-29:21477] mca: base: components_register: component mindist 
>> register function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> ppr
>> [node1-128-29:21477] mca: base: components_register: component ppr register 
>> function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> rank_file
>> [node1-128-29:21477] mca: base: components_register: component rank_file 
>> register function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> resilient
>> [node1-128-29:21477] mca: base: components_register: component resilient 
>> register function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> round_robin
>> [node1-128-29:21477] mca: base: components_register: component round_robin 
>> register function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> seq
>> [node1-128-29:21477] mca: base: components_register: component seq register 
>> function successful
>> [node1-128-29:21477] mca: base: components_register: found loaded component 
>> staged
>> [node1-128-29:21477] mca: base: components_register: component staged has no 
>> register or open function
>> [node1-128-29:21477] [[26215,0],0] rmaps:base set policy with slot:pe=8
>> [node1-128-29:21477] [[26215,0],0] rmaps:base policy slot modifiers pe=8 
>> provided
>> [node1-128-29:21477] [[26215,0],0] rmaps:base check modifiers with pe=8
>> [node1-128-29:21477] [[26215,0],0] rmaps:base setting pe/rank to 8
>> [node1-128-29:21477] mca: base: components_open: opening rmaps components
>> [node1-128-29:21477] mca: base: components_open: found loaded component lama
>> [node1-128-29:21477] mca: base: components_open: found loaded component 
>> mindist
>> [node1-128-29:21477] mca: base: components_open: component mindist open 
>> function successful
>> [node1-128-29:21477] mca: base: components_open: found loaded component ppr
>> [node1-128-29:21477] mca: base: components_open: component ppr open function 
>> successful
>> [node1-128-29:21477] mca: base: components_open: found loaded component 
>> rank_file
>> [node1-128-29:21477] mca: base: components_open: component rank_file open 
>> function successful
>> [node1-128-29:21477] mca: base: components_open: found loaded component 
>> resilient
>> [node1-128-29:21477] mca: base: components_open: component resilient open 
>> function successful
>> [node1-128-29:21477] mca: base: components_open: found loaded component 
>> round_robin
>> [node1-128-29:21477] mca: base: components_open: component round_robin open 
>> function successful
>> [node1-128-29:21477] mca: base: components_open: found loaded component seq
>> [node1-128-29:21477] mca: base: components_open: component seq open function 
>> successful
>> [node1-128-29:21477] mca: base: components_open: found loaded component 
>> staged
>> [node1-128-29:21477] mca: base: components_open: component staged open 
>> function successful
>> [node1-128-29:21477] mca:rmaps:select: checking available component lama
>> [node1-128-29:21477] mca:rmaps:select: Querying component [lama]
>> [node1-128-29:21477] mca:rmaps:select: checking available component mindist
>> [node1-128-29:21477] mca:rmaps:select: Querying component [mindist]
>> [node1-128-29:21477] mca:rmaps:select: checking available component ppr
>> [node1-128-29:21477] mca:rmaps:select: Querying component [ppr]
>> [node1-128-29:21477] mca:rmaps:select: checking available component rank_file
>> [node1-128-29:21477] mca:rmaps:select: Querying component [rank_file]
>> [node1-128-29:21477] mca:rmaps:select: checking available component resilient
>> [node1-128-29:21477] mca:rmaps:select: Querying component [resilient]
>> [node1-128-29:21477] mca:rmaps:select: checking available component 
>> round_robin
>> [node1-128-29:21477] mca:rmaps:select: Querying component [round_robin]
>> [node1-128-29:21477] mca:rmaps:select: checking available component seq
>> [node1-128-29:21477] mca:rmaps:select: Querying component [seq]
>> [node1-128-29:21477] mca:rmaps:select: checking available component staged
>> [node1-128-29:21477] mca:rmaps:select: Querying component [staged]
>> [node1-128-29:21477] [[26215,0],0]: Final mapper priorities
>> [node1-128-29:21477] Mapper: ppr Priority: 90
>> [node1-128-29:21477] Mapper: seq Priority: 60
>> [node1-128-29:21477] Mapper: resilient Priority: 40
>> [node1-128-29:21477] Mapper: mindist Priority: 20
>> [node1-128-29:21477] Mapper: round_robin Priority: 10
>> [node1-128-29:21477] Mapper: staged Priority: 5
>> [node1-128-29:21477] Mapper: lama Priority: 0
>> [node1-128-29:21477] Mapper: rank_file Priority: 0
>> [node1-128-29:21477] mca:rmaps: mapping job [26215,1]
>> [node1-128-29:21477] mca:rmaps: creating new map for job [26215,1]
>> [node1-128-29:21477] mca:rmaps: nprocs 0
>> [node1-128-29:21477] mca:rmaps mapping given - using default
>> [node1-128-29:21477] mca:rmaps:ppr: job [26215,1] not using ppr mapper
>> [node1-128-29:21477] mca:rmaps:seq: job [26215,1] not using seq mapper
>> [node1-128-29:21477] mca:rmaps:resilient: cannot perform initial map of job 
>> [26215,1] - no fault groups
>> [node1-128-29:21477] mca:rmaps:mindist: job [26215,1] not using mindist 
>> mapper
>> [node1-128-29:21477] mca:rmaps:rr: mapping job [26215,1]
>> [node1-128-29:21477] AVAILABLE NODES FOR MAPPING:
>> [node1-128-29:21477] node: node1-128-29 daemon: 0
>> [node1-128-29:21477] mca:rmaps:rr: mapping by slot for job [26215,1] slots 1 
>> num_procs 0
>> [node1-128-29:21477] mca:rmaps:rr:slot working node node1-128-29
>> [node1-128-29:21477] mca:rmaps:rr:slot assigning 0 procs to node node1-128-29
>> [node1-128-29:21477] mca:rmaps:base: computing vpids by slot for job 
>> [26215,1]
>> [node1-128-29:21477] mca: base: close: unloading component lama
>> [node1-128-29:21477] mca: base: close: component mindist closed
>> [node1-128-29:21477] mca: base: close: unloading component mindist
>> [node1-128-29:21477] mca: base: close: component ppr closed
>> [node1-128-29:21477] mca: base: close: unloading component ppr
>> [node1-128-29:21477] mca: base: close: component rank_file closed
>> [node1-128-29:21477] mca: base: close: unloading component rank_file
>> [node1-128-29:21477] mca: base: close: component resilient closed
>> [node1-128-29:21477] mca: base: close: unloading component resilient
>> [node1-128-29:21477] mca: base: close: component round_robin closed
>> [node1-128-29:21477] mca: base: close: unloading component round_robin
>> [node1-128-29:21477] mca: base: close: component seq closed
>> [node1-128-29:21477] mca: base: close: unloading component seq
>> [node1-128-29:21477] mca: base: close: component staged closed
>> [node1-128-29:21477] mca: base: close: unloading component staged
>> 
>> Here is the output after adding "--mca rmaps_base_verbose 20" but WITHOUT 
>> "--map-by slot:pe=8".
>> outfile:
>> nothing
>> errfile:
>> [node1-128-29:21569] mca: base: components_register: registering rmaps 
>> components
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> lama
>> [node1-128-29:21569] mca:rmaps:lama: Priority 0
>> [node1-128-29:21569] mca:rmaps:lama: Map : NULL
>> [node1-128-29:21569] mca:rmaps:lama: Bind : NULL
>> [node1-128-29:21569] mca:rmaps:lama: MPPR : NULL
>> [node1-128-29:21569] mca:rmaps:lama: Order : NULL
>> [node1-128-29:21569] mca: base: components_register: component lama register 
>> function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> mindist
>> [node1-128-29:21569] mca: base: components_register: component mindist 
>> register function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> ppr
>> [node1-128-29:21569] mca: base: components_register: component ppr register 
>> function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> rank_file
>> [node1-128-29:21569] mca: base: components_register: component rank_file 
>> register function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> resilient
>> [node1-128-29:21569] mca: base: components_register: component resilient 
>> register function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> round_robin
>> [node1-128-29:21569] mca: base: components_register: component round_robin 
>> register function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> seq
>> [node1-128-29:21569] mca: base: components_register: component seq register 
>> function successful
>> [node1-128-29:21569] mca: base: components_register: found loaded component 
>> staged
>> [node1-128-29:21569] mca: base: components_register: component staged has no 
>> register or open function
>> [node1-128-29:21569] [[25027,0],0] rmaps:base set policy with NULL
>> [node1-128-29:21569] mca: base: components_open: opening rmaps components
>> [node1-128-29:21569] mca: base: components_open: found loaded component lama
>> [node1-128-29:21569] mca: base: components_open: found loaded component 
>> mindist
>> [node1-128-29:21569] mca: base: components_open: component mindist open 
>> function successful
>> [node1-128-29:21569] mca: base: components_open: found loaded component ppr
>> [node1-128-29:21569] mca: base: components_open: component ppr open function 
>> successful
>> [node1-128-29:21569] mca: base: components_open: found loaded component 
>> rank_file
>> [node1-128-29:21569] mca: base: components_open: component rank_file open 
>> function successful
>> [node1-128-29:21569] mca: base: components_open: found loaded component 
>> resilient
>> [node1-128-29:21569] mca: base: components_open: component resilient open 
>> function successful
>> [node1-128-29:21569] mca: base: components_open: found loaded component 
>> round_robin
>> [node1-128-29:21569] mca: base: components_open: component round_robin open 
>> function successful
>> [node1-128-29:21569] mca: base: components_open: found loaded component seq
>> [node1-128-29:21569] mca: base: components_open: component seq open function 
>> successful
>> [node1-128-29:21569] mca: base: components_open: found loaded component 
>> staged
>> [node1-128-29:21569] mca: base: components_open: component staged open 
>> function successful
>> [node1-128-29:21569] mca:rmaps:select: checking available component lama
>> [node1-128-29:21569] mca:rmaps:select: Querying component [lama]
>> [node1-128-29:21569] mca:rmaps:select: checking available component mindist
>> [node1-128-29:21569] mca:rmaps:select: Querying component [mindist]
>> [node1-128-29:21569] mca:rmaps:select: checking available component ppr
>> [node1-128-29:21569] mca:rmaps:select: Querying component [ppr]
>> [node1-128-29:21569] mca:rmaps:select: checking available component rank_file
>> [node1-128-29:21569] mca:rmaps:select: Querying component [rank_file]
>> [node1-128-29:21569] mca:rmaps:select: checking available component resilient
>> [node1-128-29:21569] mca:rmaps:select: Querying component [resilient]
>> [node1-128-29:21569] mca:rmaps:select: checking available component 
>> round_robin
>> [node1-128-29:21569] mca:rmaps:select: Querying component [round_robin]
>> [node1-128-29:21569] mca:rmaps:select: checking available component seq
>> [node1-128-29:21569] mca:rmaps:select: Querying component [seq]
>> [node1-128-29:21569] mca:rmaps:select: checking available component staged
>> [node1-128-29:21569] mca:rmaps:select: Querying component [staged]
>> [node1-128-29:21569] [[25027,0],0]: Final mapper priorities
>> [node1-128-29:21569] Mapper: ppr Priority: 90
>> [node1-128-29:21569] Mapper: seq Priority: 60
>> [node1-128-29:21569] Mapper: resilient Priority: 40
>> [node1-128-29:21569] Mapper: mindist Priority: 20
>> [node1-128-29:21569] Mapper: round_robin Priority: 10
>> [node1-128-29:21569] Mapper: staged Priority: 5
>> [node1-128-29:21569] Mapper: lama Priority: 0
>> [node1-128-29:21569] Mapper: rank_file Priority: 0
>> [node1-128-29:21569] mca:rmaps: mapping job [25027,1]
>> [node1-128-29:21569] mca:rmaps: creating new map for job [25027,1]
>> [node1-128-29:21569] mca:rmaps: nprocs 0
>> [node1-128-29:21569] mca:rmaps mapping not given - using bycore
>> [node1-128-29:21569] mca:rmaps:ppr: job [25027,1] not using ppr mapper
>> [node1-128-29:21569] mca:rmaps:seq: job [25027,1] not using seq mapper
>> [node1-128-29:21569] mca:rmaps:resilient: cannot perform initial map of job 
>> [25027,1] - no fault groups
>> [node1-128-29:21569] mca:rmaps:mindist: job [25027,1] not using mindist 
>> mapper
>> [node1-128-29:21569] mca:rmaps:rr: mapping job [25027,1]
>> [node1-128-29:21569] AVAILABLE NODES FOR MAPPING:
>> [node1-128-29:21569] node: node1-128-29 daemon: 0
>> [node1-128-29:21569] mca:rmaps:rr: mapping no-span by Core for job [25027,1] 
>> slots 1 num_procs 1
>> [node1-128-29:21569] mca:rmaps:rr: found 8 Core objects on node node1-128-29
>> [node1-128-29:21569] mca:rmaps:rr: calculated nprocs 1
>> [node1-128-29:21569] mca:rmaps:rr: assigning nprocs 1
>> [node1-128-29:21569] mca:rmaps:rr: assigning proc to object 0
>> [node1-128-29:21569] mca:rmaps:base: computing vpids by slot for job 
>> [25027,1]
>> [node1-128-29:21569] mca:rmaps:base: assigning rank 0 to node node1-128-29
>> [node1-128-29:21569] mca:rmaps: compute bindings for job [25027,1] with 
>> policy CORE
>> [node1-128-29:21569] mca:rmaps: bindings for job [25027,1] - bind in place
>> [node1-128-29:21569] mca:rmaps: bind in place for job [25027,1] with 
>> bindings CORE
>> [node1-128-29:21569] [[25027,0],0] reset_usage: node node1-128-29 has 1 
>> procs on it
>> [node1-128-29:21569] [[25027,0],0] reset_usage: ignoring proc [[25027,1],0]
>> [node1-128-29:21569] BINDING PROC [[25027,1],0] TO Core NUMBER 0
>> [node1-128-29:21569] [[25027,0],0] BOUND PROC [[25027,1],0] TO 0,8[Core:0] 
>> on node node1-128-29
>> [node1-128-29:21571] mca: base: components_register: component sbgp / ibnet 
>> register function failed
>> Main 21.366504 secs total /1
>> Computation 21.048671 secs total /1000
>> [node1-128-29:21569] mca: base: close: unloading component lama
>> [node1-128-29:21569] mca: base: close: component mindist closed
>> [node1-128-29:21569] mca: base: close: unloading component mindist
>> [node1-128-29:21569] mca: base: close: component ppr closed
>> [node1-128-29:21569] mca: base: close: unloading component ppr
>> [node1-128-29:21569] mca: base: close: component rank_file closed
>> [node1-128-29:21569] mca: base: close: unloading component rank_file
>> [node1-128-29:21569] mca: base: close: component resilient closed
>> [node1-128-29:21569] mca: base: close: unloading component resilient
>> [node1-128-29:21569] mca: base: close: component round_robin closed
>> [node1-128-29:21569] mca: base: close: unloading component round_robin
>> [node1-128-29:21569] mca: base: close: component seq closed
>> [node1-128-29:21569] mca: base: close: unloading component seq
>> [node1-128-29:21569] mca: base: close: component staged closed
>> [node1-128-29:21569] mca: base: close: unloading component staged
>> 
>> Regards,
>> Timur.
>> 
>> Thu, 3 Jul 2014 06:10:26 -0700 from Ralph Castain <r...@open-mpi.org>:
>> This looks to me like a message from some older version of OMPI. Please 
>> check your LD_LIBRARY_PATH and ensure that the 1.9 installation is at the 
>> *front* of that list.
>> 
>> Of course, I'm also assuming that you installed the two versions into 
>> different locations - yes?
>> 
>> Also, add "--mca rmaps_base_verbose 20" to your cmd line - this will tell us 
>> what mappers are being considered.
>> 
>> 
>> On Jul 3, 2014, at 1:31 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>> 
>>> When I used --map-by slot:pe=8, I got the same message: 
>>> 
>>> Your job failed to map. Either no mapper was available, or none
>>> of the available mappers was able to perform the requested
>>> mapping operation. This can happen if you request a map type
>>> (e.g., loadbalance) and the corresponding mapper was not built.
>>> ...
>>> 
>>> 
>>> 
>>> Wed, 2 Jul 2014 07:36:48 -0700 from Ralph Castain <r...@open-mpi.org>:
>>> Let's keep this on the user list so others with similar issues can find it.
>>> 
>>> My guess is that the $OMP_NUM_THREADS syntax isn't quite right, so it 
>>> didn't pick up the actual value there. Since it doesn't hurt to have extra 
>>> cpus, just set it to 8 for your test case and that should be fine, so 
>>> adding a little clarity:
>>> 
>>> --map-by slot:pe=8
>>> 
>>> I'm not aware of any slurm utility similar to top, but there is no reason 
>>> you can't just submit this as an interactive job and use top itself, is 
>>> there?
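>>> 
>>> Something like this, for instance (untested sketch - partition/limits copied 
>>> from your sbatch line, adjust as needed):
>>> 
>>> salloc -p test --exclusive -N 1 -t 30    # interactive allocation
>>> ./ompi_mxm3.0 ./hybrid-hello &           # launch exactly as before
>>> # then watch it with top from a second terminal, e.g. by ssh'ing to the
>>> # allocated node (if your site permits that) and running top there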
>>> 
>>> As for that sbgp warning - you can probably just ignore it. Not sure why 
>>> that is failing, but it just means that component will disqualify itself. 
>>> If you want to eliminate it, just add
>>> 
>>> -mca sbgp ^ibnet
>>> 
>>> to your cmd line
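>>> 
>>> e.g. something along these lines (a sketch based on the mpirun line in your 
>>> wrapper):
>>> 
>>> mpirun -mca sbgp ^ibnet -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off \
>>>        --map-by slot:pe=8 --display-allocation --hostfile $HOSTFILE "$@"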
>>> 
>>> 
>>> On Jul 2, 2014, at 7:29 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>> 
>>>> Thanks, Ralph!
>>>> 
>>>> With '--map-by :pe=$OMP_NUM_THREADS' I got:
>>>> 
>>>> --------------------------------------------------------------------------
>>>> Your job failed to map. Either no mapper was available, or none
>>>> of the available mappers was able to perform the requested
>>>> mapping operation. This can happen if you request a map type
>>>> (e.g., loadbalance) and the corresponding mapper was not built.
>>>> 
>>>> What does this mean?
>>>> 
>>>> With '--bind-to socket' everything looks better, but performance is still 
>>>> worse (though better than it was):
>>>> 
>>>> 1 thread 0.028 sec
>>>> 2 thread 0.018 sec
>>>> 4 thread 0.020 sec 
>>>> 8 thread 0.021 sec
>>>> Is there a utility similar to 'top' that I can use with sbatch?
>>>> 
>>>> Also, every time, I get this message with ompi 1.9:
>>>> mca: base: components_register: component sbgp / ibnet register function 
>>>> failed
>>>> Is that bad?
>>>> 
>>>> Regards, 
>>>> Timur
>>>> 
>>>> Wed, 2 Jul 2014 05:53:44 -0700 from Ralph Castain <r...@open-mpi.org>:
>>>> 
>>>> OMPI started binding by default during the 1.7 series. You should add the 
>>>> following to your cmd line:
>>>> 
>>>> --map-by :pe=$OMP_NUM_THREADS
>>>> 
>>>> This will give you a dedicated core for each thread. Alternatively, you 
>>>> could instead add
>>>> 
>>>> --bind-to socket
>>>> 
>>>> OMPI 1.5.5 doesn't bind at all unless directed to do so, which is why you 
>>>> are getting the difference in behavior.
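>>>> 
>>>> For example (untested sketch based on the mpirun line in your wrapper - 
>>>> pick one of the two):
>>>> 
>>>> mpirun --map-by :pe=$OMP_NUM_THREADS -x LD_PRELOAD \
>>>>        -x MXM_SHM_KCOPY_MODE=off --hostfile $HOSTFILE "$@"
>>>> # or, alternatively:
>>>> mpirun --bind-to socket -x LD_PRELOAD \
>>>>        -x MXM_SHM_KCOPY_MODE=off --hostfile $HOSTFILE "$@"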
>>>> 
>>>> 
>>>> On Jul 2, 2014, at 12:33 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>> 
>>>>> Hello!
>>>>> 
>>>>> I have open mpi 1.9a1r32104 and open mpi 1.5.5.
>>>>> I get much better performance from open mpi 1.5.5 with OpenMP on 8 cores 
>>>>> in this program:
>>>>> 
>>>>> ....
>>>>> 
>>>>> #define N 10000000
>>>>> 
>>>>> 
>>>>> int main(int argc, char *argv[]) {
>>>>>     ...............
>>>>>     MPI_Init(&argc, &argv);
>>>>>     ...............
>>>>>     for (i = 0; i < N; i++) {
>>>>>         a[i] = i * 1.0;
>>>>>         b[i] = i * 2.0;
>>>>>     }
>>>>> 
>>>>>     #pragma omp parallel for shared(a, b, c) private(i)
>>>>>     for (i = 0; i < N; i++) {
>>>>>         c[i] = a[i] + b[i];
>>>>>     }
>>>>>     .............
>>>>>     MPI_Finalize();
>>>>> }
>>>>> 
>>>>> On 1 node I got the following results
>>>>> (for i in 1 2 4 8; do export OMP_NUM_THREADS=$i; sbatch -p test -t 5 
>>>>> --exclusive -N 1 -o hybrid-hello_omp$i.out -e hybrid-hello_omp$i.err 
>>>>> ompi_mxm3.0 ./hybrid-hello; done):
>>>>> 
>>>>> 
>>>>> open mpi 1.5.5 (Data for node: node1-128-17 Num slots: 8 Max slots: 0): 
>>>>> 8 threads 0.014527 sec
>>>>> 4 threads 0.016138 sec
>>>>> 2 threads 0.018764 sec
>>>>> 1 thread   0.029963 sec
>>>>> openmpi 1.9a1r32104 (node1-128-29: slots=8 max_slots=0 slots_inuse=0 
>>>>> state=UP):
>>>>> 8 threads 0.035055 sec
>>>>> 4 threads 0.029859 sec 
>>>>> 2 threads 0.019564 sec (same as open mpi 1.5.5)
>>>>> 1 thread   0.028394 sec (same as open mpi 1.5.5)
>>>>> So it looks like open mpi 1.9 uses only 2 of the 8 cores.
>>>>> 
>>>>> What can I do about this?
>>>>> 
>>>>> $ cat ompi_mxm3.0
>>>>> 
>>>>> #!/bin/sh
>>>>> 
>>>>> [ x"$TMPDIR" == x"" ] && TMPDIR=/tmp
>>>>> 
>>>>> HOSTFILE=${TMPDIR}/hostfile.${SLURM_JOB_ID}
>>>>> srun hostname -s|sort|uniq -c|awk '{print $2" slots="$1}' > $HOSTFILE || { rm -f $HOSTFILE; exit 255; }
>>>>> LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --display-allocation --hostfile $HOSTFILE "$@"
>>>>> rc=$?
>>>>> rm -f $HOSTFILE
>>>>> 
>>>>> exit $rc
>>>>> 
>>>>> For open mpi 1.5.5 I removed LD_PRELOAD from the run script.
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post: 
>>>>> http://www.open-mpi.org/community/lists/users/2014/07/24738.php
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
> 
> 
> 
> 
