There is only one path to the MPI library in LD_LIBRARY_PATH:

$ echo $LD_LIBRARY_PATH
/opt/intel/composer_xe_2013.2.146/mkl/lib/intel64:/opt/intel/composer_xe_2013.2.146/compiler/lib/intel64:/home/users/semenov/BFD/lib:/home/users/semenov/local/lib:/usr/lib64/:/mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib
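
Note that the 1.9 directory is the last entry in that list rather than the first. That only matters if an earlier directory also provides a libmpi. A minimal sketch for checking this is below. It assumes a POSIX shell, and list_mpi_dirs is a hypothetical helper name, not an Open MPI tool. It prints the entries in search order and marks those that contain a libmpi shared object.

```shell
#!/bin/sh
# List search-path entries in order and flag the ones that contain a
# libmpi shared object. The first flagged entry is the one the dynamic
# linker would normally use (RPATH/RUNPATH settings aside).
# list_mpi_dirs is a hypothetical helper, not part of Open MPI.
list_mpi_dirs() {
    # $1: colon-separated search path (defaults to LD_LIBRARY_PATH)
    path_list=${1:-$LD_LIBRARY_PATH}
    position=0
    old_ifs=$IFS
    IFS=:
    for dir in $path_list; do
        position=$((position + 1))
        # Matches libmpi.so, libmpi.so.0, libmpi.so.1, ...
        if ls "$dir"/libmpi.so* >/dev/null 2>&1; then
            echo "$position $dir (contains libmpi)"
        else
            echo "$position $dir"
        fi
    done
    IFS=$old_ifs
}

list_mpi_dirs "$LD_LIBRARY_PATH"
```

If more than one entry is flagged, the lowest-numbered one wins, which is the case the "put 1.9 at the front" advice guards against.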

This one also looks correct:

$ ldd hybrid-hello
        linux-vdso.so.1 => (0x00007fff8b983000)
        libmpi.so.0 => /mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib/libmpi.so.0 (0x00007f58c95cb000)
        libm.so.6 => /lib64/libm.so.6 (0x000000338ac00000)
        libiomp5.so => /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libiomp5.so (0x00007f58c92a2000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000000338d400000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000338cc00000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000338b800000)
        libc.so.6 => /lib64/libc.so.6 (0x000000338b000000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000000338b400000)
        libopen-rte.so.0 => /mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib/libopen-rte.so.0 (0x00007f58c9009000)
        libopen-pal.so.0 => /mnt/data/users/dm2/vol3/semenov/_scratch/openmpi-1.9.0_mxm-3.0/lib/libopen-pal.so.0 (0x00007f58c8d05000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f58c8afb000)
        librt.so.1 => /lib64/librt.so.1 (0x000000338c000000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003393800000)
        libutil.so.1 => /lib64/libutil.so.1 (0x000000339b600000)
        libimf.so => /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libimf.so (0x00007f58c863e000)
        libsvml.so => /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libsvml.so (0x00007f58c7c73000)
        libirng.so => /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libirng.so (0x00007f58c7a6b000)
        libintlc.so.5 => /opt/intel/composer_xe_2013.2.146/compiler/lib/intel64/libintlc.so.5 (0x00007f58c781d000)
        /lib64/ld-linux-x86-64.so.2 (0x000000338a800000)

Open MPI 1.5.5 was preinstalled in "/opt/mpi/openmpi-1.5.5-icc/".

Here is the output after adding "--mca rmaps_base_verbose 20" and
"--map-by slot:pe=8".

* outfile:
--------------------------------------------------------------------------
Your job failed to map. Either no mapper was available, or none
of the available mappers was able to perform the requested
mapping operation.
This can happen if you request a map type
(e.g., loadbalance) and the corresponding mapper was not built.
--------------------------------------------------------------------------

* errfile:
[node1-128-29:21477] mca: base: components_register: registering rmaps components
[node1-128-29:21477] mca: base: components_register: found loaded component lama
[node1-128-29:21477] mca:rmaps:lama: Priority 0
[node1-128-29:21477] mca:rmaps:lama: Map : NULL
[node1-128-29:21477] mca:rmaps:lama: Bind : NULL
[node1-128-29:21477] mca:rmaps:lama: MPPR : NULL
[node1-128-29:21477] mca:rmaps:lama: Order : NULL
[node1-128-29:21477] mca: base: components_register: component lama register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component mindist
[node1-128-29:21477] mca: base: components_register: component mindist register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component ppr
[node1-128-29:21477] mca: base: components_register: component ppr register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component rank_file
[node1-128-29:21477] mca: base: components_register: component rank_file register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component resilient
[node1-128-29:21477] mca: base: components_register: component resilient register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component round_robin
[node1-128-29:21477] mca: base: components_register: component round_robin register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component seq
[node1-128-29:21477] mca: base: components_register: component seq register function successful
[node1-128-29:21477] mca: base: components_register: found loaded component staged
[node1-128-29:21477] mca: base: components_register: component staged has no register or open function
[node1-128-29:21477] [[26215,0],0] rmaps:base set policy with slot:pe=8
[node1-128-29:21477] [[26215,0],0] rmaps:base policy slot modifiers pe=8 provided
[node1-128-29:21477] [[26215,0],0] rmaps:base check modifiers with pe=8
[node1-128-29:21477] [[26215,0],0] rmaps:base setting pe/rank to 8
[node1-128-29:21477] mca: base: components_open: opening rmaps components
[node1-128-29:21477] mca: base: components_open: found loaded component lama
[node1-128-29:21477] mca: base: components_open: found loaded component mindist
[node1-128-29:21477] mca: base: components_open: component mindist open function successful
[node1-128-29:21477] mca: base: components_open: found loaded component ppr
[node1-128-29:21477] mca: base: components_open: component ppr open function successful
[node1-128-29:21477] mca: base: components_open: found loaded component rank_file
[node1-128-29:21477] mca: base: components_open: component rank_file open function successful
[node1-128-29:21477] mca: base: components_open: found loaded component resilient
[node1-128-29:21477] mca: base: components_open: component resilient open function successful
[node1-128-29:21477] mca: base: components_open: found loaded component round_robin
[node1-128-29:21477] mca: base: components_open: component round_robin open function successful
[node1-128-29:21477] mca: base: components_open: found loaded component seq
[node1-128-29:21477] mca: base: components_open: component seq open function successful
[node1-128-29:21477] mca: base: components_open: found loaded component staged
[node1-128-29:21477] mca: base: components_open: component staged open function successful
[node1-128-29:21477] mca:rmaps:select: checking available component lama
[node1-128-29:21477] mca:rmaps:select: Querying component [lama]
[node1-128-29:21477] mca:rmaps:select: checking available component mindist
[node1-128-29:21477] mca:rmaps:select: Querying component [mindist]
[node1-128-29:21477] mca:rmaps:select: checking available component ppr
[node1-128-29:21477] mca:rmaps:select: Querying component [ppr]
[node1-128-29:21477] mca:rmaps:select: checking available component rank_file
[node1-128-29:21477] mca:rmaps:select: Querying component [rank_file]
[node1-128-29:21477] mca:rmaps:select: checking available component resilient
[node1-128-29:21477] mca:rmaps:select: Querying component [resilient]
[node1-128-29:21477] mca:rmaps:select: checking available component round_robin
[node1-128-29:21477] mca:rmaps:select: Querying component [round_robin]
[node1-128-29:21477] mca:rmaps:select: checking available component seq
[node1-128-29:21477] mca:rmaps:select: Querying component [seq]
[node1-128-29:21477] mca:rmaps:select: checking available component staged
[node1-128-29:21477] mca:rmaps:select: Querying component [staged]
[node1-128-29:21477] [[26215,0],0]: Final mapper priorities
[node1-128-29:21477] Mapper: ppr Priority: 90
[node1-128-29:21477] Mapper: seq Priority: 60
[node1-128-29:21477] Mapper: resilient Priority: 40
[node1-128-29:21477] Mapper: mindist Priority: 20
[node1-128-29:21477] Mapper: round_robin Priority: 10
[node1-128-29:21477] Mapper: staged Priority: 5
[node1-128-29:21477] Mapper: lama Priority: 0
[node1-128-29:21477] Mapper: rank_file Priority: 0
[node1-128-29:21477] mca:rmaps: mapping job [26215,1]
[node1-128-29:21477] mca:rmaps: creating new map for job [26215,1]
[node1-128-29:21477] mca:rmaps: nprocs 0
[node1-128-29:21477] mca:rmaps mapping given - using default
[node1-128-29:21477] mca:rmaps:ppr: job [26215,1] not using ppr mapper
[node1-128-29:21477] mca:rmaps:seq: job [26215,1] not using seq mapper
[node1-128-29:21477] mca:rmaps:resilient: cannot perform initial map of job [26215,1] - no fault groups
[node1-128-29:21477] mca:rmaps:mindist: job [26215,1] not using mindist mapper
[node1-128-29:21477] mca:rmaps:rr: mapping job [26215,1]
[node1-128-29:21477] AVAILABLE NODES FOR MAPPING:
[node1-128-29:21477] node: node1-128-29 daemon: 0
[node1-128-29:21477] mca:rmaps:rr: mapping by slot for job [26215,1] slots 1 num_procs 0
[node1-128-29:21477] mca:rmaps:rr:slot working node node1-128-29
[node1-128-29:21477] mca:rmaps:rr:slot assigning 0 procs to node node1-128-29
[node1-128-29:21477] mca:rmaps:base: computing vpids by slot for job [26215,1]
[node1-128-29:21477] mca: base: close: unloading component lama
[node1-128-29:21477] mca: base: close: component mindist closed
[node1-128-29:21477] mca: base: close: unloading component mindist
[node1-128-29:21477] mca: base: close: component ppr closed
[node1-128-29:21477] mca: base: close: unloading component ppr
[node1-128-29:21477] mca: base: close: component rank_file closed
[node1-128-29:21477] mca: base: close: unloading component rank_file
[node1-128-29:21477] mca: base: close: component resilient closed
[node1-128-29:21477] mca: base: close: unloading component resilient
[node1-128-29:21477] mca: base: close: component round_robin closed
[node1-128-29:21477] mca: base: close: unloading component round_robin
[node1-128-29:21477] mca: base: close: component seq closed
[node1-128-29:21477] mca: base: close: unloading component seq
[node1-128-29:21477] mca: base: close: component staged closed
[node1-128-29:21477] mca: base: close: unloading component staged

Here is the output after adding "--mca rmaps_base_verbose 20" and WITHOUT
"--map-by slot:pe=8".

* outfile: nothing

* errfile:
[node1-128-29:21569] mca: base: components_register: registering rmaps components
[node1-128-29:21569] mca: base: components_register: found loaded component lama
[node1-128-29:21569] mca:rmaps:lama: Priority 0
[node1-128-29:21569] mca:rmaps:lama: Map : NULL
[node1-128-29:21569] mca:rmaps:lama: Bind : NULL
[node1-128-29:21569] mca:rmaps:lama: MPPR : NULL
[node1-128-29:21569] mca:rmaps:lama: Order : NULL
[node1-128-29:21569] mca: base: components_register: component lama register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component mindist
[node1-128-29:21569] mca: base: components_register: component mindist register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component ppr
[node1-128-29:21569] mca: base: components_register: component ppr register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component rank_file
[node1-128-29:21569] mca: base: components_register: component rank_file register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component resilient
[node1-128-29:21569] mca: base: components_register: component resilient register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component round_robin
[node1-128-29:21569] mca: base: components_register: component round_robin register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component seq
[node1-128-29:21569] mca: base: components_register: component seq register function successful
[node1-128-29:21569] mca: base: components_register: found loaded component staged
[node1-128-29:21569] mca: base: components_register: component staged has no register or open function
[node1-128-29:21569] [[25027,0],0] rmaps:base set policy with NULL
[node1-128-29:21569] mca: base: components_open: opening rmaps components
[node1-128-29:21569] mca: base: components_open: found loaded component lama
[node1-128-29:21569] mca: base: components_open: found loaded component mindist
[node1-128-29:21569] mca: base: components_open: component mindist open function successful
[node1-128-29:21569] mca: base: components_open: found loaded component ppr
[node1-128-29:21569] mca: base: components_open: component ppr open function successful
[node1-128-29:21569] mca: base: components_open: found loaded component rank_file
[node1-128-29:21569] mca: base: components_open: component rank_file open function successful
[node1-128-29:21569] mca: base: components_open: found loaded component resilient
[node1-128-29:21569] mca: base: components_open: component resilient open function successful
[node1-128-29:21569] mca: base: components_open: found loaded component round_robin
[node1-128-29:21569] mca: base: components_open: component round_robin open function successful
[node1-128-29:21569] mca: base: components_open: found loaded component seq
[node1-128-29:21569] mca: base: components_open: component seq open function successful
[node1-128-29:21569] mca: base: components_open: found loaded component staged
[node1-128-29:21569] mca: base: components_open: component staged open function successful
[node1-128-29:21569] mca:rmaps:select: checking available component lama
[node1-128-29:21569] mca:rmaps:select: Querying component [lama]
[node1-128-29:21569] mca:rmaps:select: checking available component mindist
[node1-128-29:21569] mca:rmaps:select: Querying component [mindist]
[node1-128-29:21569] mca:rmaps:select: checking available component ppr
[node1-128-29:21569] mca:rmaps:select: Querying component [ppr]
[node1-128-29:21569] mca:rmaps:select: checking available component rank_file
[node1-128-29:21569] mca:rmaps:select: Querying component [rank_file]
[node1-128-29:21569] mca:rmaps:select: checking available component resilient
[node1-128-29:21569] mca:rmaps:select: Querying component [resilient]
[node1-128-29:21569] mca:rmaps:select: checking available component round_robin
[node1-128-29:21569] mca:rmaps:select: Querying component [round_robin]
[node1-128-29:21569] mca:rmaps:select: checking available component seq
[node1-128-29:21569] mca:rmaps:select: Querying component [seq]
[node1-128-29:21569] mca:rmaps:select: checking available component staged
[node1-128-29:21569] mca:rmaps:select: Querying component [staged]
[node1-128-29:21569] [[25027,0],0]: Final mapper priorities
[node1-128-29:21569] Mapper: ppr Priority: 90
[node1-128-29:21569] Mapper: seq Priority: 60
[node1-128-29:21569] Mapper: resilient Priority: 40
[node1-128-29:21569] Mapper: mindist Priority: 20
[node1-128-29:21569] Mapper: round_robin Priority: 10
[node1-128-29:21569] Mapper: staged Priority: 5
[node1-128-29:21569] Mapper: lama Priority: 0
[node1-128-29:21569] Mapper: rank_file Priority: 0
[node1-128-29:21569] mca:rmaps: mapping job [25027,1]
[node1-128-29:21569] mca:rmaps: creating new map for job [25027,1]
[node1-128-29:21569] mca:rmaps: nprocs 0
[node1-128-29:21569] mca:rmaps mapping not given - using bycore
[node1-128-29:21569] mca:rmaps:ppr: job [25027,1] not using ppr mapper
[node1-128-29:21569] mca:rmaps:seq: job [25027,1] not using seq mapper
[node1-128-29:21569] mca:rmaps:resilient: cannot perform initial map of job [25027,1] - no fault groups
[node1-128-29:21569] mca:rmaps:mindist: job [25027,1] not using mindist mapper
[node1-128-29:21569] mca:rmaps:rr: mapping job [25027,1]
[node1-128-29:21569] AVAILABLE NODES FOR MAPPING:
[node1-128-29:21569] node: node1-128-29 daemon: 0
[node1-128-29:21569] mca:rmaps:rr: mapping no-span by Core for job [25027,1] slots 1 num_procs 1
[node1-128-29:21569] mca:rmaps:rr: found 8 Core objects on node node1-128-29
[node1-128-29:21569] mca:rmaps:rr: calculated nprocs 1
[node1-128-29:21569] mca:rmaps:rr: assigning nprocs 1
[node1-128-29:21569] mca:rmaps:rr: assigning proc to object 0
[node1-128-29:21569] mca:rmaps:base: computing vpids by slot for job [25027,1]
[node1-128-29:21569] mca:rmaps:base: assigning rank 0 to node node1-128-29
[node1-128-29:21569] mca:rmaps: compute bindings for job [25027,1] with policy CORE
[node1-128-29:21569] mca:rmaps: bindings for job [25027,1] - bind in place
[node1-128-29:21569] mca:rmaps: bind in place for job [25027,1] with bindings CORE
[node1-128-29:21569] [[25027,0],0] reset_usage: node node1-128-29 has 1 procs on it
[node1-128-29:21569] [[25027,0],0] reset_usage: ignoring proc [[25027,1],0]
[node1-128-29:21569] BINDING PROC [[25027,1],0] TO Core NUMBER 0
[node1-128-29:21569] [[25027,0],0] BOUND PROC [[25027,1],0] TO 0,8[Core:0] on node node1-128-29
[node1-128-29:21571] mca: base: components_register: component sbgp / ibnet register function failed
Main 21.366504 secs total /1
Computation 21.048671 secs total /1000
[node1-128-29:21569] mca: base: close: unloading component lama
[node1-128-29:21569] mca: base: close: component mindist closed
[node1-128-29:21569] mca: base: close: unloading component mindist
[node1-128-29:21569] mca: base: close: component ppr closed
[node1-128-29:21569] mca: base: close: unloading component ppr
[node1-128-29:21569] mca: base: close: component rank_file closed
[node1-128-29:21569] mca: base: close: unloading component rank_file
[node1-128-29:21569] mca: base: close: component resilient closed
[node1-128-29:21569] mca: base: close: unloading component resilient
[node1-128-29:21569] mca: base: close: component round_robin closed
[node1-128-29:21569] mca: base: close: unloading component round_robin
[node1-128-29:21569] mca: base: close: component seq closed
[node1-128-29:21569] mca: base: close: unloading component seq
[node1-128-29:21569] mca: base: close: component staged closed
[node1-128-29:21569] mca: base: close: unloading component staged

Regards,
Timur.

Thu, 3 Jul 2014 06:10:26 -0700 from Ralph Castain <r...@open-mpi.org>:
>This looks to me like a message from some older version of OMPI.
>Please check your LD_LIBRARY_PATH and ensure that the 1.9 installation is at
>the *front* of that list.
>
>Of course, I'm also assuming that you installed the two versions into
>different locations - yes?
>
>Also, add "--mca rmaps_base_verbose 20" to your cmd line - this will tell us
>what mappers are being considered.
>
>
>On Jul 3, 2014, at 1:31 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>When I used --map-by slot:pe=8, I got the same message:
>>
>>Your job failed to map. Either no mapper was available, or none
>>of the available mappers was able to perform the requested
>>mapping operation. This can happen if you request a map type
>>(e.g., loadbalance) and the corresponding mapper was not built.
>>...
>>
>>Wed, 2 Jul 2014 07:36:48 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>Let's keep this on the user list so others with similar issues can find it.
>>>
>>>My guess is that the $OMP_NUM_THREADS syntax isn't quite right, so it didn't
>>>pick up the actual value there. Since it doesn't hurt to have extra cpus,
>>>just set it to 8 for your test case and that should be fine, so adding a
>>>little clarity:
>>>
>>>--map-by slot:pe=8
>>>
>>>I'm not aware of any slurm utility similar to top, but there is no reason
>>>you can't just submit this as an interactive job and use top itself, is
>>>there?
>>>
>>>As for that sbgp warning - you can probably just ignore it. Not sure why
>>>that is failing, but it just means that component will disqualify itself. If
>>>you want to eliminate it, just add
>>>
>>>-mca sbgp ^ibnet
>>>
>>>to your cmd line
>>>
>>>
>>>On Jul 2, 2014, at 7:29 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>>Thanks, Ralph!
>>>>With '--map-by :pe=$OMP_NUM_THREADS' I got:
>>>>--------------------------------------------------------------------------
>>>>Your job failed to map. Either no mapper was available, or none
>>>>of the available mappers was able to perform the requested
>>>>mapping operation.
>>>>This can happen if you request a map type
>>>>(e.g., loadbalance) and the corresponding mapper was not built.
>>>>
>>>>What does it mean?
>>>>With '--bind-to socket' everything looks better, but performance is still
>>>>worse (though better than it was):
>>>>* 1 thread 0.028 sec
>>>>* 2 threads 0.018 sec
>>>>* 4 threads 0.020 sec
>>>>* 8 threads 0.021 sec
>>>>Is there a utility similar to 'top' that I can use with sbatch?
>>>>
>>>>Also, every time, I get this message in ompi 1.9:
>>>>mca: base: components_register: component sbgp / ibnet register function
>>>>failed
>>>>Is it bad?
>>>>
>>>>Regards,
>>>>Timur
>>>>
>>>>Wed, 2 Jul 2014 05:53:44 -0700 from Ralph Castain < r...@open-mpi.org >:
>>>>>OMPI started binding by default during the 1.7 series. You should add the
>>>>>following to your cmd line:
>>>>>
>>>>>--map-by :pe=$OMP_NUM_THREADS
>>>>>
>>>>>This will give you a dedicated core for each thread. Alternatively, you
>>>>>could instead add
>>>>>
>>>>>--bind-to socket
>>>>>
>>>>>OMPI 1.5.5 doesn't bind at all unless directed to do so, which is why you
>>>>>are getting the difference in behavior.
>>>>>
>>>>>
>>>>>On Jul 2, 2014, at 12:33 AM, Timur Ismagilov < tismagi...@mail.ru > wrote:
>>>>>>Hello!
>>>>>>I have Open MPI 1.9a1r32104 and Open MPI 1.5.5.
>>>>>>I get much better performance in Open MPI 1.5.5 with OpenMP on 8 cores
>>>>>>in the program:
>>>>>>....
>>>>>>
>>>>>>#define N 10000000
>>>>>>
>>>>>>int main(int argc, char *argv[]) {
>>>>>>...............
>>>>>>MPI_Init(&argc, &argv);
>>>>>>...............
>>>>>>for (i = 0; i < N; i++) {
>>>>>>    a[i] = i * 1.0;
>>>>>>    b[i] = i * 2.0;
>>>>>>}
>>>>>>
>>>>>>#pragma omp parallel for shared(a, b, c) private(i)
>>>>>>for (i = 0; i < N; i++) {
>>>>>>    c[i] = a[i] + b[i];
>>>>>>}
>>>>>>.............
>>>>>>MPI_Finalize();
>>>>>>}
>>>>>>On 1 node I got
>>>>>>(for i in 1 2 4 8 ; do export OMP_NUM_THREADS=$i; sbatch -p test -t 5
>>>>>>--exclusive -N 1 -o hybrid-hello_omp$i.out -e hybrid-hello_omp$i.err
>>>>>>ompi_mxm3.0 ./hybrid-hello; done)
>>>>>>
>>>>>>* Open MPI 1.5.5 (Data for node: node1-128-17 Num slots: 8 Max slots:
>>>>>>0):
>>>>>>  * 8 threads 0.014527 sec
>>>>>>  * 4 threads 0.016138 sec
>>>>>>  * 2 threads 0.018764 sec
>>>>>>  * 1 thread 0.029963 sec
>>>>>>* Open MPI 1.9a1r32104 ( node1-128-29: slots=8 max_slots=0 slots_inuse=0
>>>>>>state=UP ):
>>>>>>  * 8 threads 0.035055 sec
>>>>>>  * 4 threads 0.029859 sec
>>>>>>  * 2 threads 0.019564 sec (same as Open MPI 1.5.5)
>>>>>>  * 1 thread 0.028394 sec (same as Open MPI 1.5.5)
>>>>>>So it looks like Open MPI 1.9 uses only 2 of the 8 cores.
>>>>>>
>>>>>>What can I do about this?
>>>>>>
>>>>>>$cat ompi_mxm3.0
>>>>>>#!/bin/sh
>>>>>>[ x"$TMPDIR" == x"" ] && TMPDIR=/tmp
>>>>>>HOSTFILE=${TMPDIR}/hostfile.${SLURM_JOB_ID}
>>>>>>srun hostname -s|sort|uniq -c|awk '{print $2" slots="$1}' > $HOSTFILE ||
>>>>>>{ rm -f $HOSTFILE; exit 255; }
>>>>>>LD_PRELOAD=/mnt/data/users/dm2/vol3/semenov/_scratch/mxm/mxm-3.0/lib/libmxm.so
>>>>>> mpirun -x LD_PRELOAD -x MXM_SHM_KCOPY_MODE=off --display-allocation
>>>>>>--hostfile $HOSTFILE "$@"
>>>>>>rc=$?
>>>>>>rm -f $HOSTFILE
>>>>>>exit $rc
>>>>>>
>>>>>>For Open MPI 1.5.5 I removed LD_PRELOAD from the run script.
>>>>>>_______________________________________________
>>>>>>users mailing list
>>>>>>us...@open-mpi.org
>>>>>>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>Link to this post:
>>>>>>http://www.open-mpi.org/community/lists/users/2014/07/24738.php
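
On the guess quoted above that the $OMP_NUM_THREADS syntax "isn't quite right": one way this can happen is shell quoting. If the option really is typed in single quotes, the shell passes the literal text $OMP_NUM_THREADS through instead of the number. Whether that happened here is an assumption, since the exact command line is not in the thread, and the literal pe=8 run failed to map as well, so quoting is at most part of the story. A small sketch of the difference follows. build_map_by is a hypothetical helper name.

```shell
#!/bin/sh
# build_map_by is a hypothetical helper: it prints the value for --map-by
# given a thread count, falling back to 1 when the count is empty or unset.
build_map_by() {
    threads=${1:-1}
    echo "slot:pe=$threads"
}

OMP_NUM_THREADS=8
export OMP_NUM_THREADS

# Single quotes: no expansion, the variable name is passed through as text.
printf '%s\n' '--map-by :pe=$OMP_NUM_THREADS'

# Double quotes: the shell substitutes the value before mpirun would see it.
printf '%s\n' "--map-by $(build_map_by "$OMP_NUM_THREADS")"
```

Printing the arguments like this before handing them to mpirun is a cheap way to see exactly what the launcher will receive.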