I changed my downloaded MOFED version to match the one installed on the
node, and now the error goes away and it runs fine. But I still have a
question: I get exactly the same performance in all three of the cases below:

1) mpirun --allow-run-as-root  --mca mtl mxm -mca mtl_mxm_np 0 -x
MXM_TLS=self,shm,rc,ud -n 1 /root/backend  localhost : -x
LD_PRELOAD=/root/libci.so -n 1 /root/app2

2) mpirun --allow-run-as-root  --mca mtl mxm -n 1 /root/backend  localhost
: -x LD_PRELOAD=/root/libci.so -n 1 /root/app2

3) mpirun --allow-run-as-root  --mca mtl ^mxm -n 1 /root/backend  localhost
: -x LD_PRELOAD=/root/libci.so -n 1 /root/app2

It seems like it doesn't matter whether I use MXM, don't use MXM, or use it
with the reliable connection (RC) transport. How can I be sure I am indeed
using MXM over InfiniBand?
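
Would raising the component verbosity be the right way to check, something
like the sketch below? (mtl_base_verbose/pml_base_verbose are the standard MCA
framework verbosity parameters; MXM_LOG_LEVEL and the sysfs counter path are my
assumptions about how to see what MXM and the HCA are actually doing.)

# show which PML/MTL components Open MPI actually selects
mpirun --allow-run-as-root --mca mtl mxm --mca mtl_base_verbose 10 \
    --mca pml_base_verbose 10 -n 1 /root/backend localhost : \
    -x LD_PRELOAD=/root/libci.so -n 1 /root/app2

# ask MXM itself to log which transport/device it picked (assuming this
# MXM version honors MXM_LOG_LEVEL)
mpirun --allow-run-as-root --mca mtl mxm -x MXM_LOG_LEVEL=info \
    -n 1 /root/backend localhost : -x LD_PRELOAD=/root/libci.so -n 1 /root/app2

# and/or watch the HCA port counters move while the job runs
cat /sys/class/infiniband/*/ports/*/counters/port_xmit_data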

Thanks,
Subhra.





On Thu, Apr 23, 2015 at 1:06 AM, Mike Dubman <mi...@dev.mellanox.co.il>
wrote:

> /usr/bin/ofed_info
>
> So, the OFED on your system is not Mellanox OFED 2.4.x but something else.
>
> Try: # rpm -qi libibverbs
>
>
> On Thu, Apr 23, 2015 at 7:47 AM, Subhra Mazumdar <
> subhramazumd...@gmail.com> wrote:
>
>> Hi,
>>
>> Where is the ofed_info command located? I searched from / but didn't find
>> it.
>>
>> Subhra.
>>
>> On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il>
>> wrote:
>>
>>> cool, progress!
>>>
>>> >>1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>>> frequencies detected, using: 2601.00
>>>
>>> means that the CPU governor on your machine is not in "performance" mode
>>>
>>> >> MXM  ERROR ibv_query_device() returned 38: Function not implemented
>>>
>>> indicates that the OFED installed on your nodes is not in fact 2.4-1.0.0,
>>> or there is a mismatch between the OFED kernel driver version and the OFED
>>> userspace library version, or you have multiple OFED libraries installed
>>> on your node and are using the incorrect one.
>>> Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
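>>>
>>> For example (a sketch using standard Linux/MOFED tools; whether the kernel
>>> module is mlx4_core or mlx5_core depends on your HCA):
>>>
>>> # the governor should say "performance"
>>> cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>> # userspace MOFED version
>>> ofed_info -s
>>> # kernel driver and userspace verbs library versions should match
>>> modinfo mlx4_core | grep -i version
>>> rpm -qi libibverbs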
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar <
>>> subhramazumd...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I compiled the Open MPI that comes inside the Mellanox HPCX package with
>>>> MXM support, instead of the separately downloaded Open MPI. I also used the
>>>> environment as in the README so that no LD_PRELOAD (except our own library,
>>>> which is unrelated) is needed. Now it runs fine (no segfault), but we get the
>>>> same errors as before (saying initialization of the MXM library failed). Is it
>>>> using MXM successfully?
>>>>
>>>> [root@JARVICE
>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun
>>>> --allow-run-as-root  --mca mtl mxm -n 1 /root/backend  localhost : -x
>>>> LD_PRELOAD=/root/libci.so -n 1 /root/app2
>>>>
>>>> --------------------------------------------------------------------------
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>>   Node:  JARVICE
>>>>
>>>> This usually is due to not having the required NUMA support installed
>>>> on the node. In some Linux distributions, the required support is
>>>> contained in the libnumactl and libnumactl-devel packages.
>>>> This is a warning only; your job will continue, though performance may
>>>> be degraded.
>>>>
>>>> --------------------------------------------------------------------------
>>>>  i am backend
>>>> [1429676565.121218]         sys.c:719  MXM  WARN  Conflicting CPU
>>>> frequencies detected, using: 2601.00
>>>> [1429676565.122937] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429676565.122950] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> [1429676565.123535] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429676565.123543] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> [1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>>>> frequencies detected, using: 2601.00
>>>> [1429676565.126264] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429676565.126276] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> [1429676565.126812] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429676565.126821] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>>
>>>> --------------------------------------------------------------------------
>>>> Initialization of MXM library failed.
>>>>
>>>>   Error: Input/output error
>>>>
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> <application runs fine>
>>>>
>>>>
>>>> Thanks,
>>>> Subhra.
>>>>
>>>>
>>>> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <mi...@dev.mellanox.co.il
>>>> > wrote:
>>>>
>>>>> Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
>>>>> Why is LD_PRELOAD needed in your command line? Can you try
>>>>>
>>>>> module load hpcx
>>>>> mpirun -np $np test.exe
>>>>> ?
>>>>>
>>>>> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar <
>>>>> subhramazumd...@gmail.com> wrote:
>>>>>
>>>>>> I followed the instructions in the README; now I am getting a different
>>>>>> error:
>>>>>>
>>>>>> [root@JARVICE
>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>>> ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl 
>>>>>> mxm
>>>>>> -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>> ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x
>>>>>> LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>> ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> WARNING: a request was made to bind a process. While the system
>>>>>>
>>>>>> supports binding the process itself, at least one node does NOT
>>>>>>
>>>>>> support binding memory to the process location.
>>>>>>
>>>>>>  Node:  JARVICE
>>>>>>
>>>>>> This usually is due to not having the required NUMA support installed
>>>>>>
>>>>>> on the node. In some Linux distributions, the required support is
>>>>>>
>>>>>> contained in the libnumactl and libnumactl-devel packages.
>>>>>>
>>>>>> This is a warning only; your job will continue, though performance
>>>>>> may be degraded.
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> i am backend
>>>>>>
>>>>>> [1429334876.139452] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>
>>>>>> [1429334876.139464] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>
>>>>>> [1429334876.139982] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>
>>>>>> [1429334876.139990] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>
>>>>>> [1429334876.142649] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>
>>>>>> [1429334876.142666] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>
>>>>>> [1429334876.143235] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>
>>>>>> [1429334876.143243] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> Initialization of MXM library failed.
>>>>>>
>>>>>>  Error: Input/output error
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> [JARVICE:449  :0] Caught signal 11 (Segmentation fault)
>>>>>>
>>>>>> [JARVICE:450  :0] Caught signal 11 (Segmentation fault)
>>>>>>
>>>>>> ==== backtrace ====
>>>>>>
>>>>>> 2 0x000000000005640c mxm_handle_error()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>
>>>>>> 3 0x000000000005657c mxm_error_signal_handler()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>
>>>>>> 4 0x00000000000329a0 killpg()  ??:0
>>>>>>
>>>>>> 5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>>>
>>>>>> 6 0x000000000006f6da vasprintf()  ??:0
>>>>>>
>>>>>> 7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>>>
>>>>>> 8 0x0000000000026630 orte_show_help()  ??:0
>>>>>>
>>>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>>>>
>>>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>>>>
>>>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>>>>
>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>
>>>>>> 13 0x000000000000d0ca l_getLocalFromConfig()
>>>>>>  /root/rain_ib/interposer/libciutils.c:83
>>>>>>
>>>>>> 14 0x000000000000c7b4 __cudaRegisterFatBinary()
>>>>>>  /root/rain_ib/interposer/libci.c:4055
>>>>>>
>>>>>> 15 0x0000000000402b59
>>>>>> _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv()
>>>>>>  tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0
>>>>>>
>>>>>> 16 0x0000000000402dd6 __do_global_ctors_aux()  crtstuff.c:0
>>>>>>
>>>>>> ===================
>>>>>>
>>>>>> ==== backtrace ====
>>>>>>
>>>>>> 2 0x000000000005640c mxm_handle_error()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>
>>>>>> 3 0x000000000005657c mxm_error_signal_handler()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>
>>>>>> 4 0x00000000000329a0 killpg()  ??:0
>>>>>>
>>>>>> 5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>>>
>>>>>> 6 0x000000000006f6da vasprintf()  ??:0
>>>>>>
>>>>>> 7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>>>
>>>>>> 8 0x0000000000026630 orte_show_help()  ??:0
>>>>>>
>>>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>>>>
>>>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>>>>  
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>>>>
>>>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>>>>
>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>
>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>>
>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>>
>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>>
>>>>>> ===================
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> mpirun noticed that process rank 1 with PID 450 on node JARVICE
>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> [JARVICE:00447] 1 more process has sent help message help-mtl-mxm.txt
>>>>>> / mxm init
>>>>>>
>>>>>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>>>> see all help / error messages
>>>>>>
>>>>>> [root@JARVICE
>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>>>
>>>>>>
>>>>>> Subhra.
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman <
>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>
>>>>>>> Have you followed the installation steps from the README? (Also here for
>>>>>>> reference: http://bgate.mellanox.com/products/hpcx/README.txt)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> * Load OpenMPI/OpenSHMEM v1.8 based package:
>>>>>>>
>>>>>>>     % source $HPCX_HOME/hpcx-init.sh
>>>>>>>     % hpcx_load
>>>>>>>     % env | grep HPCX
>>>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi
>>>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>>>     % hpcx_unload
>>>>>>>
>>>>>>> 3. Load HPCX environment from modules
>>>>>>>
>>>>>>> * Load OpenMPI/OpenSHMEM based package:
>>>>>>>
>>>>>>>     % module use $HPCX_HOME/modulefiles
>>>>>>>     % module load hpcx
>>>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c
>>>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>>>     % module unload hpcx
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar <
>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am using Mellanox OFED 2.4-1.0.0.
>>>>>>>>
>>>>>>>> I downloaded the tarball
>>>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and extracted
>>>>>>>> it. It has an mxm directory.
>>>>>>>>
>>>>>>>> [root@JARVICE ~]# ls hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5
>>>>>>>> archive      fca    hpcx-init-ompi-mellanox-v1.8.sh  ibprof
>>>>>>>> modulefiles  ompi-mellanox-v1.8  sources  VERSION
>>>>>>>> bupc-master  hcoll  hpcx-init.sh                     knem
>>>>>>>> mxm          README.txt          utils
>>>>>>>>
>>>>>>>> I tried using LD_PRELOAD for libmxm, but am now getting a different
>>>>>>>> error stack, as follows:
>>>>>>>>
>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>>> --allow-run-as-root --mca mtl mxm -x
>>>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2"
>>>>>>>> -n 1 ./backend  localhost : -x
>>>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2
>>>>>>>> ./libci.so" -n 1 ./app2
>>>>>>>>  i am backend
>>>>>>>> [JARVICE:00564] mca: base: components_open: component pml / cm open
>>>>>>>> function failed
>>>>>>>> [JARVICE:564  :0] Caught signal 11 (Segmentation fault)
>>>>>>>> [JARVICE:00565] mca: base: components_open: component pml / cm open
>>>>>>>> function failed
>>>>>>>> [JARVICE:565  :0] Caught signal 11 (Segmentation fault)
>>>>>>>> ==== backtrace ====
>>>>>>>>  2 0x000000000005640c mxm_handle_error()
>>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>>>  3 0x000000000005657c mxm_error_signal_handler()
>>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>>>>>  5 0x0000000000045491 mca_base_components_close()  ??:0
>>>>>>>>  6 0x000000000004e99a mca_base_framework_close()  ??:0
>>>>>>>>  7 0x0000000000045431 mca_base_component_close()  ??:0
>>>>>>>>  8 0x000000000004515c mca_base_framework_components_open()  ??:0
>>>>>>>>  9 0x00000000000a0de9 mca_pml_base_open()  pml_base_frame.c:0
>>>>>>>> 10 0x000000000004eb1c mca_base_framework_open()  ??:0
>>>>>>>> 11 0x0000000000043eb3 ompi_mpi_init()  ??:0
>>>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>>>> ===================
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A requested component was not found, or was unable to be opened.
>>>>>>>> This
>>>>>>>> means that this component is either not installed or is unable to be
>>>>>>>> used on your system (e.g., sometimes this means that shared
>>>>>>>> libraries
>>>>>>>> that the component requires are unable to be found/loaded).  Note
>>>>>>>> that
>>>>>>>> Open MPI stopped checking at the first component that it did not
>>>>>>>> find.
>>>>>>>>
>>>>>>>> Host:      JARVICE
>>>>>>>> Framework: mtl
>>>>>>>> Component: mxm
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE
>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> [JARVICE:00562] 1 more process has sent help message
>>>>>>>> help-mca-base.txt / find-available:not-valid
>>>>>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0
>>>>>>>> to see all help / error messages
>>>>>>>>
>>>>>>>>
>>>>>>>> Subhra
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman <
>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>
>>>>>>>>> Seems like mxm was not found in your LD_LIBRARY_PATH.
>>>>>>>>>
>>>>>>>>> What MOFED version do you use?
>>>>>>>>> Does it have /opt/mellanox/mxm in it?
>>>>>>>>> You could just run mpirun from the HPCX package, which looks for mxm
>>>>>>>>> internally, and recompile OMPI as mentioned in the README.
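>>>>>>>>>
>>>>>>>>> For example (a sketch; adjust the install prefix and the --with-mxm
>>>>>>>>> path to your locations):
>>>>>>>>>
>>>>>>>>> ./configure --prefix=/root/openmpi-1.8.4/openmpinstall \
>>>>>>>>>     --with-mxm=/opt/mellanox/mxm
>>>>>>>>> make install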
>>>>>>>>>
>>>>>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar <
>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I used the mxm MTL as follows but am getting a segfault. It says the
>>>>>>>>>> mxm component was not found, but I have compiled Open MPI with MXM. Any
>>>>>>>>>> idea what I might be missing?
>>>>>>>>>>
>>>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>>>>> --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x
>>>>>>>>>> LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend
>>>>>>>>>> localhosst : -n 1 -x LD_PRELOAD="./libci.so
>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2
>>>>>>>>>>  i am backend
>>>>>>>>>> [JARVICE:08398] *** Process received signal ***
>>>>>>>>>> [JARVICE:08398] Signal: Segmentation fault (11)
>>>>>>>>>> [JARVICE:08398] Signal code: Address not mapped (1)
>>>>>>>>>> [JARVICE:08398] Failing at address: 0x10
>>>>>>>>>> [JARVICE:08398] [ 0]
>>>>>>>>>> /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710]
>>>>>>>>>> [JARVICE:08398] [ 1]
>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491]
>>>>>>>>>> [JARVICE:08398] [ 2]
>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a]
>>>>>>>>>> [JARVICE:08398] [ 3]
>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431]
>>>>>>>>>> [JARVICE:08398] [ 4]
>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c]
>>>>>>>>>> [JARVICE:08398] [ 5]
>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9]
>>>>>>>>>> [JARVICE:08398] [ 6]
>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c]
>>>>>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open:
>>>>>>>>>> component pml / cm open function failed
>>>>>>>>>>
>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3]
>>>>>>>>>> [JARVICE:08398] [ 8]
>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0]
>>>>>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf]
>>>>>>>>>> [JARVICE:08398] [10]
>>>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d]
>>>>>>>>>> [JARVICE:08398] [11] ./backend[0x402db9]
>>>>>>>>>> [JARVICE:08398] *** End of error message ***
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> A requested component was not found, or was unable to be opened.
>>>>>>>>>> This
>>>>>>>>>> means that this component is either not installed or is unable to
>>>>>>>>>> be
>>>>>>>>>> used on your system (e.g., sometimes this means that shared
>>>>>>>>>> libraries
>>>>>>>>>> that the component requires are unable to be found/loaded).  Note
>>>>>>>>>> that
>>>>>>>>>> Open MPI stopped checking at the first component that it did not
>>>>>>>>>> find.
>>>>>>>>>>
>>>>>>>>>> Host:      JARVICE
>>>>>>>>>> Framework: mtl
>>>>>>>>>> Component: mxm
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE
>>>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Subhra.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman <
>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>
>>>>>>>>>>> No need for IPoIB; mxm uses native IB.
>>>>>>>>>>>
>>>>>>>>>>> Please see the HPCX (pre-compiled OMPI, integrated with MXM and FCA)
>>>>>>>>>>> README file for details on how to compile/select.
>>>>>>>>>>>
>>>>>>>>>>> The default transport is UD for inter-node communication and
>>>>>>>>>>> shared memory for intra-node.
>>>>>>>>>>>
>>>>>>>>>>> http://bgate.mellanox.com/products/hpcx/
>>>>>>>>>>>
>>>>>>>>>>> Also, mxm is included in Mellanox OFED.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar <
>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Does IPoIB need to be configured on the IB cards for MXM (I have a
>>>>>>>>>>>> separate Ethernet connection too)? Also, are there special flags in
>>>>>>>>>>>> mpirun to select from UD/RC/DC? What is the default?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman <
>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> mxm uses IB RDMA/RoCE technologies. One can select the UD/RC/DC
>>>>>>>>>>>>> transports to be used in mxm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> By selecting mxm, all MPI point-to-point routines will be mapped to
>>>>>>>>>>>>> the appropriate mxm functions.
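>>>>>>>>>>>>>
>>>>>>>>>>>>> For example (a sketch with a placeholder executable; the MXM_TLS
>>>>>>>>>>>>> values are the ones that appear elsewhere in this thread):
>>>>>>>>>>>>>
>>>>>>>>>>>>> # UD transport (the default)
>>>>>>>>>>>>> mpirun --mca mtl mxm -x MXM_TLS=self,shm,ud -np 2 ./a.out
>>>>>>>>>>>>> # RC transport
>>>>>>>>>>>>> mpirun --mca mtl mxm -x MXM_TLS=self,shm,rc -np 2 ./a.out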
>>>>>>>>>>>>>
>>>>>>>>>>>>> M
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar <
>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Mike,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does the mxm MTL use InfiniBand RDMA? Also, from a programming
>>>>>>>>>>>>>> perspective, do I need to use anything other than
>>>>>>>>>>>>>> MPI_Send/MPI_Recv?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman <
>>>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> The openib BTL does not support this thread model.
>>>>>>>>>>>>>>> You can use OMPI with mxm (-mca mtl mxm) and the multiple-thread
>>>>>>>>>>>>>>> model in the 1.8.x series, or (-mca pml yalla) in the master branch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> M
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar <
>>>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and the openib BTL work together in
>>>>>>>>>>>>>>>> Open MPI 1.8.4? If so, are there any command-line options needed
>>>>>>>>>>>>>>>> at run time?
>>>>>>>>>>>>>>>> time?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> M.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> M.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>
>>>>>>>>>>> M.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> M.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> M.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> M.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Kind Regards,
>>>
>>> M.
>>>
>>>
>>
>>
>>
>
>
>
> --
>
> Kind Regards,
>
> M.
>
>
