The HPCX package uses pml "yalla" by default (it is part of the ompi master branch,
not of v1.8).
So "-mca mtl mxm" has no effect unless "-mca pml cm" is also specified to disable
pml "yalla" and let the mtl layer take over.
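
A minimal sketch of the three selections, for illustration only. It prints the
mpirun command lines instead of launching anything. The binary name ./app and the
process count are placeholders, and I am assuming that raising the standard
pml_base_verbose parameter is enough to see in the output which pml was picked.

```shell
#!/bin/sh
# Dry run: build and print mpirun command lines; nothing is launched.
# APP and NP are placeholders (assumptions), not taken from the thread.
APP=${APP:-./app}
NP=${NP:-2}

# build_cmd PML [MTL]: print an mpirun line that forces the given pml
# (and optionally an mtl) and raises the pml framework verbosity.
build_cmd() {
    cmd="mpirun -n $NP -mca pml $1"
    if [ -n "${2:-}" ]; then
        cmd="$cmd -mca mtl $2"
    fi
    echo "$cmd -mca pml_base_verbose 10 $APP"
}

build_cmd yalla      # HPCX default: MXM driven directly by pml yalla
build_cmd cm mxm     # yalla disabled; MXM reached through the mtl layer
build_cmd ob1        # ob1 does not go through MXM; useful as a baseline
```

Running the printed lines one at a time and comparing the yalla and cm runs
against the ob1 run should show whether MXM is making a difference.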



On Fri, Apr 24, 2015 at 6:36 AM, Subhra Mazumdar <subhramazumd...@gmail.com>
wrote:

> I changed my downloaded MOFED version to match the one installed on the
> node and now the error goes away and it runs fine. But I still have a
> question: I get the exact same performance in all 3 cases below:
>
> 1) mpirun --allow-run-as-root  --mca mtl mxm -mca mtl_mxm_np 0 -x
> MXM_TLS=self,shm,rc,ud -n 1 /root/backend  localhost : -x
> LD_PRELOAD=/root/libci.so -n 1 /root/app2
>
> 2) mpirun --allow-run-as-root  --mca mtl mxm -n 1 /root/backend  localhost
> : -x LD_PRELOAD=/root/libci.so -n 1 /root/app2
>
> 3) mpirun --allow-run-as-root  --mca mtl ^mxm -n 1 /root/backend
>  localhost : -x LD_PRELOAD=/root/libci.so -n 1 /root/app2
>
> Seems like it doesn't matter if I use mxm, not use mxm or use it with
> reliable connection (RC). How can I be sure I am indeed using mxm over
> infiniband?
>
> Thanks,
> Subhra.
>
>
>
>
>
> On Thu, Apr 23, 2015 at 1:06 AM, Mike Dubman <mi...@dev.mellanox.co.il>
> wrote:
>
>> /usr/bin/ofed_info
>>
>> So, the OFED on your system is not MellanoxOFED 2.4.x but something else.
>>
>> try #rpm -qi libibverbs
>>
>>
>> On Thu, Apr 23, 2015 at 7:47 AM, Subhra Mazumdar <
>> subhramazumd...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> where is the command ofed_info located? I searched from / but didn't
>>> find it.
>>>
>>> Subhra.
>>>
>>> On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il>
>>> wrote:
>>>
>>>> cool, progress!
>>>>
>>>> >> [1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>>>> frequencies detected, using: 2601.00
>>>>
>>>> means that the cpu governor on your machine is not in "performance" mode
>>>>
>>>> >> MXM  ERROR ibv_query_device() returned 38: Function not implemented
>>>>
>>>> indicates either that the ofed installed on your nodes is not actually 2.4-1.0.0,
>>>> that there is a mismatch between the ofed kernel driver version and the ofed
>>>> userspace library version,
>>>> or that you have multiple ofed libraries installed on your node and are using
>>>> the wrong one.
>>>> could you please check that ofed_info -s indeed prints mofed 2.4-1.0.0?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar <
>>>> subhramazumd...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I compiled the openmpi that comes inside the mellanox hpcx package
>>>>> with mxm support instead of separately downloaded openmpi. I also used the
>>>>> environment as in the README so that no LD_PRELOAD (except our own library
>>>>> which is unrelated) is needed. Now it runs fine (no segfault) but we get
>>>>> same errors as before (saying initialization of MXM library failed). Is it
>>>>> using MXM successfully?
>>>>>
>>>>> [root@JARVICE
>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun
>>>>> --allow-run-as-root  --mca mtl mxm -n 1 /root/backend  localhost : -x
>>>>> LD_PRELOAD=/root/libci.so -n 1 /root/app2
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> WARNING: a request was made to bind a process. While the system
>>>>> supports binding the process itself, at least one node does NOT
>>>>> support binding memory to the process location.
>>>>>
>>>>>   Node:  JARVICE
>>>>>
>>>>> This usually is due to not having the required NUMA support installed
>>>>> on the node. In some Linux distributions, the required support is
>>>>> contained in the libnumactl and libnumactl-devel packages.
>>>>> This is a warning only; your job will continue, though performance may
>>>>> be degraded.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>  i am backend
>>>>> [1429676565.121218]         sys.c:719  MXM  WARN  Conflicting CPU
>>>>> frequencies detected, using: 2601.00
>>>>> [1429676565.122937] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>> [1429676565.122950] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>> [1429676565.123535] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>> [1429676565.123543] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>> [1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>>>>> frequencies detected, using: 2601.00
>>>>> [1429676565.126264] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>> [1429676565.126276] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>> [1429676565.126812] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>> [1429676565.126821] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> Initialization of MXM library failed.
>>>>>
>>>>>   Error: Input/output error
>>>>>
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> <application runs fine>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Subhra.
>>>>>
>>>>>
>>>>> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <
>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>
>>>>>> could you please check that ofed_info -s indeed prints mofed
>>>>>> 2.4-1.0.0?
>>>>>> why is LD_PRELOAD needed in your command line? Can you try
>>>>>>
>>>>>> module load hpcx
>>>>>> mpirun -np $np test.exe
>>>>>> ?
>>>>>>
>>>>>> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar <
>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>
>>>>>>> I followed the instructions as in the README, now getting a
>>>>>>> different error:
>>>>>>>
>>>>>>> [root@JARVICE
>>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>>>> ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl mxm
>>>>>>> -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>> ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x
>>>>>>> LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>> ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> WARNING: a request was made to bind a process. While the system
>>>>>>>
>>>>>>> supports binding the process itself, at least one node does NOT
>>>>>>>
>>>>>>> support binding memory to the process location.
>>>>>>>
>>>>>>>  Node:  JARVICE
>>>>>>>
>>>>>>> This usually is due to not having the required NUMA support installed
>>>>>>>
>>>>>>> on the node. In some Linux distributions, the required support is
>>>>>>>
>>>>>>> contained in the libnumactl and libnumactl-devel packages.
>>>>>>>
>>>>>>> This is a warning only; your job will continue, though performance
>>>>>>> may be degraded.
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> i am backend
>>>>>>>
>>>>>>> [1429334876.139452] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>>
>>>>>>> [1429334876.139464] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>>
>>>>>>> [1429334876.139982] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>>
>>>>>>> [1429334876.139990] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>>
>>>>>>> [1429334876.142649] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>>
>>>>>>> [1429334876.142666] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>>
>>>>>>> [1429334876.143235] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>>>>>
>>>>>>> [1429334876.143243] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>>>>> ibv_query_device() returned 38: Function not implemented
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> Initialization of MXM library failed.
>>>>>>>
>>>>>>>  Error: Input/output error
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> [JARVICE:449  :0] Caught signal 11 (Segmentation fault)
>>>>>>>
>>>>>>> [JARVICE:450  :0] Caught signal 11 (Segmentation fault)
>>>>>>>
>>>>>>> ==== backtrace ====
>>>>>>>
>>>>>>> 2 0x000000000005640c mxm_handle_error()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>>
>>>>>>> 3 0x000000000005657c mxm_error_signal_handler()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>>
>>>>>>> 4 0x00000000000329a0 killpg()  ??:0
>>>>>>>
>>>>>>> 5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>>>>
>>>>>>> 6 0x000000000006f6da vasprintf()  ??:0
>>>>>>>
>>>>>>> 7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>>>>
>>>>>>> 8 0x0000000000026630 orte_show_help()  ??:0
>>>>>>>
>>>>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>>>>>
>>>>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>>>>>
>>>>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>>>>>
>>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>>
>>>>>>> 13 0x000000000000d0ca l_getLocalFromConfig()
>>>>>>>  /root/rain_ib/interposer/libciutils.c:83
>>>>>>>
>>>>>>> 14 0x000000000000c7b4 __cudaRegisterFatBinary()
>>>>>>>  /root/rain_ib/interposer/libci.c:4055
>>>>>>>
>>>>>>> 15 0x0000000000402b59
>>>>>>> _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv()
>>>>>>>  tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0
>>>>>>>
>>>>>>> 16 0x0000000000402dd6 __do_global_ctors_aux()  crtstuff.c:0
>>>>>>>
>>>>>>> ===================
>>>>>>>
>>>>>>> ==== backtrace ====
>>>>>>>
>>>>>>> 2 0x000000000005640c mxm_handle_error()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>>
>>>>>>> 3 0x000000000005657c mxm_error_signal_handler()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>>
>>>>>>> 4 0x00000000000329a0 killpg()  ??:0
>>>>>>>
>>>>>>> 5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>>>>
>>>>>>> 6 0x000000000006f6da vasprintf()  ??:0
>>>>>>>
>>>>>>> 7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>>>>
>>>>>>> 8 0x0000000000026630 orte_show_help()  ??:0
>>>>>>>
>>>>>>> 9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>>>>>
>>>>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>>>>>  
>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>>>>>
>>>>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>>>>>
>>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>>
>>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>>>
>>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>>>
>>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>>>
>>>>>>> ===================
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> mpirun noticed that process rank 1 with PID 450 on node JARVICE
>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> [JARVICE:00447] 1 more process has sent help message
>>>>>>> help-mtl-mxm.txt / mxm init
>>>>>>>
>>>>>>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>>>>> see all help / error messages
>>>>>>>
>>>>>>> [root@JARVICE
>>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>>>>
>>>>>>>
>>>>>>> Subhra.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman <
>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>
>>>>>>>> Have you followed the installation steps from the README? (Also here for
>>>>>>>> reference: http://bgate.mellanox.com/products/hpcx/README.txt)
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> * Load OpenMPI/OpenSHMEM v1.8 based package:
>>>>>>>>
>>>>>>>>     % source $HPCX_HOME/hpcx-init.sh
>>>>>>>>     % hpcx_load
>>>>>>>>     % env | grep HPCX
>>>>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi
>>>>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>>>>     % hpcx_unload
>>>>>>>>
>>>>>>>> 3. Load HPCX environment from modules
>>>>>>>>
>>>>>>>> * Load OpenMPI/OpenSHMEM based package:
>>>>>>>>
>>>>>>>>     % module use $HPCX_HOME/modulefiles
>>>>>>>>     % module load hpcx
>>>>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c
>>>>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>>>>     % module unload hpcx
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar <
>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am using 2.4-1.0.0 mellanox ofed.
>>>>>>>>>
>>>>>>>>> I downloaded the mofed tarball
>>>>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and extracted
>>>>>>>>> it. It has an mxm directory.
>>>>>>>>>
>>>>>>>>> [root@JARVICE ~]# ls
>>>>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5
>>>>>>>>> archive      fca    hpcx-init-ompi-mellanox-v1.8.sh  ibprof
>>>>>>>>> modulefiles  ompi-mellanox-v1.8  sources  VERSION
>>>>>>>>> bupc-master  hcoll  hpcx-init.sh                     knem
>>>>>>>>> mxm          README.txt          utils
>>>>>>>>>
>>>>>>>>> I tried using LD_PRELOAD for libmxm, but I am now getting a different
>>>>>>>>> error stack, as follows:
>>>>>>>>>
>>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>>>> --allow-run-as-root --mca mtl mxm -x
>>>>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2"
>>>>>>>>> -n 1 ./backend  localhost : -x
>>>>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2
>>>>>>>>> ./libci.so" -n 1 ./app2
>>>>>>>>>  i am backend
>>>>>>>>> [JARVICE:00564] mca: base: components_open: component pml / cm
>>>>>>>>> open function failed
>>>>>>>>> [JARVICE:564  :0] Caught signal 11 (Segmentation fault)
>>>>>>>>> [JARVICE:00565] mca: base: components_open: component pml / cm
>>>>>>>>> open function failed
>>>>>>>>> [JARVICE:565  :0] Caught signal 11 (Segmentation fault)
>>>>>>>>> ==== backtrace ====
>>>>>>>>>  2 0x000000000005640c mxm_handle_error()
>>>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>>>>  3 0x000000000005657c mxm_error_signal_handler()
>>>>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>>>>>>  5 0x0000000000045491 mca_base_components_close()  ??:0
>>>>>>>>>  6 0x000000000004e99a mca_base_framework_close()  ??:0
>>>>>>>>>  7 0x0000000000045431 mca_base_component_close()  ??:0
>>>>>>>>>  8 0x000000000004515c mca_base_framework_components_open()  ??:0
>>>>>>>>>  9 0x00000000000a0de9 mca_pml_base_open()  pml_base_frame.c:0
>>>>>>>>> 10 0x000000000004eb1c mca_base_framework_open()  ??:0
>>>>>>>>> 11 0x0000000000043eb3 ompi_mpi_init()  ??:0
>>>>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>>>>> ===================
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> A requested component was not found, or was unable to be opened.
>>>>>>>>> This
>>>>>>>>> means that this component is either not installed or is unable to
>>>>>>>>> be
>>>>>>>>> used on your system (e.g., sometimes this means that shared
>>>>>>>>> libraries
>>>>>>>>> that the component requires are unable to be found/loaded).  Note
>>>>>>>>> that
>>>>>>>>> Open MPI stopped checking at the first component that it did not
>>>>>>>>> find.
>>>>>>>>>
>>>>>>>>> Host:      JARVICE
>>>>>>>>> Framework: mtl
>>>>>>>>> Component: mxm
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE
>>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> [JARVICE:00562] 1 more process has sent help message
>>>>>>>>> help-mca-base.txt / find-available:not-valid
>>>>>>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0
>>>>>>>>> to see all help / error messages
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Subhra
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman <
>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>
>>>>>>>>>> seems like mxm was not found in your ld_library_path.
>>>>>>>>>>
>>>>>>>>>> what mofed version do you use?
>>>>>>>>>> does it have /opt/mellanox/mxm in it?
>>>>>>>>>> You could just run mpirun from HPCX package which looks for mxm
>>>>>>>>>> internally and recompile ompi as mentioned in README.
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar <
>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I used the mxm mtl as follows but am getting a segfault. It says the
>>>>>>>>>>> mxm component was not found, but I have compiled openmpi with mxm. Any
>>>>>>>>>>> idea what I might be missing?
>>>>>>>>>>>
>>>>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>>>>>> --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x
>>>>>>>>>>> LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend
>>>>>>>>>>> localhosst : -n 1 -x LD_PRELOAD="./libci.so
>>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2
>>>>>>>>>>>  i am backend
>>>>>>>>>>> [JARVICE:08398] *** Process received signal ***
>>>>>>>>>>> [JARVICE:08398] Signal: Segmentation fault (11)
>>>>>>>>>>> [JARVICE:08398] Signal code: Address not mapped (1)
>>>>>>>>>>> [JARVICE:08398] Failing at address: 0x10
>>>>>>>>>>> [JARVICE:08398] [ 0]
>>>>>>>>>>> /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710]
>>>>>>>>>>> [JARVICE:08398] [ 1]
>>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491]
>>>>>>>>>>> [JARVICE:08398] [ 2]
>>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a]
>>>>>>>>>>> [JARVICE:08398] [ 3]
>>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431]
>>>>>>>>>>> [JARVICE:08398] [ 4]
>>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c]
>>>>>>>>>>> [JARVICE:08398] [ 5]
>>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9]
>>>>>>>>>>> [JARVICE:08398] [ 6]
>>>>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c]
>>>>>>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open:
>>>>>>>>>>> component pml / cm open function failed
>>>>>>>>>>>
>>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3]
>>>>>>>>>>> [JARVICE:08398] [ 8]
>>>>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0]
>>>>>>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf]
>>>>>>>>>>> [JARVICE:08398] [10]
>>>>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d]
>>>>>>>>>>> [JARVICE:08398] [11] ./backend[0x402db9]
>>>>>>>>>>> [JARVICE:08398] *** End of error message ***
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> A requested component was not found, or was unable to be
>>>>>>>>>>> opened.  This
>>>>>>>>>>> means that this component is either not installed or is unable
>>>>>>>>>>> to be
>>>>>>>>>>> used on your system (e.g., sometimes this means that shared
>>>>>>>>>>> libraries
>>>>>>>>>>> that the component requires are unable to be found/loaded).
>>>>>>>>>>> Note that
>>>>>>>>>>> Open MPI stopped checking at the first component that it did not
>>>>>>>>>>> find.
>>>>>>>>>>>
>>>>>>>>>>> Host:      JARVICE
>>>>>>>>>>> Framework: mtl
>>>>>>>>>>> Component: mxm
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE
>>>>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Subhra.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman <
>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> no need for IPoIB, mxm uses native IB.
>>>>>>>>>>>>
>>>>>>>>>>>> Please see the HPCX (pre-compiled ompi, integrated with MXM and
>>>>>>>>>>>> FCA) README file for details on how to compile/select.
>>>>>>>>>>>>
>>>>>>>>>>>> The default transport is UD for internode communication and
>>>>>>>>>>>> shared-memory for intra-node.
>>>>>>>>>>>>
>>>>>>>>>>>> http://bgate.mellanox.com/products/hpcx/
>>>>>>>>>>>>
>>>>>>>>>>>> Also, mxm is included in the Mellanox OFED.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar <
>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does ipoib need to be configured on the ib cards for mxm (I
>>>>>>>>>>>>> have a separate ethernet connection too)? Also, are there special
>>>>>>>>>>>>> flags in mpirun to select from UD/RC/DC? What is the default?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman <
>>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> mxm uses IB rdma/roce technologies. One can select UD/RC/DC
>>>>>>>>>>>>>> transports to be used in mxm.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> By selecting mxm, all MPI p2p routines will be mapped to
>>>>>>>>>>>>>> appropriate mxm functions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> M
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar <
>>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Mike,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does the mxm mtl use infiniband rdma? Also, from a programming
>>>>>>>>>>>>>>> perspective, do I need to use anything other than
>>>>>>>>>>>>>>> MPI_Send/MPI_Recv?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman <
>>>>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> openib btl does not support this thread model.
>>>>>>>>>>>>>>>> You can use OMPI w/ mxm (-mca mtl mxm) and multiple thread
>>>>>>>>>>>>>>>> mode in the 1.8.x series, or (-mca pml yalla) in the master branch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> M
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar <
>>>>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and openib btl work together in
>>>>>>>>>>>>>>>>> open mpi 1.8.4? If so, are there any command line options
>>>>>>>>>>>>>>>>> needed at run time?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26574.php
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> M.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26575.php
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26580.php
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> M.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26584.php
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26663.php
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> M.
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> users mailing list
>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>> Subscription:
>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26665.php
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>> Link to this post:
>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26686.php
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>>
>>>>>>>>>> M.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>> Link to this post:
>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26688.php
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>> Link to this post:
>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26711.php
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Kind Regards,
>>>>>>>>
>>>>>>>> M.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26712.php
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26752.php
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Kind Regards,
>>>>>>
>>>>>> M.
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26754.php
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26761.php
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Kind Regards,
>>>>
>>>> M.
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2015/04/26762.php
>>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/04/26766.php
>>>
>>
>>
>>
>> --
>>
>> Kind Regards,
>>
>> M.
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/04/26768.php
>>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/04/26777.php
>



-- 

Kind Regards,

M.
