Hi,

Where is the ofed_info command located? I searched from / but didn't find
it.

Subhra.
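One way to locate ofed_info is to search each PATH directory for it, which is what `which ofed_info` does. Below is a minimal C sketch of that search. `find_in_path` is a hypothetical helper, not part of any OFED tool. Unlike the shell, it skips empty PATH entries instead of treating them as the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search each directory of a colon-separated,
 * PATH-style list for an executable named `cmd`. On success, copy the
 * full path into `out` and return 1; otherwise return 0. Empty entries
 * are skipped (strtok drops them). */
int find_in_path(const char *path_list, const char *cmd,
                 char *out, size_t out_len)
{
    if (path_list == NULL || cmd == NULL || out == NULL || out_len == 0)
        return 0;

    char *copy = strdup(path_list); /* strtok modifies its argument */
    if (copy == NULL)
        return 0;

    int found = 0;
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        char candidate[4096];
        int n = snprintf(candidate, sizeof candidate, "%s/%s", dir, cmd);
        if (n < 0 || (size_t)n >= sizeof candidate)
            continue; /* entry too long; skip it */
        if (access(candidate, X_OK) == 0) {
            snprintf(out, out_len, "%s", candidate);
            found = 1;
            break;
        }
    }
    free(copy);
    return found;
}
```

For example, `find_in_path(getenv("PATH"), "ofed_info", buf, sizeof buf)` checks the current PATH. A return value of 0 suggests the MOFED userspace tools are either not installed on that node or not on its PATH.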

On Tue, Apr 21, 2015 at 10:43 PM, Mike Dubman <mi...@dev.mellanox.co.il>
wrote:

> cool, progress!
>
> >>1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
> frequencies detected, using: 2601.00
>
> means that the CPU frequency governor on your machine is not set to "performance" mode.
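To see which governor a core is using, you can read the Linux cpufreq sysfs file `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`. Below is a minimal C sketch that assumes cpufreq is exposed on the node. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a scaling_governor value is exactly "performance"
 * (a trailing newline, as read from sysfs, is ignored); 0 otherwise. */
int governor_is_performance(const char *value)
{
    size_t len = strcspn(value, "\n");
    return len == strlen("performance") &&
           strncmp(value, "performance", len) == 0;
}

/* Read the cpufreq governor of one CPU from the standard Linux sysfs
 * path into `buf`. Returns 0 on success, or -1 if the file is missing
 * or unreadable (e.g. cpufreq is not exposed on this machine). */
int read_governor(int cpu, char *buf, size_t len)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    return line != NULL ? 0 : -1;
}
```

Call `read_governor` for each CPU number and pass each result to `governor_is_performance`. This shows whether any core is still on a different governor.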
>
> >> MXM  ERROR ibv_query_device() returned 38: Function not implemented
>
> indicates that the OFED installed on your nodes is not actually 2.4-1.0.0,
> or that there is a mismatch between the OFED kernel driver version and the
> OFED userspace library version,
> or that you have multiple OFED libraries installed on your node and are
> using the wrong one.
> Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
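Below is a minimal C sketch of that check. It assumes `ofed_info -s` prints a single line of the form `MLNX_OFED_LINUX-2.4-1.0.0:`, so compare this against your own output. `parse_mofed_version` and `mofed_version_matches` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Extract the version from an `ofed_info -s` line, assumed to look like
 * "MLNX_OFED_LINUX-2.4-1.0.0:". Copies "2.4-1.0.0" into `out` and
 * returns 0, or returns -1 if the expected prefix is not present. */
int parse_mofed_version(const char *line, char *out, size_t out_len)
{
    static const char prefix[] = "MLNX_OFED_LINUX-";
    const char *p = strstr(line, prefix);
    if (p == NULL)
        return -1;
    p += sizeof prefix - 1;            /* skip the prefix, not its NUL */
    size_t n = strcspn(p, ": \t\r\n"); /* stop at the trailing colon */
    if (n == 0 || n >= out_len)
        return -1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Run `ofed_info -s` and compare its version with `expected`.
 * Returns 1 on a match, 0 on a mismatch, and -1 if no parseable
 * version line was produced (e.g. ofed_info is not on PATH). */
int mofed_version_matches(const char *expected)
{
    FILE *fp = popen("ofed_info -s 2>/dev/null", "r");
    if (fp == NULL)
        return -1;
    char line[256], version[64];
    int rc = -1;
    if (fgets(line, sizeof line, fp) != NULL &&
        parse_mofed_version(line, version, sizeof version) == 0)
        rc = strcmp(version, expected) == 0;
    pclose(fp);
    return rc;
}
```

`mofed_version_matches("2.4-1.0.0")` reports only the installed userspace package version. On its own, it does not detect a kernel-driver/userspace mismatch.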
>
>
>
>
>
> On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar <
> subhramazumd...@gmail.com> wrote:
>
>> Hi,
>>
>> I compiled the Open MPI that comes inside the Mellanox HPC-X package with
>> MXM support, instead of a separately downloaded Open MPI. I also used the
>> environment as in the README, so no LD_PRELOAD (except for our own library,
>> which is unrelated) is needed. Now it runs fine (no segfault), but we get
>> the same errors as before (saying initialization of the MXM library
>> failed). Is it using MXM successfully?
>>
>> [root@JARVICE
>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun
>> --allow-run-as-root  --mca mtl mxm -n 1 /root/backend  localhost : -x
>> LD_PRELOAD=/root/libci.so -n 1 /root/app2
>> --------------------------------------------------------------------------
>> WARNING: a request was made to bind a process. While the system
>> supports binding the process itself, at least one node does NOT
>> support binding memory to the process location.
>>
>>   Node:  JARVICE
>>
>> This usually is due to not having the required NUMA support installed
>> on the node. In some Linux distributions, the required support is
>> contained in the libnumactl and libnumactl-devel packages.
>> This is a warning only; your job will continue, though performance may be
>> degraded.
>> --------------------------------------------------------------------------
>>  i am backend
>> [1429676565.121218]         sys.c:719  MXM  WARN  Conflicting CPU
>> frequencies detected, using: 2601.00
>> [1429676565.122937] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.122950] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>> ibv_query_device() returned 38: Function not implemented
>> [1429676565.123535] [JARVICE:14767:0]      ib_dev.c:445  MXM  WARN
>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.123543] [JARVICE:14767:0]      ib_dev.c:456  MXM  ERROR
>> ibv_query_device() returned 38: Function not implemented
>> [1429676565.124664]         sys.c:719  MXM  WARN  Conflicting CPU
>> frequencies detected, using: 2601.00
>> [1429676565.126264] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.126276] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>> ibv_query_device() returned 38: Function not implemented
>> [1429676565.126812] [JARVICE:14768:0]      ib_dev.c:445  MXM  WARN
>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>> [1429676565.126821] [JARVICE:14768:0]      ib_dev.c:456  MXM  ERROR
>> ibv_query_device() returned 38: Function not implemented
>> --------------------------------------------------------------------------
>> Initialization of MXM library failed.
>>
>>   Error: Input/output error
>>
>> --------------------------------------------------------------------------
>>
>> <application runs fine>
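The `returned 38` in these log lines is a Linux errno value. Errno 38 is ENOSYS, whose standard message is the "Function not implemented" text that MXM prints. This is consistent with the driver/library mismatch Mike suspects, though it does not prove it. Below is a small C sketch that formats such a code. `format_errno` is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format an errno-style code such as the "returned 38" in the MXM log,
 * e.g. "38 (ENOSYS): Function not implemented". Only ENOSYS gets a
 * symbolic name here; other codes just get their strerror() text. */
void format_errno(int code, char *out, size_t out_len)
{
    const char *name = (code == ENOSYS) ? " (ENOSYS)" : "";
    snprintf(out, out_len, "%d%s: %s", code, name, strerror(code));
}
```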
>>
>>
>> Thanks,
>> Subhra.
>>
>>
>> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <mi...@dev.mellanox.co.il>
>> wrote:
>>
>>> Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
>>> Why is LD_PRELOAD needed in your command line? Can you try
>>>
>>> module load hpcx
>>> mpirun -np $np test.exe
>>> ?
>>>
>>> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar <
>>> subhramazumd...@gmail.com> wrote:
>>>
>>>> I followed the instructions in the README and am now getting a
>>>> different error:
>>>>
>>>> [root@JARVICE
>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>> ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl mxm
>>>> -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>> ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x
>>>> LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>> ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2
>>>>
>>>> --------------------------------------------------------------------------
>>>> WARNING: a request was made to bind a process. While the system
>>>> supports binding the process itself, at least one node does NOT
>>>> support binding memory to the process location.
>>>>
>>>>  Node:  JARVICE
>>>>
>>>> This usually is due to not having the required NUMA support installed
>>>> on the node. In some Linux distributions, the required support is
>>>> contained in the libnumactl and libnumactl-devel packages.
>>>> This is a warning only; your job will continue, though performance may
>>>> be degraded.
>>>> --------------------------------------------------------------------------
>>>> i am backend
>>>> [1429334876.139452] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.139464] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> [1429334876.139982] [JARVICE:449  :0]   ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.139990] [JARVICE:449  :0]   ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> [1429334876.142649] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.142666] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> [1429334876.143235] [JARVICE:450  :0]   ib_dev.c:445  MXM  WARN
>>>>  failed call to ibv_exp_use_priv_env(): Function not implemented
>>>> [1429334876.143243] [JARVICE:450  :0]   ib_dev.c:456  MXM  ERROR
>>>> ibv_query_device() returned 38: Function not implemented
>>>> --------------------------------------------------------------------------
>>>> Initialization of MXM library failed.
>>>>
>>>>  Error: Input/output error
>>>>
>>>> --------------------------------------------------------------------------
>>>> [JARVICE:449  :0] Caught signal 11 (Segmentation fault)
>>>> [JARVICE:450  :0] Caught signal 11 (Segmentation fault)
>>>> ==== backtrace ====
>>>>  2 0x000000000005640c mxm_handle_error()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>  3 0x000000000005657c mxm_error_signal_handler()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>  5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>  6 0x000000000006f6da vasprintf()  ??:0
>>>>  7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>  8 0x0000000000026630 orte_show_help()  ??:0
>>>>  9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>> 13 0x000000000000d0ca l_getLocalFromConfig()
>>>>  /root/rain_ib/interposer/libciutils.c:83
>>>> 14 0x000000000000c7b4 __cudaRegisterFatBinary()
>>>>  /root/rain_ib/interposer/libci.c:4055
>>>> 15 0x0000000000402b59
>>>> _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv()
>>>>  tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0
>>>> 16 0x0000000000402dd6 __do_global_ctors_aux()  crtstuff.c:0
>>>> ===================
>>>> ==== backtrace ====
>>>>  2 0x000000000005640c mxm_handle_error()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>  3 0x000000000005657c mxm_error_signal_handler()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>  5 0x000000000004812c _IO_vfprintf()  ??:0
>>>>  6 0x000000000006f6da vasprintf()  ??:0
>>>>  7 0x0000000000059b3b opal_show_help_vstring()  ??:0
>>>>  8 0x0000000000026630 orte_show_help()  ??:0
>>>>  9 0x0000000000001a3f mca_bml_r2_add_procs()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409
>>>> 10 0x0000000000004475 mca_pml_ob1_add_procs()
>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332
>>>> 11 0x00000000000442f3 ompi_mpi_init()  ??:0
>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>> 15 0x0000000000402db9 _start()  ??:0
>>>> ===================
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 1 with PID 450 on node JARVICE exited
>>>> on signal 11 (Segmentation fault).
>>>> --------------------------------------------------------------------------
>>>> [JARVICE:00447] 1 more process has sent help message help-mtl-mxm.txt /
>>>> mxm init
>>>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>> see all help / error messages
>>>> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]#
>>>>
>>>>
>>>> Subhra.
>>>>
>>>>
>>>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman <mi...@dev.mellanox.co.il>
>>>> wrote:
>>>>
>>>>> Have you followed the installation steps from the README? (Also here
>>>>> for reference: http://bgate.mellanox.com/products/hpcx/README.txt)
>>>>>
>>>>> ...
>>>>>
>>>>> * Load OpenMPI/OpenSHMEM v1.8 based package:
>>>>>
>>>>>     % source $HPCX_HOME/hpcx-init.sh
>>>>>     % hpcx_load
>>>>>     % env | grep HPCX
>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi
>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>     % hpcx_unload
>>>>>
>>>>> 3. Load HPCX environment from modules
>>>>>
>>>>> * Load OpenMPI/OpenSHMEM based package:
>>>>>
>>>>>     % module use $HPCX_HOME/modulefiles
>>>>>     % module load hpcx
>>>>>     % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c
>>>>>     % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem
>>>>>     % module unload hpcx
>>>>>
>>>>> ...
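The `env | grep HPCX` step confirms that `hpcx_load` (or `module load hpcx`) exported the HPCX_* variables, such as the HPCX_MPI_TESTS_DIR used in the README commands. Below is the same check as a minimal C sketch. `print_env_with_prefix` is a hypothetical helper. Unlike grep, it matches only at the start of each NAME=value entry.

```c
#include <stdio.h>
#include <string.h>

/* Print each NAME=value entry of `env` (a NULL-terminated array such as
 * POSIX `environ`) that starts with `prefix`; return how many matched.
 * Unlike `env | grep HPCX`, only the start of each entry is compared. */
int print_env_with_prefix(char *const *env, const char *prefix)
{
    size_t plen = strlen(prefix);
    int count = 0;
    for (; env != NULL && *env != NULL; env++) {
        if (strncmp(*env, prefix, plen) == 0) {
            puts(*env);
            count++;
        }
    }
    return count;
}
```

In a real program, declare `extern char **environ;` and call `print_env_with_prefix(environ, "HPCX")` after loading the environment.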
>>>>>
>>>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar <
>>>>> subhramazumd...@gmail.com> wrote:
>>>>>
>>>>>> I am using 2.4-1.0.0 mellanox ofed.
>>>>>>
>>>>>> I downloaded the HPC-X tarball
>>>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and extracted
>>>>>> it. It has an mxm directory.
>>>>>>
>>>>>> [root@JARVICE ~]# ls hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5
>>>>>> archive      fca    hpcx-init-ompi-mellanox-v1.8.sh  ibprof
>>>>>> modulefiles  ompi-mellanox-v1.8  sources  VERSION
>>>>>> bupc-master  hcoll  hpcx-init.sh                     knem
>>>>>> mxm          README.txt          utils
>>>>>>
>>>>>> I tried using LD_PRELOAD for libmxm, but now I get a different error
>>>>>> stack, as follows:
>>>>>>
>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>> --allow-run-as-root --mca mtl mxm -x
>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2"
>>>>>> -n 1 ./backend  localhost : -x
>>>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1
>>>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2
>>>>>> ./libci.so" -n 1 ./app2
>>>>>>  i am backend
>>>>>> [JARVICE:00564] mca: base: components_open: component pml / cm open
>>>>>> function failed
>>>>>> [JARVICE:564  :0] Caught signal 11 (Segmentation fault)
>>>>>> [JARVICE:00565] mca: base: components_open: component pml / cm open
>>>>>> function failed
>>>>>> [JARVICE:565  :0] Caught signal 11 (Segmentation fault)
>>>>>> ==== backtrace ====
>>>>>>  2 0x000000000005640c mxm_handle_error()
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641
>>>>>>  3 0x000000000005657c mxm_error_signal_handler()
>>>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616
>>>>>>  4 0x00000000000329a0 killpg()  ??:0
>>>>>>  5 0x0000000000045491 mca_base_components_close()  ??:0
>>>>>>  6 0x000000000004e99a mca_base_framework_close()  ??:0
>>>>>>  7 0x0000000000045431 mca_base_component_close()  ??:0
>>>>>>  8 0x000000000004515c mca_base_framework_components_open()  ??:0
>>>>>>  9 0x00000000000a0de9 mca_pml_base_open()  pml_base_frame.c:0
>>>>>> 10 0x000000000004eb1c mca_base_framework_open()  ??:0
>>>>>> 11 0x0000000000043eb3 ompi_mpi_init()  ??:0
>>>>>> 12 0x0000000000067cb0 PMPI_Init_thread()  ??:0
>>>>>> 13 0x0000000000404fdf main()  /root/rain_ib/backend/backend.c:1237
>>>>>> 14 0x000000000001ed1d __libc_start_main()  ??:0
>>>>>> 15 0x0000000000402db9 _start()  ??:0
>>>>>> ===================
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> A requested component was not found, or was unable to be opened.  This
>>>>>> means that this component is either not installed or is unable to be
>>>>>> used on your system (e.g., sometimes this means that shared libraries
>>>>>> that the component requires are unable to be found/loaded).  Note that
>>>>>> Open MPI stopped checking at the first component that it did not find.
>>>>>>
>>>>>> Host:      JARVICE
>>>>>> Framework: mtl
>>>>>> Component: mxm
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE
>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> [JARVICE:00562] 1 more process has sent help message
>>>>>> help-mca-base.txt / find-available:not-valid
>>>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>>>> see all help / error messages
>>>>>>
>>>>>>
>>>>>> Subhra
>>>>>>
>>>>>>
>>>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman <
>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>
>>>>>>> It seems like mxm was not found in your LD_LIBRARY_PATH.
>>>>>>>
>>>>>>> What MOFED version do you use?
>>>>>>> Does it have /opt/mellanox/mxm in it?
>>>>>>> You could just run mpirun from the HPCX package, which looks for mxm
>>>>>>> internally, and recompile OMPI as mentioned in the README.
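You can check whether a given directory is on LD_LIBRARY_PATH by splitting that colon-separated list. Below is a minimal C sketch. `path_list_contains` is hypothetical, and `/opt/mellanox/mxm/lib` is only an assumed library location under the /opt/mellanox/mxm directory mentioned above.

```c
#include <string.h>

/* Hypothetical helper: return 1 if `dir` appears as a complete entry in a
 * colon-separated search path such as LD_LIBRARY_PATH, 0 otherwise.
 * Entries are compared literally; trailing slashes are not normalised. */
int path_list_contains(const char *list, const char *dir)
{
    if (list == NULL || dir == NULL)
        return 0;
    size_t dlen = strlen(dir);
    const char *p = list;
    while (*p != '\0') {
        size_t n = strcspn(p, ":"); /* length of the current entry */
        if (n == dlen && strncmp(p, dir, n) == 0)
            return 1;
        p += n;
        if (*p == ':')
            p++;                    /* step over the separator */
    }
    return 0;
}
```

`path_list_contains(getenv("LD_LIBRARY_PATH"), "/opt/mellanox/mxm/lib")` then reports whether that directory is on the path. The dynamic loader also consults ld.so.cache and the default system directories. A result of 0 therefore does not prove the library cannot be found.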
>>>>>>>
>>>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar <
>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I used the mxm MTL as follows, but I am getting a segfault. It says the
>>>>>>>> mxm component was not found, but I have compiled Open MPI with mxm. Any
>>>>>>>> idea what I might be missing?
>>>>>>>>
>>>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun
>>>>>>>> --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x
>>>>>>>> LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend
>>>>>>>> localhosst : -n 1 -x LD_PRELOAD="./libci.so
>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2
>>>>>>>>  i am backend
>>>>>>>> [JARVICE:08398] *** Process received signal ***
>>>>>>>> [JARVICE:08398] Signal: Segmentation fault (11)
>>>>>>>> [JARVICE:08398] Signal code: Address not mapped (1)
>>>>>>>> [JARVICE:08398] Failing at address: 0x10
>>>>>>>> [JARVICE:08398] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710]
>>>>>>>> [JARVICE:08398] [ 1]
>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491]
>>>>>>>> [JARVICE:08398] [ 2]
>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a]
>>>>>>>> [JARVICE:08398] [ 3]
>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431]
>>>>>>>> [JARVICE:08398] [ 4]
>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c]
>>>>>>>> [JARVICE:08398] [ 5]
>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9]
>>>>>>>> [JARVICE:08398] [ 6]
>>>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c]
>>>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open:
>>>>>>>> component pml / cm open function failed
>>>>>>>>
>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3]
>>>>>>>> [JARVICE:08398] [ 8]
>>>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0]
>>>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf]
>>>>>>>> [JARVICE:08398] [10]
>>>>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d]
>>>>>>>> [JARVICE:08398] [11] ./backend[0x402db9]
>>>>>>>> [JARVICE:08398] *** End of error message ***
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A requested component was not found, or was unable to be opened.  This
>>>>>>>> means that this component is either not installed or is unable to be
>>>>>>>> used on your system (e.g., sometimes this means that shared libraries
>>>>>>>> that the component requires are unable to be found/loaded).  Note that
>>>>>>>> Open MPI stopped checking at the first component that it did not find.
>>>>>>>>
>>>>>>>> Host:      JARVICE
>>>>>>>> Framework: mtl
>>>>>>>> Component: mxm
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE
>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> Subhra.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman <
>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>
>>>>>>>>> No need for IPoIB; mxm uses native IB.
>>>>>>>>>
>>>>>>>>> Please see the HPCX (pre-compiled OMPI, integrated with MXM and FCA)
>>>>>>>>> README file for details on how to compile/select.
>>>>>>>>>
>>>>>>>>> The default transport is UD for internode communication and
>>>>>>>>> shared memory for intra-node.
>>>>>>>>>
>>>>>>>>> http://bgate.mellanox.com/products/hpcx/
>>>>>>>>>
>>>>>>>>> Also, mxm is included in the Mellanox OFED.
>>>>>>>>>
>>>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar <
>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Does IPoIB need to be configured on the IB cards for mxm (I have
>>>>>>>>>> a separate Ethernet connection too)? Also, are there special flags in
>>>>>>>>>> mpirun to select from UD/RC/DC? What is the default?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Subhra.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman <
>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> mxm uses IB RDMA/RoCE technologies. One can select UD/RC/DC
>>>>>>>>>>> transports to be used in mxm.
>>>>>>>>>>>
>>>>>>>>>>> By selecting mxm, all MPI p2p routines will be mapped to
>>>>>>>>>>> appropriate mxm functions.
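Conceptually, that mapping is a dispatch through a table of function pointers chosen at startup. Application code keeps calling MPI_Send/MPI_Recv whatever the transport. Below is a heavily simplified, hypothetical C model of that idea. None of its names are Open MPI or MXM symbols, and the real MTL interface has many more entry points.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified transport "module": a table of function
 * pointers. Names are illustrative only, not real Open MPI/MXM symbols. */
typedef struct {
    const char *name;
    int (*send)(const void *buf, size_t len, int dest, int tag);
    int (*recv)(void *buf, size_t len, int src, int tag);
} p2p_module_t;

/* Stand-ins for transport-specific entry points. */
static int fake_mxm_send(const void *buf, size_t len, int dest, int tag)
{
    (void)buf;
    printf("mxm send: %zu bytes to rank %d, tag %d\n", len, dest, tag);
    return 0;
}

static int fake_mxm_recv(void *buf, size_t len, int src, int tag)
{
    (void)buf;
    printf("mxm recv: up to %zu bytes from rank %d, tag %d\n", len, src, tag);
    return 0;
}

static const p2p_module_t mxm_like_module = {
    "mxm", fake_mxm_send, fake_mxm_recv
};

/* The module picked at startup (e.g. by a "--mca mtl mxm"-style choice). */
static const p2p_module_t *selected = &mxm_like_module;

/* "MPI-level" calls never name the transport; they just dispatch. */
int my_send(const void *buf, size_t len, int dest, int tag)
{
    return selected->send(buf, len, dest, tag);
}

int my_recv(void *buf, size_t len, int src, int tag)
{
    return selected->recv(buf, len, src, tag);
}
```

Swapping `selected` for a different table changes the transport without touching any caller. That is why plain MPI_Send/MPI_Recv code needs no changes to use mxm.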
>>>>>>>>>>>
>>>>>>>>>>> M
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar <
>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Mike,
>>>>>>>>>>>>
>>>>>>>>>>>> Does the mxm MTL use InfiniBand RDMA? Also, from a programming
>>>>>>>>>>>> perspective, do I need to use anything other than
>>>>>>>>>>>> MPI_Send/MPI_Recv?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman <
>>>>>>>>>>>> mi...@dev.mellanox.co.il> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> The openib BTL does not support this thread model.
>>>>>>>>>>>>> You can use OMPI with mxm (-mca mtl mxm) and multiple-thread
>>>>>>>>>>>>> mode in the 1.8.x series, or (-mca pml yalla) in the master branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> M
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar <
>>>>>>>>>>>>> subhramazumd...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and the openib BTL work together in Open
>>>>>>>>>>>>>> MPI 1.8.4? If so, are there any command-line options needed at
>>>>>>>>>>>>>> run time?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Subhra.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26574.php
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> M.
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>> Subscription:
>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26575.php
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> users mailing list
>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>> Subscription:
>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>>> Link to this post:
>>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26580.php
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>
>>>>>>>>>>> M.
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>>> Link to this post:
>>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/03/26584.php
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> users mailing list
>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>>> Link to this post:
>>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26663.php
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> M.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> us...@open-mpi.org
>>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>> Link to this post:
>>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26665.php
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> us...@open-mpi.org
>>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>> Link to this post:
>>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26686.php
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> M.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> us...@open-mpi.org
>>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>> Link to this post:
>>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26688.php
>>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> us...@open-mpi.org
>>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> Link to this post:
>>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26711.php
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> M.
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26712.php
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post:
>>>> http://www.open-mpi.org/community/lists/users/2015/04/26752.php
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Kind Regards,
>>>
>>> M.
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/users/2015/04/26754.php
>>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/04/26761.php
>>
>
>
>
> --
>
> Kind Regards,
>
> M.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/04/26762.php
>
