Cool, progress!

>> [1429676565.124664] sys.c:719 MXM WARN Conflicting CPU frequencies detected, using: 2601.00

This warning means that the CPU frequency governor on your machine is not set to "performance" mode.

>> MXM ERROR ibv_query_device() returned 38: Function not implemented

This error indicates that the OFED actually installed on your nodes is not 2.4-1.0.0, or that there is a mismatch between the OFED kernel drivers and the OFED userspace libraries, or that multiple OFED library stacks are installed on the node and the wrong one is being picked up.

Could you please check that ofed_info -s indeed prints MOFED 2.4-1.0.0?
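For example, something along these lines should show both the governor and the installed OFED (treat it as a sketch; the exact tooling for setting the governor differs between distros):

    # current frequency governor per core; "performance" avoids the MXM frequency warning
    cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

    # switch all cores to the performance governor (as root)
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo performance > "$g"; done

    # which OFED the node really runs, and whether stray verbs libraries are around
    ofed_info -s
    rpm -qa | grep -E 'libibverbs|librdmacm|libmlx'
    ldconfig -p | grep libibverbs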
On Wed, Apr 22, 2015 at 7:59 AM, Subhra Mazumdar <subhramazumd...@gmail.com> wrote:

> Hi,
>
> I compiled the openmpi that comes inside the mellanox hpcx package with
> mxm support instead of separately downloaded openmpi. I also used the
> environment as in the README so that no LD_PRELOAD (except our own library
> which is unrelated) is needed. Now it runs fine (no segfault) but we get
> same errors as before (saying initialization of MXM library failed). Is it
> using MXM successfully?
>
> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# mpirun
> --allow-run-as-root --mca mtl mxm -n 1 /root/backend localhost : -x
> LD_PRELOAD=/root/libci.so -n 1 /root/app2
> --------------------------------------------------------------------------
> WARNING: a request was made to bind a process. While the system
> supports binding the process itself, at least one node does NOT
> support binding memory to the process location.
>
>   Node: JARVICE
>
> This usually is due to not having the required NUMA support installed
> on the node. In some Linux distributions, the required support is
> contained in the libnumactl and libnumactl-devel packages.
> This is a warning only; your job will continue, though performance may be
> degraded.
> --------------------------------------------------------------------------
> i am backend
> [1429676565.121218] sys.c:719 MXM WARN Conflicting CPU frequencies detected, using: 2601.00
> [1429676565.122937] [JARVICE:14767:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
> [1429676565.122950] [JARVICE:14767:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
> [1429676565.123535] [JARVICE:14767:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
> [1429676565.123543] [JARVICE:14767:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
> [1429676565.124664] sys.c:719 MXM WARN Conflicting CPU frequencies detected, using: 2601.00
> [1429676565.126264] [JARVICE:14768:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
> [1429676565.126276] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
> [1429676565.126812] [JARVICE:14768:0] ib_dev.c:445 MXM WARN failed call to ibv_exp_use_priv_env(): Function not implemented
> [1429676565.126821] [JARVICE:14768:0] ib_dev.c:456 MXM ERROR ibv_query_device() returned 38: Function not implemented
> --------------------------------------------------------------------------
> Initialization of MXM library failed.
>
> Error: Input/output error
> --------------------------------------------------------------------------
>
> <application runs fine>
>
> Thanks,
> Subhra.
>
> On Sat, Apr 18, 2015 at 12:28 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>
>> could you please check that ofed_info -s indeed prints mofed 2.4-1.0.0?
>> why LD_PRELOAD needed in your command line?
Can you try >> >> module load hpcx >> mpirun -np $np test.exe >> ? >> >> On Sat, Apr 18, 2015 at 8:39 AM, Subhra Mazumdar < >> subhramazumd...@gmail.com> wrote: >> >>> I followed the instructions as in the README, now getting a different >>> error: >>> >>> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# >>> ../openmpi-1.8.4/openmpinstall/bin/mpirun --allow-run-as-root --mca mtl mxm >>> -x LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>> ./mxm/lib/libmxm.so.2" -n 1 ../backend localhost : -x >>> LD_PRELOAD="../openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>> ./mxm/lib/libmxm.so.2 ../libci.so" -n 1 ../app2 >>> >>> >>> -------------------------------------------------------------------------- >>> >>> WARNING: a request was made to bind a process. While the system >>> >>> supports binding the process itself, at least one node does NOT >>> >>> support binding memory to the process location. >>> >>> Node: JARVICE >>> >>> This usually is due to not having the required NUMA support installed >>> >>> on the node. In some Linux distributions, the required support is >>> >>> contained in the libnumactl and libnumactl-devel packages. >>> >>> This is a warning only; your job will continue, though performance may >>> be degraded. >>> >>> >>> -------------------------------------------------------------------------- >>> >>> i am backend >>> >>> [1429334876.139452] [JARVICE:449 :0] ib_dev.c:445 MXM WARN failed >>> call to ibv_exp_use_priv_env(): Function not implemented >>> >>> [1429334876.139464] [JARVICE:449 :0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> >>> [1429334876.139982] [JARVICE:449 :0] ib_dev.c:445 MXM WARN failed >>> call to ibv_exp_use_priv_env(): Function not implemented >>> >>> [1429334876.139990] [JARVICE:449 :0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> >>> [1429334876.142649] [JARVICE:450 :0] ib_dev.c:445 MXM WARN failed >>> call to ibv_exp_use_priv_env(): Function not implemented >>> >>> [1429334876.142666] [JARVICE:450 :0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> >>> [1429334876.143235] [JARVICE:450 :0] ib_dev.c:445 MXM WARN failed >>> call to ibv_exp_use_priv_env(): Function not implemented >>> >>> [1429334876.143243] [JARVICE:450 :0] ib_dev.c:456 MXM ERROR >>> ibv_query_device() returned 38: Function not implemented >>> >>> >>> -------------------------------------------------------------------------- >>> >>> Initialization of MXM library failed. 
>>> >>> Error: Input/output error >>> >>> >>> -------------------------------------------------------------------------- >>> >>> [JARVICE:449 :0] Caught signal 11 (Segmentation fault) >>> >>> [JARVICE:450 :0] Caught signal 11 (Segmentation fault) >>> >>> ==== backtrace ==== >>> >>> 2 0x000000000005640c mxm_handle_error() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641 >>> >>> 3 0x000000000005657c mxm_error_signal_handler() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616 >>> >>> 4 0x00000000000329a0 killpg() ??:0 >>> >>> 5 0x000000000004812c _IO_vfprintf() ??:0 >>> >>> 6 0x000000000006f6da vasprintf() ??:0 >>> >>> 7 0x0000000000059b3b opal_show_help_vstring() ??:0 >>> >>> 8 0x0000000000026630 orte_show_help() ??:0 >>> >>> 9 0x0000000000001a3f mca_bml_r2_add_procs() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409 >>> >>> 10 0x0000000000004475 mca_pml_ob1_add_procs() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332 >>> >>> 11 0x00000000000442f3 ompi_mpi_init() ??:0 >>> >>> 12 0x0000000000067cb0 PMPI_Init_thread() ??:0 >>> >>> 13 0x000000000000d0ca l_getLocalFromConfig() >>> /root/rain_ib/interposer/libciutils.c:83 >>> >>> 14 0x000000000000c7b4 __cudaRegisterFatBinary() >>> /root/rain_ib/interposer/libci.c:4055 >>> >>> 15 0x0000000000402b59 >>> _ZL70__sti____cudaRegisterAll_39_tmpxft_00000703_00000000_6_app2_cpp1_ii_hwv() >>> tmpxft_00000703_00000000-3_app2.cudafe1.cpp:0 >>> >>> 16 0x0000000000402dd6 __do_global_ctors_aux() crtstuff.c:0 >>> >>> =================== >>> >>> ==== backtrace ==== >>> >>> 2 0x000000000005640c mxm_handle_error() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641 >>> >>> 3 0x000000000005657c mxm_error_signal_handler() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616 >>> >>> 4 0x00000000000329a0 killpg() ??:0 >>> >>> 5 0x000000000004812c _IO_vfprintf() ??:0 >>> >>> 6 0x000000000006f6da vasprintf() ??:0 >>> >>> 7 0x0000000000059b3b opal_show_help_vstring() ??:0 >>> >>> 8 0x0000000000026630 orte_show_help() ??:0 >>> >>> 9 0x0000000000001a3f mca_bml_r2_add_procs() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/bml/r2/bml_r2.c:409 >>> >>> 10 0x0000000000004475 mca_pml_ob1_add_procs() >>> >>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/ompi-mellanox-v1.8/ompi/mca/pml/ob1/pml_ob1.c:332 >>> >>> 11 0x00000000000442f3 ompi_mpi_init() ??:0 >>> >>> 12 0x0000000000067cb0 PMPI_Init_thread() ??:0 >>> >>> 13 0x0000000000404fdf 
main() /root/rain_ib/backend/backend.c:1237 >>> >>> 14 0x000000000001ed1d __libc_start_main() ??:0 >>> >>> 15 0x0000000000402db9 _start() ??:0 >>> >>> =================== >>> >>> >>> -------------------------------------------------------------------------- >>> >>> mpirun noticed that process rank 1 with PID 450 on node JARVICE exited >>> on signal 11 (Segmentation fault). >>> >>> >>> -------------------------------------------------------------------------- >>> >>> [JARVICE:00447] 1 more process has sent help message help-mtl-mxm.txt / >>> mxm init >>> >>> [JARVICE:00447] Set MCA parameter "orte_base_help_aggregate" to 0 to see >>> all help / error messages >>> >>> [root@JARVICE hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5]# >>> >>> >>> Subhra. >>> >>> >>> On Mon, Apr 13, 2015 at 10:58 PM, Mike Dubman <mi...@dev.mellanox.co.il> >>> wrote: >>> >>>> Have you followed installation steps from README (Also here for >>>> reference http://bgate.mellanox.com/products/hpcx/README.txt) >>>> >>>> ... >>>> >>>> * Load OpenMPI/OpenSHMEM v1.8 based package: >>>> >>>> % source $HPCX_HOME/hpcx-init.sh >>>> % hpcx_load >>>> % env | grep HPCX >>>> % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_usempi >>>> % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem >>>> % hpcx_unload >>>> >>>> 3. Load HPCX environment from modules >>>> >>>> * Load OpenMPI/OpenSHMEM based package: >>>> >>>> % module use $HPCX_HOME/modulefiles >>>> % module load hpcx >>>> % mpirun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_c >>>> % oshrun -np 2 $HPCX_MPI_TESTS_DIR/examples/hello_oshmem >>>> % module unload hpcx >>>> >>>> ... >>>> >>>> On Tue, Apr 14, 2015 at 5:42 AM, Subhra Mazumdar < >>>> subhramazumd...@gmail.com> wrote: >>>> >>>>> I am using 2.4-1.0.0 mellanox ofed. >>>>> >>>>> I downloaded mofed tarball >>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5.tar and extracted >>>>> it. It has mxm directory. 
>>>>> >>>>> hpcx-v1.2.0-325-[root@JARVICE ~]# ls >>>>> hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5 >>>>> archive fca hpcx-init-ompi-mellanox-v1.8.sh ibprof >>>>> modulefiles ompi-mellanox-v1.8 sources VERSION >>>>> bupc-master hcoll hpcx-init.sh knem >>>>> mxm README.txt utils >>>>> >>>>> I tried using LD_PRELOAD for libmxm, but getting a different error >>>>> stack now as following >>>>> >>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun >>>>> --allow-run-as-root --mca mtl mxm -x >>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2" >>>>> -n 1 ./backend localhost : -x >>>>> LD_PRELOAD="./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 >>>>> ./hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm/lib/libmxm.so.2 >>>>> ./libci.so" -n 1 ./app2 >>>>> i am backend >>>>> [JARVICE:00564] mca: base: components_open: component pml / cm open >>>>> function failed >>>>> [JARVICE:564 :0] Caught signal 11 (Segmentation fault) >>>>> [JARVICE:00565] mca: base: components_open: component pml / cm open >>>>> function failed >>>>> [JARVICE:565 :0] Caught signal 11 (Segmentation fault) >>>>> ==== backtrace ==== >>>>> 2 0x000000000005640c mxm_handle_error() >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:641 >>>>> 3 0x000000000005657c mxm_error_signal_handler() >>>>> /scrap/jenkins/workspace/hpc-power-pack/label/r-vmb-rhel6-u5-x86-64-MOFED-CHECKER/hpcx_root/src/hpcx-v1.2.0-325-gcc-MLNX_OFED_LINUX-2.4-1.0.0-redhat6.5/mxm-v3.2/src/mxm/util/debug/debug.c:616 >>>>> 4 0x00000000000329a0 killpg() ??:0 >>>>> 5 0x0000000000045491 mca_base_components_close() ??:0 >>>>> 6 0x000000000004e99a mca_base_framework_close() ??:0 >>>>> 7 0x0000000000045431 mca_base_component_close() ??:0 >>>>> 8 0x000000000004515c mca_base_framework_components_open() ??:0 >>>>> 9 0x00000000000a0de9 mca_pml_base_open() pml_base_frame.c:0 >>>>> 10 0x000000000004eb1c mca_base_framework_open() ??:0 >>>>> 11 0x0000000000043eb3 ompi_mpi_init() ??:0 >>>>> 12 0x0000000000067cb0 PMPI_Init_thread() ??:0 >>>>> 13 0x0000000000404fdf main() /root/rain_ib/backend/backend.c:1237 >>>>> 14 0x000000000001ed1d __libc_start_main() ??:0 >>>>> 15 0x0000000000402db9 _start() ??:0 >>>>> =================== >>>>> >>>>> -------------------------------------------------------------------------- >>>>> A requested component was not found, or was unable to be opened. This >>>>> means that this component is either not installed or is unable to be >>>>> used on your system (e.g., sometimes this means that shared libraries >>>>> that the component requires are unable to be found/loaded). Note that >>>>> Open MPI stopped checking at the first component that it did not find. >>>>> >>>>> Host: JARVICE >>>>> Framework: mtl >>>>> Component: mxm >>>>> >>>>> -------------------------------------------------------------------------- >>>>> >>>>> -------------------------------------------------------------------------- >>>>> mpirun noticed that process rank 0 with PID 564 on node JARVICE exited >>>>> on signal 11 (Segmentation fault). 
>>>>> >>>>> -------------------------------------------------------------------------- >>>>> [JARVICE:00562] 1 more process has sent help message help-mca-base.txt >>>>> / find-available:not-valid >>>>> [JARVICE:00562] Set MCA parameter "orte_base_help_aggregate" to 0 to >>>>> see all help / error messages >>>>> >>>>> >>>>> Subhra >>>>> >>>>> >>>>> On Sun, Apr 12, 2015 at 10:48 PM, Mike Dubman < >>>>> mi...@dev.mellanox.co.il> wrote: >>>>> >>>>>> seems like mxm was not found in your ld_library_path. >>>>>> >>>>>> what mofed version do you use? >>>>>> does it have /opt/mellanox/mxm in it? >>>>>> You could just run mpirun from HPCX package which looks for mxm >>>>>> internally and recompile ompi as mentioned in README. >>>>>> >>>>>> On Mon, Apr 13, 2015 at 3:24 AM, Subhra Mazumdar < >>>>>> subhramazumd...@gmail.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I used mxm mtl as follows but getting segfault. It says mxm >>>>>>> component not found but I have compiled openmpi with mxm. Any idea what >>>>>>> I >>>>>>> might be missing? >>>>>>> >>>>>>> [root@JARVICE ~]# ./openmpi-1.8.4/openmpinstall/bin/mpirun >>>>>>> --allow-run-as-root --mca pml cm --mca mtl mxm -n 1 -x >>>>>>> LD_PRELOAD=./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1 ./backend >>>>>>> localhosst : -n 1 -x LD_PRELOAD="./libci.so >>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1" ./app2 >>>>>>> i am backend >>>>>>> [JARVICE:08398] *** Process received signal *** >>>>>>> [JARVICE:08398] Signal: Segmentation fault (11) >>>>>>> [JARVICE:08398] Signal code: Address not mapped (1) >>>>>>> [JARVICE:08398] Failing at address: 0x10 >>>>>>> [JARVICE:08398] [ 0] /lib64/libpthread.so.0(+0xf710)[0x7ff8d0ddb710] >>>>>>> [JARVICE:08398] [ 1] >>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_components_close+0x21)[0x7ff8cf9ae491] >>>>>>> [JARVICE:08398] [ 2] >>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_close+0x6a)[0x7ff8cf9b799a] >>>>>>> [JARVICE:08398] [ 3] >>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_component_close+0x21)[0x7ff8cf9ae431] >>>>>>> [JARVICE:08398] [ 4] >>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_components_open+0x11c)[0x7ff8cf9ae15c] >>>>>>> [JARVICE:08398] [ 5] >>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(+0xa0de9)[0x7ff8d1089de9] >>>>>>> [JARVICE:08398] [ 6] >>>>>>> /root/openmpi-1.8.4/openmpinstall/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7ff8cf9b7b1c] >>>>>>> [JARVICE:08398] [ 7] [JARVICE:08398] mca: base: components_open: >>>>>>> component pml / cm open function failed >>>>>>> >>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(ompi_mpi_init+0x4b3)[0x7ff8d102ceb3] >>>>>>> [JARVICE:08398] [ 8] >>>>>>> ./openmpi-1.8.4/openmpinstall/lib/libmpi.so.1(PMPI_Init_thread+0x100)[0x7ff8d1050cb0] >>>>>>> [JARVICE:08398] [ 9] ./backend[0x404fdf] >>>>>>> [JARVICE:08398] [10] >>>>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7ff8cfeded1d] >>>>>>> [JARVICE:08398] [11] ./backend[0x402db9] >>>>>>> [JARVICE:08398] *** End of error message *** >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> A requested component was not found, or was unable to be opened. >>>>>>> This >>>>>>> means that this component is either not installed or is unable to be >>>>>>> used on your system (e.g., sometimes this means that shared libraries >>>>>>> that the component requires are unable to be found/loaded). 
Note >>>>>>> that >>>>>>> Open MPI stopped checking at the first component that it did not >>>>>>> find. >>>>>>> >>>>>>> Host: JARVICE >>>>>>> Framework: mtl >>>>>>> Component: mxm >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> mpirun noticed that process rank 0 with PID 8398 on node JARVICE >>>>>>> exited on signal 11 (Segmentation fault). >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> >>>>>>> >>>>>>> Subhra. >>>>>>> >>>>>>> >>>>>>> On Fri, Apr 10, 2015 at 12:12 AM, Mike Dubman < >>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>> >>>>>>>> no need IPoIB, mxm uses native IB. >>>>>>>> >>>>>>>> Please see HPCX (pre-compiled ompi, integrated with MXM and FCA) >>>>>>>> README file for details how to compile/select. >>>>>>>> >>>>>>>> The default transport is UD for internode communication and >>>>>>>> shared-memory for intra-node. >>>>>>>> >>>>>>>> http://bgate,mellanox.com/products/hpcx/ >>>>>>>> >>>>>>>> Also, mxm included in the Mellanox OFED. >>>>>>>> >>>>>>>> On Fri, Apr 10, 2015 at 5:26 AM, Subhra Mazumdar < >>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Does ipoib need to be configured on the ib cards for mxm (I have a >>>>>>>>> separate ethernet connection too)? Also are there special flags in >>>>>>>>> mpirun >>>>>>>>> to select from UD/RC/DC? What is the default? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Subhra. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Mar 31, 2015 at 9:46 AM, Mike Dubman < >>>>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> mxm uses IB rdma/roce technologies. Once can select UD/RC/DC >>>>>>>>>> transports to be used in mxm. >>>>>>>>>> >>>>>>>>>> By selecting mxm, all MPI p2p routines will be mapped to >>>>>>>>>> appropriate mxm functions. >>>>>>>>>> >>>>>>>>>> M >>>>>>>>>> >>>>>>>>>> On Mon, Mar 30, 2015 at 7:32 PM, Subhra Mazumdar < >>>>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Hi MIke, >>>>>>>>>>> >>>>>>>>>>> Does the mxm mtl use infiniband rdma? Also from programming >>>>>>>>>>> perspective, do I need to use anything else other than >>>>>>>>>>> MPI_Send/MPI_Recv? >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Subhra. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Sun, Mar 29, 2015 at 11:14 PM, Mike Dubman < >>>>>>>>>>> mi...@dev.mellanox.co.il> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi, >>>>>>>>>>>> openib btl does not support this thread model. >>>>>>>>>>>> You can use OMPI w/ mxm (-mca mtl mxm) and multiple thread mode >>>>>>>>>>>> lin 1.8 x series or (-mca pml yalla) in the master branch. >>>>>>>>>>>> >>>>>>>>>>>> M >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Mar 30, 2015 at 9:09 AM, Subhra Mazumdar < >>>>>>>>>>>> subhramazumd...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> Can MPI_THREAD_MULTIPLE and openib btl work together in open >>>>>>>>>>>>> mpi 1.8.4? If so are there any command line options needed during >>>>>>>>>>>>> run time? >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Subhra. 
--

Kind Regards,

M.