Hi Marcin,

Sorry for the late response (somehow this one got lost in the clutter). We added support for shmem_ptr in the UCX SPML in Open MPI 3.0. However, in order to use it, you must install the XPMEM kernel module (https://github.com/hjelmn/xpmem). Without it, shmem_ptr has no way to map the remote PE's memory, which would be consistent with the NULL you are seeing.
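One quick way to check whether the module is loaded on a compute node is to look for it in /proc/modules. Below is a minimal sketch in C, not part of Open MPI or UCX. It assumes the module is listed under the name "xpmem", and the helper name kernel_module_listed is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan a /proc/modules-style file for a module name.
 * Each entry in that file starts with the module name followed by a space.
 * Returns 1 if `name` is listed, 0 if it is not, -1 if `path` cannot be opened.
 */
int kernel_module_listed(const char *path, const char *name)
{
    char line[512];
    size_t len = strlen(name);
    int at_line_start = 1;   /* only compare at the start of a real line */
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        if (at_line_start && len > 0 &&
            strncmp(line, name, len) == 0 &&
            (line[len] == ' ' || line[len] == '\n' || line[len] == '\0')) {
            found = 1;
            break;
        }
        /* An over-long line arrives in chunks; skip the continuation chunks. */
        at_line_start = (strchr(line, '\n') != NULL);
    }

    fclose(f);
    return found;
}
```

Calling kernel_module_listed("/proc/modules", "xpmem") on each node should return 1 once the module is loaded. A driver compiled directly into the kernel would not show up in that file, so a 0 is a hint rather than proof.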
Best,

Josh

On Wed, Apr 18, 2018 at 4:01 AM, marcin.krotkiewski <marcin.krotkiew...@gmail.com> wrote:

> Hi,
>
> I'm running the below example from the OpenMPI documentation:
>
> #include <mpp/shmem.h>
> #include <stdio.h>
>
> main()
> {
>     static int bigd[100];
>     int *ptr;
>     int i;
>     shmem_init();
>     if (shmem_my_pe() == 0) {
>         /* initialize PE 1’s bigd array */
>         ptr = shmem_ptr(bigd, 1);
>         if(!ptr){
>             fprintf(stderr, "get external pointer failed!\n");
>             shmem_global_exit(-1);
>         }
>         for (i=0; i<100; i++)
>             *ptr++ = i+1;
>     }
>     shmem_barrier_all();
>     if (shmem_my_pe() == 1) {
>         printf("bigd on PE 1 is:\n");
>         for (i=0; i<100; i++)
>             printf(" %d\n",bigd[i]);
>         printf("\n");
>     }
> }
>
> but shmem_ptr always returns NULL for me. I tried with OpenMPI versions
> from 2.0.1 up to 3.1.0rc4, compiled with HPCX 2.1, running on a ConnectX-4
> system. This is the command line:
>
> $ shmemrun -mca spml ucx -mca spml_base_verbose 100 -np 2 -map-by node
> -report-bindings ./a.out
>
> [c11-1:36505] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
> [BB/../../../../../../../../../../../../../../..][../../../.
> ./../../../../../../../../../../../..]
> [c11-2:105580] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]:
> [BB/../../../../../../../../../../../../../../..][../../../.
> ./../../../../../../../../../../../..]
> [c11-1:36522] mca: base: components_register: registering framework spml
> components
> [c11-1:36522] mca: base: components_register: found loaded component ucx
> [c11-1:36522] mca: base: components_register: component ucx register
> function successful
> [c11-1:36522] mca: base: components_open: opening spml components
> [c11-1:36522] mca: base: components_open: found loaded component ucx
> [c11-2:105590] mca: base: components_register: registering framework spml
> components
> [c11-2:105590] mca: base: components_register: found loaded component ucx
> [c11-2:105590] mca: base: components_register: component ucx register
> function successful
> [c11-2:105590] mca: base: components_open: opening spml components
> [c11-2:105590] mca: base: components_open: found loaded component ucx
> [c11-1:36522] mca: base: components_open: component ucx open function
> successful
> [c11-2:105590] mca: base: components_open: component ucx open function
> successful
> [c11-1:36522] base/spml_base_select.c:107 - mca_spml_base_select() select:
> initializing spml component ucx
> [c11-1:36522] spml_ucx_component.c:173 - mca_spml_ucx_component_init() in
> ucx, my priority is 21
> [c11-2:105590] base/spml_base_select.c:107 - mca_spml_base_select()
> select: initializing spml component ucx
> [c11-2:105590] spml_ucx_component.c:173 - mca_spml_ucx_component_init() in
> ucx, my priority is 21
> [c11-1:36522] spml_ucx_component.c:184 - mca_spml_ucx_component_init() ***
> ucx initialized ****
> [c11-1:36522] base/spml_base_select.c:119 - mca_spml_base_select() select:
> init returned priority 21
> [c11-1:36522] base/spml_base_select.c:160 - mca_spml_base_select()
> selected ucx best priority 21
> [c11-1:36522] base/spml_base_select.c:194 - mca_spml_base_select() select:
> component ucx selected
> [c11-1:36522] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
> [c11-2:105590] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
> *** ucx initialized ****
> [c11-2:105590] base/spml_base_select.c:119 - mca_spml_base_select()
> select: init returned priority 21
> [c11-2:105590] base/spml_base_select.c:160 - mca_spml_base_select()
> selected ucx best priority 21
> [c11-2:105590] base/spml_base_select.c:194 - mca_spml_base_select()
> select: component ucx selected
> [c11-2:105590] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
> [c11-1:36522] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS ***
> [c11-2:105590] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
> ***
> shared_mr flags are not supported
> shared_mr flags are not supported
> get external pointer failed!
>
>
> So it looks like everything is fine - maybe except the 'shared_mr flags
> are not supported' message.
>
> Does anyone have ideas why I get NULL? The same happens if I start two
> ranks on the same compute node, and if I use shmem_malloc'ed pointer
> instead of a static array.
>
> Thank you,
>
> Marcin
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users