Why do you need kernel support for interprocess shared memory? Just allocate the symmetric heap in shared memory. Sure, this does not support other symmetric variables (statics and globals), but shmem_ptr can detect those and return NULL for them.
shmem_ptr should behave similarly to MPI_Win_shared_query...

Jeff

On Fri, Jun 1, 2018 at 12:19 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> **xpmem kernel module.
>
> On Fri, Jun 1, 2018 at 3:16 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
>> Hi, Marcin
>>
>> Sorry for the late response (somehow this one got lost in the clutter).
>> We added support for shmem_ptr in the UCX SPML in Open MPI 3.0. However, in
>> order to use it, you must install the Knem kernel module (
>> https://github.com/hjelmn/xpmem).
>>
>> Best,
>>
>> Josh
>>
>> On Wed, Apr 18, 2018 at 4:01 AM, marcin.krotkiewski <
>> marcin.krotkiew...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm running the example below from the OpenMPI documentation:
>>>
>>> #include <mpp/shmem.h>
>>> #include <stdio.h>
>>>
>>> int main(void)
>>> {
>>>     static int bigd[100];
>>>     int *ptr;
>>>     int i;
>>>
>>>     shmem_init();
>>>     if (shmem_my_pe() == 0) {
>>>         /* initialize PE 1’s bigd array */
>>>         ptr = shmem_ptr(bigd, 1);
>>>         if (!ptr) {
>>>             fprintf(stderr, "get external pointer failed!\n");
>>>             shmem_global_exit(-1);
>>>         }
>>>         for (i = 0; i < 100; i++)
>>>             *ptr++ = i + 1;
>>>     }
>>>     shmem_barrier_all();
>>>     if (shmem_my_pe() == 1) {
>>>         printf("bigd on PE 1 is:\n");
>>>         for (i = 0; i < 100; i++)
>>>             printf(" %d\n", bigd[i]);
>>>         printf("\n");
>>>     }
>>>     shmem_finalize();
>>>     return 0;
>>> }
>>>
>>> but shmem_ptr always returns NULL for me. I tried with OpenMPI versions
>>> from 2.0.1 up to 3.1.0rc4, compiled with HPCX 2.1, running on a ConnectX-4
>>> system. This is the command line:
>>>
>>> $ shmemrun -mca spml ucx -mca spml_base_verbose 100 -np 2 -map-by node
>>> -report-bindings ./a.out
>>>
>>> [c11-1:36505] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>> [BB/../../../../../../../../../../../../../../..][../../../.
>>> ./../../../../../../../../../../../..]
>>> [c11-2:105580] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]:
>>> [BB/../../../../../../../../../../../../../../..][../../../.
>>> ./../../../../../../../../../../../..]
>>> [c11-1:36522] mca: base: components_register: registering framework spml
>>> components
>>> [c11-1:36522] mca: base: components_register: found loaded component ucx
>>> [c11-1:36522] mca: base: components_register: component ucx register
>>> function successful
>>> [c11-1:36522] mca: base: components_open: opening spml components
>>> [c11-1:36522] mca: base: components_open: found loaded component ucx
>>> [c11-2:105590] mca: base: components_register: registering framework
>>> spml components
>>> [c11-2:105590] mca: base: components_register: found loaded component ucx
>>> [c11-2:105590] mca: base: components_register: component ucx register
>>> function successful
>>> [c11-2:105590] mca: base: components_open: opening spml components
>>> [c11-2:105590] mca: base: components_open: found loaded component ucx
>>> [c11-1:36522] mca: base: components_open: component ucx open function
>>> successful
>>> [c11-2:105590] mca: base: components_open: component ucx open function
>>> successful
>>> [c11-1:36522] base/spml_base_select.c:107 - mca_spml_base_select()
>>> select: initializing spml component ucx
>>> [c11-1:36522] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
>>> in ucx, my priority is 21
>>> [c11-2:105590] base/spml_base_select.c:107 - mca_spml_base_select()
>>> select: initializing spml component ucx
>>> [c11-2:105590] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
>>> in ucx, my priority is 21
>>> [c11-1:36522] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
>>> *** ucx initialized ****
>>> [c11-1:36522] base/spml_base_select.c:119 - mca_spml_base_select()
>>> select: init returned priority 21
>>> [c11-1:36522] base/spml_base_select.c:160 - mca_spml_base_select()
>>> selected ucx best priority 21
>>> [c11-1:36522] base/spml_base_select.c:194 - mca_spml_base_select()
>>> select: component ucx selected
>>> [c11-1:36522] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
>>> [c11-2:105590] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
>>> *** ucx initialized ****
>>> [c11-2:105590] base/spml_base_select.c:119 - mca_spml_base_select()
>>> select: init returned priority 21
>>> [c11-2:105590] base/spml_base_select.c:160 - mca_spml_base_select()
>>> selected ucx best priority 21
>>> [c11-2:105590] base/spml_base_select.c:194 - mca_spml_base_select()
>>> select: component ucx selected
>>> [c11-2:105590] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
>>> [c11-1:36522] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
>>> ***
>>> [c11-2:105590] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
>>> ***
>>> shared_mr flags are not supported
>>> shared_mr flags are not supported
>>> get external pointer failed!
>>>
>>> So it looks like everything is fine - maybe except the 'shared_mr flags
>>> are not supported' message.
>>>
>>> Does anyone have ideas why I get NULL? The same happens if I start two
>>> ranks on the same compute node, and if I use a shmem_malloc'ed pointer
>>> instead of a static array.
>>>
>>> Thank you,
>>>
>>> Marcin
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users

-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/