Why do you need kernel support for interprocess shared memory?  Just
allocate the symmetric heap in shared memory.  Sure, this does not cover
symmetric variables that live outside the heap (e.g., static or global
data), but shmem_ptr can detect those and return NULL in those cases.

shmem_ptr should behave similarly to MPI_Win_shared_query...

Jeff

On Fri, Jun 1, 2018 at 12:19 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> **xpmem kernel module.
>
> On Fri, Jun 1, 2018 at 3:16 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>
>> Hi, Marcin
>>
>> Sorry for the late response (somehow this one got lost in the clutter).
>> We added support for shmem_ptr in the UCX SPML in Open MPI 3.0. However, in
>> order to use it, you must install the Knem kernel module (
>> https://github.com/hjelmn/xpmem).
>>
>> Best,
>>
>> Josh
>>
>> On Wed, Apr 18, 2018 at 4:01 AM, marcin.krotkiewski <
>> marcin.krotkiew...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm running the example below from the Open MPI documentation:
>>>
>>> #include <mpp/shmem.h>
>>> #include <stdio.h>
>>>
>>> int main(void)
>>> {
>>>   static int bigd[100];
>>>   int *ptr;
>>>   int i;
>>>   shmem_init();
>>>   if (shmem_my_pe() == 0) {
>>>     /* initialize PE 1’s bigd array */
>>>     ptr = shmem_ptr(bigd, 1);
>>>     if(!ptr){
>>>       fprintf(stderr, "get external pointer failed!\n");
>>>       shmem_global_exit(-1);
>>>     }
>>>     for (i=0; i<100; i++)
>>>       *ptr++ = i+1;
>>>   }
>>>   shmem_barrier_all();
>>>   if (shmem_my_pe() == 1) {
>>>     printf("bigd on PE 1 is:\n");
>>>     for (i=0; i<100; i++)
>>>       printf(" %d\n",bigd[i]);
>>>     printf("\n");
>>>   }
>>>   shmem_finalize();
>>>   return 0;
>>> }
>>>
>>> but shmem_ptr always returns NULL for me. I tried Open MPI versions
>>> from 2.0.1 up to 3.1.0rc4, compiled with HPCX 2.1, running on a ConnectX-4
>>> system. This is the command line:
>>>
>>> $ shmemrun -mca spml ucx -mca spml_base_verbose 100 -np 2 -map-by node
>>> -report-bindings ./a.out
>>>
>>> [c11-1:36505] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>>> [BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../..]
>>> [c11-2:105580] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]:
>>> [BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../..]
>>> [c11-1:36522] mca: base: components_register: registering framework spml
>>> components
>>> [c11-1:36522] mca: base: components_register: found loaded component ucx
>>> [c11-1:36522] mca: base: components_register: component ucx register
>>> function successful
>>> [c11-1:36522] mca: base: components_open: opening spml components
>>> [c11-1:36522] mca: base: components_open: found loaded component ucx
>>> [c11-2:105590] mca: base: components_register: registering framework
>>> spml components
>>> [c11-2:105590] mca: base: components_register: found loaded component ucx
>>> [c11-2:105590] mca: base: components_register: component ucx register
>>> function successful
>>> [c11-2:105590] mca: base: components_open: opening spml components
>>> [c11-2:105590] mca: base: components_open: found loaded component ucx
>>> [c11-1:36522] mca: base: components_open: component ucx open function
>>> successful
>>> [c11-2:105590] mca: base: components_open: component ucx open function
>>> successful
>>> [c11-1:36522] base/spml_base_select.c:107 - mca_spml_base_select()
>>> select: initializing spml component ucx
>>> [c11-1:36522] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
>>> in ucx, my priority is 21
>>> [c11-2:105590] base/spml_base_select.c:107 - mca_spml_base_select()
>>> select: initializing spml component ucx
>>> [c11-2:105590] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
>>> in ucx, my priority is 21
>>> [c11-1:36522] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
>>> *** ucx initialized ****
>>> [c11-1:36522] base/spml_base_select.c:119 - mca_spml_base_select()
>>> select: init returned priority 21
>>> [c11-1:36522] base/spml_base_select.c:160 - mca_spml_base_select()
>>> selected ucx best priority 21
>>> [c11-1:36522] base/spml_base_select.c:194 - mca_spml_base_select()
>>> select: component ucx selected
>>> [c11-1:36522] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
>>> [c11-2:105590] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
>>> *** ucx initialized ****
>>> [c11-2:105590] base/spml_base_select.c:119 - mca_spml_base_select()
>>> select: init returned priority 21
>>> [c11-2:105590] base/spml_base_select.c:160 - mca_spml_base_select()
>>> selected ucx best priority 21
>>> [c11-2:105590] base/spml_base_select.c:194 - mca_spml_base_select()
>>> select: component ucx selected
>>> [c11-2:105590] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
>>> [c11-1:36522] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
>>> ***
>>> [c11-2:105590] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
>>> ***
>>> shared_mr flags are not supported
>>> shared_mr flags are not supported
>>> get external pointer failed!
>>>
>>>
>>> So it looks like everything is fine - maybe except the 'shared_mr flags
>>> are not supported' message.
>>>
>>> Does anyone have an idea why I get NULL? The same happens if I start two
>>> ranks on the same compute node, and if I use a shmem_malloc'ed pointer
>>> instead of a static array.
>>>
>>> Thank you,
>>>
>>> Marcin
>>>
>>> _______________________________________________
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/users
>>
>>
>>
>
>



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/