Correction: that should read the xpmem kernel module, not Knem.
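
On systems where the module is not available, shmem_ptr is allowed to return NULL, so portable code usually treats NULL as "no load/store access to that PE" and falls back to a put. A minimal sketch of that pattern follows. The helper name write_remote_ints and the USE_OPENSHMEM macro are made up for illustration; only shmem_ptr and shmem_int_put are real OpenSHMEM calls. The stand-ins in the #else branch are not part of OpenSHMEM and exist only so the logic can be compiled and tried without an install.

```c
#include <stddef.h>
#include <string.h>

#ifdef USE_OPENSHMEM            /* hypothetical macro: build with oshcc -DUSE_OPENSHMEM */
#include <shmem.h>
#else
/* Single-process stand-ins, NOT OpenSHMEM: shmem_ptr always reports
 * "not directly accessible" and the put is a plain local copy. */
static void *shmem_ptr(const void *dest, int pe)
{
    (void)dest;
    (void)pe;
    return NULL;
}

static void shmem_int_put(int *dest, const int *source, size_t nelems, int pe)
{
    (void)pe;
    memcpy(dest, source, nelems * sizeof *source);
}
#endif

/* Write n ints from src into the symmetric array sym_dest on PE pe.
 * Returns 1 if the direct load/store path was used, 0 if it fell back
 * to shmem_int_put. */
static int write_remote_ints(int *sym_dest, const int *src, size_t n, int pe)
{
    int *direct = (int *)shmem_ptr(sym_dest, pe);

    if (direct != NULL) {
        /* Remote memory is mapped locally: ordinary stores are enough. */
        memcpy(direct, src, n * sizeof *src);
        return 1;
    }

    /* No direct mapping: let the library move the data instead. */
    shmem_int_put(sym_dest, src, n, pe);
    return 0;
}
```

In the example from the original message, PE 0 would fill a local buffer with 1..100 and call write_remote_ints(bigd, buffer, 100, 1) in place of the pointer loop. The existing shmem_barrier_all() then completes the put before PE 1 reads bigd.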

On Fri, Jun 1, 2018 at 3:16 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> Hi, Marcin
>
> Sorry for the late response (somehow this one got lost in the clutter). We
> added support for shmem_ptr in the UCX SPML in Open MPI 3.0. However, in
> order to use it, you must install the Knem kernel module (
> https://github.com/hjelmn/xpmem).
>
> Best,
>
> Josh
>
> On Wed, Apr 18, 2018 at 4:01 AM, marcin.krotkiewski <
> marcin.krotkiew...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm running the example below from the Open MPI documentation:
>>
>> #include <shmem.h>
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>   static int bigd[100];   /* static, so the array is symmetric across PEs */
>>   int *ptr;
>>   int i;
>>   shmem_init();
>>   if (shmem_my_pe() == 0) {
>>     /* initialize PE 1's bigd array through a direct pointer */
>>     ptr = shmem_ptr(bigd, 1);
>>     if (!ptr) {
>>       fprintf(stderr, "get external pointer failed!\n");
>>       shmem_global_exit(-1);
>>     }
>>     for (i = 0; i < 100; i++)
>>       *ptr++ = i + 1;
>>   }
>>   /* make sure PE 0 has finished writing before PE 1 reads */
>>   shmem_barrier_all();
>>   if (shmem_my_pe() == 1) {
>>     printf("bigd on PE 1 is:\n");
>>     for (i = 0; i < 100; i++)
>>       printf(" %d\n", bigd[i]);
>>     printf("\n");
>>   }
>>   shmem_finalize();
>>   return 0;
>> }
>>
>> but shmem_ptr always returns NULL for me. I tried Open MPI versions
>> 2.0.1 through 3.1.0rc4, compiled with HPCX 2.1, on a ConnectX-4
>> system. This is the command line:
>>
>> $ shmemrun -mca spml ucx -mca spml_base_verbose 100 -np 2 -map-by node
>> -report-bindings ./a.out
>>
>> [c11-1:36505] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]:
>> [BB/../../../../../../../../../../../../../../..][../../../.
>> ./../../../../../../../../../../../..]
>> [c11-2:105580] MCW rank 1 bound to socket 0[core 0[hwt 0-1]]:
>> [BB/../../../../../../../../../../../../../../..][../../../.
>> ./../../../../../../../../../../../..]
>> [c11-1:36522] mca: base: components_register: registering framework spml
>> components
>> [c11-1:36522] mca: base: components_register: found loaded component ucx
>> [c11-1:36522] mca: base: components_register: component ucx register
>> function successful
>> [c11-1:36522] mca: base: components_open: opening spml components
>> [c11-1:36522] mca: base: components_open: found loaded component ucx
>> [c11-2:105590] mca: base: components_register: registering framework spml
>> components
>> [c11-2:105590] mca: base: components_register: found loaded component ucx
>> [c11-2:105590] mca: base: components_register: component ucx register
>> function successful
>> [c11-2:105590] mca: base: components_open: opening spml components
>> [c11-2:105590] mca: base: components_open: found loaded component ucx
>> [c11-1:36522] mca: base: components_open: component ucx open function
>> successful
>> [c11-2:105590] mca: base: components_open: component ucx open function
>> successful
>> [c11-1:36522] base/spml_base_select.c:107 - mca_spml_base_select()
>> select: initializing spml component ucx
>> [c11-1:36522] spml_ucx_component.c:173 - mca_spml_ucx_component_init() in
>> ucx, my priority is 21
>> [c11-2:105590] base/spml_base_select.c:107 - mca_spml_base_select()
>> select: initializing spml component ucx
>> [c11-2:105590] spml_ucx_component.c:173 - mca_spml_ucx_component_init()
>> in ucx, my priority is 21
>> [c11-1:36522] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
>> *** ucx initialized ****
>> [c11-1:36522] base/spml_base_select.c:119 - mca_spml_base_select()
>> select: init returned priority 21
>> [c11-1:36522] base/spml_base_select.c:160 - mca_spml_base_select()
>> selected ucx best priority 21
>> [c11-1:36522] base/spml_base_select.c:194 - mca_spml_base_select()
>> select: component ucx selected
>> [c11-1:36522] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
>> [c11-2:105590] spml_ucx_component.c:184 - mca_spml_ucx_component_init()
>> *** ucx initialized ****
>> [c11-2:105590] base/spml_base_select.c:119 - mca_spml_base_select()
>> select: init returned priority 21
>> [c11-2:105590] base/spml_base_select.c:160 - mca_spml_base_select()
>> selected ucx best priority 21
>> [c11-2:105590] base/spml_base_select.c:194 - mca_spml_base_select()
>> select: component ucx selected
>> [c11-2:105590] spml_ucx.c:82 - mca_spml_ucx_enable() *** ucx ENABLED ****
>> [c11-1:36522] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
>> ***
>> [c11-2:105590] spml_ucx.c:305 - mca_spml_ucx_add_procs() *** ADDED PROCS
>> ***
>> shared_mr flags are not supported
>> shared_mr flags are not supported
>> get external pointer failed!
>>
>>
>> So everything looks fine, except perhaps for the 'shared_mr flags
>> are not supported' message.
>>
>> Does anyone know why I get NULL? The same happens if I start both
>> ranks on the same compute node, and if I use a shmem_malloc'ed pointer
>> instead of a static array.
>>
>> Thank you,
>>
>> Marcin
>>
>> _______________________________________________
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>
>
>