I really need to update that wording. It has been a while and the code has stabilized. It is quite safe to use and supports recent kernel versions.
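For anyone who wants to try it, the build-and-load flow that Howard's instructions below describe is roughly as follows. This is only a sketch: the install prefix and the module path are examples, and the exact steps for loading the driver vary by distribution (the wiki page linked below has the authoritative instructions).

  $ git clone https://gitlab.com/hjelmn/xpmem.git
  $ cd xpmem
  $ ./autogen.sh
  $ ./configure --prefix=/opt/xpmem
  $ make && sudo make install
  # load the kernel module (path may differ on your system)
  $ sudo insmod kernel/xpmem.ko
  # allow non-root processes to open the device
  $ sudo chmod 666 /dev/xpmem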
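Once the driver is loaded, rebuild UCX against it and point Open MPI at that UCX. Again a sketch with example prefixes; --with-xpmem and --with-ucx are the relevant configure options:

  $ cd ucx-1.4.0
  $ ./configure --prefix=/opt/ucx --with-xpmem=/opt/xpmem
  $ make && sudo make install
  $ cd ../openmpi-4.0.0
  $ ./configure --prefix=/opt/ompi --with-ucx=/opt/ucx
  $ make && sudo make install

Running 'ucx_info -d' afterwards should list an xpmem transport; if it does, the oshrun test from Bert's mail below should get past SHMEM_INIT.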
-Nathan

> On Nov 13, 2018, at 11:06 PM, Bert Wesarg via users
> <users@lists.open-mpi.org> wrote:
>
> Dear Takahiro,
>
> On Wed, Nov 14, 2018 at 5:38 AM Kawashima, Takahiro
> <t-kawash...@jp.fujitsu.com> wrote:
>>
>> XPMEM moved to GitLab.
>>
>> https://gitlab.com/hjelmn/xpmem
>
> The first words of the README aren't very pleasant to read:
>
>   This is an experimental version of XPMEM based on a version provided
>   by Cray and uploaded to https://code.google.com/p/xpmem. This version
>   supports any kernel 3.12 and newer. *Keep in mind there may be bugs
>   and this version may cause kernel panics, code crashes, eat your cat,
>   etc.*
>
> Installing this on my laptop, where I just want to develop with SHMEM,
> it would be a pity to lose work just because of that.
>
> Best,
> Bert
>
>>
>> Thanks,
>> Takahiro Kawashima,
>> Fujitsu
>>
>>> Hello Bert,
>>>
>>> What OS are you running on your notebook?
>>>
>>> If you are running Linux and have root access to your system, you
>>> should be able to resolve the OpenSHMEM support issue by installing
>>> the XPMEM device driver and rebuilding UCX so it picks up XPMEM
>>> support.
>>>
>>> The source code is on GitHub:
>>>
>>> https://github.com/hjelmn/xpmem
>>>
>>> Some instructions on how to build the XPMEM device driver are at
>>>
>>> https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
>>>
>>> You will need to install the kernel source and symbols RPMs on your
>>> system before building the XPMEM device driver.
>>>
>>> Hope this helps,
>>>
>>> Howard
>>>
>>>
>>> On Tue, Nov 13, 2018 at 15:00, Bert Wesarg via users
>>> <users@lists.open-mpi.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce
>>>> <annou...@lists.open-mpi.org> wrote:
>>>>>
>>>>> The Open MPI Team, representing a consortium of research, academic,
>>>>> and industry partners, is pleased to announce the release of Open MPI
>>>>> version 4.0.0.
>>>>>
>>>>> v4.0.0 is the start of a new release series for Open MPI. Starting
>>>>> with this release, the openib BTL supports only iWARP and RoCE by
>>>>> default. Starting with this release, UCX is the preferred transport
>>>>> protocol for InfiniBand interconnects. The embedded PMIx runtime has
>>>>> been updated to 3.0.2. The embedded ROMIO has been updated to 3.2.1.
>>>>> This release is ABI compatible with the 3.x release streams. There
>>>>> have been numerous other bug fixes and performance improvements.
>>>>>
>>>>> Note that starting with Open MPI v4.0.0, prototypes for several
>>>>> MPI-1 symbols that were deleted in the MPI-3.0 specification
>>>>> (which was published in 2012) are no longer available by default in
>>>>> mpi.h. See the README for further details.
>>>>>
>>>>> Version 4.0.0 can be downloaded from the main Open MPI web site:
>>>>>
>>>>> https://www.open-mpi.org/software/ompi/v4.0/
>>>>>
>>>>>
>>>>> 4.0.0 -- September, 2018
>>>>> ------------------------
>>>>>
>>>>> - OSHMEM updated to the OpenSHMEM 1.4 API.
>>>>> - Do not build the OpenSHMEM layer when there are no SPMLs available.
>>>>>   Currently, this means the OpenSHMEM layer will only build if an
>>>>>   MXM or UCX library is found.
>>>>
>>>> So what is the most convenient way to get SHMEM working on a single
>>>> shared-memory node (i.e., a notebook)? I just realized that I haven't
>>>> had SHMEM since Open MPI 3.0, and building with UCX does not help
>>>> either.
>>>> I tried with UCX 1.4, but Open MPI SHMEM still does not work:
>>>>
>>>> $ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
>>>> $ oshrun -np 2 ./shmem_hello_world-4.0.0
>>>> [1542109710.217344] [tudtug:27715:0] select.c:406 UCX ERROR
>>>> no remote registered memory access transport to tudtug:27716:
>>>> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
>>>> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
>>>> mm/posix - Destination is unreachable, cma/cma - no put short
>>>> [1542109710.217344] [tudtug:27716:0] select.c:406 UCX ERROR
>>>> no remote registered memory access transport to tudtug:27715:
>>>> self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
>>>> tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
>>>> mm/posix - Destination is unreachable, cma/cma - no put short
>>>> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
>>>>   Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
>>>> [tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
>>>>   Error: add procs FAILED rc=-2
>>>> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
>>>>   Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
>>>> [tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
>>>>   Error: add procs FAILED rc=-2
>>>> --------------------------------------------------------------------------
>>>> It looks like SHMEM_INIT failed for some reason; your parallel process is
>>>> likely to abort. There are many reasons that a parallel process can
>>>> fail during SHMEM_INIT; some of which are due to configuration or
>>>> environment problems. This failure appears to be an internal failure;
>>>> here's some additional information (which may only be relevant to an
>>>> Open SHMEM developer):
>>>>
>>>>   SPML add procs failed
>>>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>> [tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
>>>> initialize - aborting
>>>> [tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
>>>> initialize - aborting
>>>> --------------------------------------------------------------------------
>>>> SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with
>>>> errorcode -1.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> A SHMEM process is aborting at a time when it cannot guarantee that all
>>>> of its peer processes in the job will be killed properly. You should
>>>> double check that everything has shut down cleanly.
>>>>
>>>>   Local host: tudtug
>>>>   PID:        27715
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> Primary job terminated normally, but 1 process returned
>>>> a non-zero exit code. Per user-direction, the job has been aborted.
>>>> --------------------------------------------------------------------------
>>>> --------------------------------------------------------------------------
>>>> oshrun detected that one or more processes exited with non-zero status,
>>>> thus causing the job to be terminated.
>>>> The first process to do so was:
>>>>
>>>>   Process name: [[2212,1],1]
>>>>   Exit code:    255
>>>> --------------------------------------------------------------------------
>>>> [tudtug:27710] 1 more process has sent help message
>>>>   help-shmem-runtime.txt / shmem_init:startup:internal-failure
>>>> [tudtug:27710] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>>>   see all help / error messages
>>>> [tudtug:27710] 1 more process has sent help message help-shmem-api.txt
>>>>   / shmem-abort
>>>> [tudtug:27710] 1 more process has sent help message
>>>>   help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all
>>>>   killed
>>>>
>>>> MPI works as expected:
>>>>
>>>> $ mpicc -o mpi_hello_world-4.0.0 openmpi-4.0.0/examples/hello_c.c
>>>> $ mpirun -np 2 ./mpi_hello_world-4.0.0
>>>> Hello, world, I am 0 of 2, (Open MPI v4.0.0, package: Open MPI
>>>> wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
>>>> 2018, 108)
>>>> Hello, world, I am 1 of 2, (Open MPI v4.0.0, package: Open MPI
>>>> wesarg@tudtug Distribution, ident: 4.0.0, repo rev: v4.0.0, Nov 12,
>>>> 2018, 108)
>>>>
>>>> I'm attaching the output from 'ompi_info -a' and also from
>>>> 'ucx_info -b -d -c -s'.
>>>>
>>>> Thanks for the help.

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users