Jeff, Gilles,
Thanks for your input. I am aware of the limitations of Sys5 shmem (the
links you posted do not accurately reflect the description of SHMMAX,
SHMALL, and SHMMNI found in the standard, though. See
http://man7.org/linux/man-pages/man2/shmget.2.html).
However, these limitations ca
In my experience, POSIX is much more reliable than Sys5. Sys5 depends on
the value of shmmax, which is often set to a small fraction of node
memory. I've probably seen the error described on
http://verahill.blogspot.com/2012/04/solution-to-nwchem-shmmax-too-small.html
with NWChem a thousand times bec
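A minimal sketch of that failure mode, assuming a Linux node where kernel.shmmax is smaller than the request (the 8 GiB figure is an arbitrary example, not a value from this thread):

/* shmget() refuses requests larger than SHMMAX with EINVAL; this is
 * the class of error the NWChem page above works around. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    size_t size = 8UL * 1024 * 1024 * 1024;   /* 8 GiB, example value */
    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1) {
        /* EINVAL here typically means size > SHMMAX */
        fprintf(stderr, "shmget failed: %s\n", strerror(errno));
        return 1;
    }
    shmctl(shmid, IPC_RMID, NULL);   /* clean up if it did succeed */
    return 0;
}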
Joseph,
Thanks for sharing this!
sysv is imho the worst option because if something goes really wrong, Open MPI
might leave some shared memory segments behind when a job crashes. From that
perspective, leaving a big file in /tmp can be seen as the lesser evil.
That being said, there might be o
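For readers following along: the usual guard against such leaks is to mark the segment for removal as soon as it has been attached, so the kernel reclaims it even if the job dies. This is a sketch of the mechanism only; whether Open MPI's sysv component does exactly this is not established in this thread:

/* Mark a System V segment for removal right after attaching it, so a
 * crash cannot leave it behind. */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    int shmid = shmget(IPC_PRIVATE, 1 << 20, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); return 1; }

    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1) { perror("shmat"); return 1; }

    /* Once marked, the segment disappears automatically when the last
     * attached process detaches or exits. */
    if (shmctl(shmid, IPC_RMID, NULL) == -1) { perror("shmctl"); return 1; }

    /* ... use the memory at 'addr' ... */
    shmdt(addr);
    return 0;
}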
We are currently discussing internally how to proceed with this issue on
our machine. We did a little survey to see the setup of some of the
machines we have access to, which include an IBM, a Bull machine, and
two Cray XC40 machines. To summarize our findings:
1) On the Cray systems, both /t
On Mon, Sep 4, 2017 at 6:13 AM, Joseph Schuchart wrote:
> Jeff, all,
>
> Unfortunately, I (as a user) have no control over the page size on our
> cluster. My interest in this is more of a general nature because I am
> concerned that our users who use Open MPI underneath our code run into this
> issue on their machine.
Gilles,
On 09/04/2017 03:22 PM, Gilles Gouaillardet wrote:
Joseph,
please open a github issue regarding the SIGBUS error.
Done: https://github.com/open-mpi/ompi/issues/4166
as far as i understand, MAP_ANONYMOUS+MAP_SHARED can only be used
between related processes. (e.g. parent and children)
Joseph,
please open a github issue regarding the SIGBUS error.
as far as i understand, MAP_ANONYMOUS+MAP_SHARED can only be used
between related processes. (e.g. parent and children)
in the case of Open MPI, MPI tasks are siblings, so this is not an option.
Cheers,
Gilles
On Mon, Sep 4, 2017
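A small sketch of the constraint Gilles describes: an anonymous shared mapping is only visible to processes that inherit it across fork(), and it does not survive exec(), which is why independently launched MPI ranks cannot use it:

/* MAP_ANONYMOUS | MAP_SHARED works only for related processes: the
 * mapping has to exist before fork() so the child inherits it. */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {            /* child: inherits the mapping */
        *shared = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent sees %d\n", *shared);   /* prints 42 */
    munmap(shared, sizeof(int));
    return 0;
}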
Jeff, all,
Unfortunately, I (as a user) have no control over the page size on our
cluster. My interest in this is more of a general nature because I am
concerned that our users who use Open MPI underneath our code run into
this issue on their machine.
I took a look at the code for the variou
I don't know any reason why you shouldn't be able to use IB for intra-node
transfers. There are, of course, arguments against doing it in general
(e.g. IB/PCI bandwidth less than DDR4 bandwidth), but it likely behaves
less synchronously than shared memory, since I'm not aware of any MPI RMA
librar
Jeff, all,
Thanks for the clarification. My measurements show that global memory
allocations do not require the backing file if there is only one process
per node, for an arbitrary number of processes. So I was wondering if it
was possible to use the same allocation process even with multiple
pr
There's no reason to do anything special for shared memory with a
single-process job because MPI_Win_allocate_shared(MPI_COMM_SELF) ~=
MPI_Alloc_mem(). However, it would help debugging if MPI implementers at
least had an option to take the code path that allocates shared memory even
when np=1.
Jeff
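A minimal sketch of that near-equivalence on a single process; whether a given implementation still creates a shared-memory backing file for np=1 is exactly the implementation detail in question:

/* On MPI_COMM_SELF the two calls below hand back memory that a single
 * process can use the same way. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Plain allocation */
    double *buf;
    MPI_Alloc_mem(1024 * sizeof(double), MPI_INFO_NULL, &buf);

    /* "Shared" window on a single-process communicator */
    double *wbuf;
    MPI_Win win;
    MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, MPI_COMM_SELF, &wbuf, &win);

    wbuf[0] = buf[0] = 1.0;   /* both behave like ordinary local memory */
    printf("%f %f\n", buf[0], wbuf[0]);

    MPI_Win_free(&win);
    MPI_Free_mem(buf);
    MPI_Finalize();
    return 0;
}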
Gilles,
Thanks for your swift response. On this system, /dev/shm only has 256M
available, so that is unfortunately not an option. I tried disabling both
the vader and sm BTLs via `--mca btl ^vader,sm` but Open MPI still seems to
allocate the shmem backing file under /tmp. From my point of view,
missing
Joseph,
the error message suggests that allocating memory with
MPI_Win_allocate[_shared] is done by creating a file and then mmap'ing
it.
how much space do you have in /dev/shm? (this is a tmpfs, i.e. a RAM
file system)
there is likely quite some space there, so as a workaround, i suggest
you use t
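For illustration, the pattern the error message hints at looks roughly like the sketch below; the /dev/shm path and segment name are examples only, not Open MPI's actual backing-file location or naming:

/* Create a file, grow it, and mmap it as shared memory.  If the
 * filesystem holding the file is too small, page faults on the mapping
 * raise SIGBUS, which matches the symptom discussed in this thread. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t size = 64UL * 1024 * 1024;   /* 64 MiB, arbitrary */
    int fd = open("/dev/shm/example_seg", O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("open"); return 1; }

    if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    unlink("/dev/shm/example_seg");     /* keep the mapping, drop the name */
    /* ... processes that mmap the same file now share this memory ... */
    munmap(addr, size);
    close(fd);
    return 0;
}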
All,
I have been experimenting with large window allocations recently and
have made some interesting observations that I would like to share.
The system under test:
- Linux cluster equipped with IB
- Open MPI 2.1.1
- 128 GB main memory per node
- 6 GB /tmp filesystem per node
My obser
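A minimal sketch of the kind of allocation being tested here; the per-rank size is an arbitrary example, not the figure from these experiments:

/* Every rank on a node asks for a large slice of a shared window. */
#include <mpi.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    MPI_Aint local_size = 1L << 30;     /* 1 GiB per rank, example value */
    char *base;
    MPI_Win win;
    MPI_Win_allocate_shared(local_size, 1, MPI_INFO_NULL, node_comm,
                            &base, &win);

    /* Touching every page is what actually consumes space in the
     * backing file (and is where a SIGBUS shows up if /tmp fills up). */
    memset(base, 0, local_size);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}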