Re: [OMPI users] Issues with Large Window Allocations

2017-09-09 Thread Joseph Schuchart
Jeff, Gilles, Thanks for your input. I am aware of the limitations of Sys5 shmem (the links you posted do not accurately reflect the description of SHMMAX, SHMALL, and SHMMNI found in the standard, though. See http://man7.org/linux/man-pages/man2/shmget.2.html). However, these limitations ca...
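For readers following along: on Linux the System V limits named above can be inspected directly. A quick check (Linux-specific /proc paths; the values vary from system to system):

```shell
# Linux exposes the System V shared memory limits via /proc:
cat /proc/sys/kernel/shmmax   # maximum size of a single segment, in bytes
cat /proc/sys/kernel/shmall   # system-wide limit, in pages
cat /proc/sys/kernel/shmmni   # maximum number of segments
```

`ipcs -l` prints the same limits in a human-readable summary.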

Re: [OMPI users] Issues with Large Window Allocations

2017-09-08 Thread Jeff Hammond
In my experience, POSIX is much more reliable than Sys5. Sys5 depends on the value of shmmax, which is often set to a small fraction of node memory. I've probably seen the error described on http://verahill.blogspot.com/2012/04/solution-to-nwchem-shmmax-too-small.html with NWChem a thousand times bec...

Re: [OMPI users] Issues with Large Window Allocations

2017-09-08 Thread Gilles Gouaillardet
Joseph, Thanks for sharing this! SysV is imho the worst option, because if something goes really wrong, Open MPI might leave some shared memory segments behind when a job crashes. From that perspective, leaving a big file in /tmp can be seen as the lesser evil. That being said, there might be o...
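As a practical aside (standard util-linux tools, not anything Open MPI-specific): segments left behind by a crashed job can be found and removed by hand:

```shell
# List all System V shared memory segments on this node;
# orphaned segments from a crashed job show up with nattch == 0.
ipcs -m

# Remove a stale segment by its id (the shmid is a placeholder to fill in
# from the ipcs output):
# ipcrm -m <shmid>
```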

Re: [OMPI users] Issues with Large Window Allocations

2017-09-08 Thread Joseph Schuchart
We are currently discussing internally how to proceed with this issue on our machine. We did a little survey to see the setup of some of the machines we have access to, which includes an IBM, a Bull machine, and two Cray XC40 machines. To summarize our findings: 1) On the Cray systems, both /t...

Re: [OMPI users] Issues with Large Window Allocations

2017-09-04 Thread Jeff Hammond
On Mon, Sep 4, 2017 at 6:13 AM, Joseph Schuchart wrote: > Jeff, all, > > Unfortunately, I (as a user) have no control over the page size on our > cluster. My interest in this is more of a general nature because I am > concerned that our users who use Open MPI underneath our code run into this > i...

Re: [OMPI users] Issues with Large Window Allocations

2017-09-04 Thread Joseph Schuchart
Gilles, On 09/04/2017 03:22 PM, Gilles Gouaillardet wrote: Joseph, please open a github issue regarding the SIGBUS error. Done: https://github.com/open-mpi/ompi/issues/4166 As far as I understand, MAP_ANONYMOUS+MAP_SHARED can only be used between related processes (e.g. parent and childre...

Re: [OMPI users] Issues with Large Window Allocations

2017-09-04 Thread Gilles Gouaillardet
Joseph, please open a github issue regarding the SIGBUS error. As far as I understand, MAP_ANONYMOUS+MAP_SHARED can only be used between related processes (e.g. parent and children). In the case of Open MPI, MPI tasks are siblings, so this is not an option. Cheers, Gilles On Mon, Sep 4, 2017...

Re: [OMPI users] Issues with Large Window Allocations

2017-09-04 Thread Joseph Schuchart
Jeff, all, Unfortunately, I (as a user) have no control over the page size on our cluster. My interest in this is more of a general nature because I am concerned that our users who use Open MPI underneath our code run into this issue on their machine. I took a look at the code for the variou...

Re: [OMPI users] Issues with Large Window Allocations

2017-08-29 Thread Jeff Hammond
I don't know any reason why you shouldn't be able to use IB for intra-node transfers. There are, of course, arguments against doing it in general (e.g. IB/PCI bandwidth less than DDR4 bandwidth), but it likely behaves less synchronously than shared-memory, since I'm not aware of any MPI RMA librar...

Re: [OMPI users] Issues with Large Window Allocations

2017-08-29 Thread Joseph Schuchart
Jeff, all, Thanks for the clarification. My measurements show that global memory allocations do not require the backing file if there is only one process per node, for an arbitrary number of processes. So I was wondering if it was possible to use the same allocation process even with multiple pr...

Re: [OMPI users] Issues with Large Window Allocations

2017-08-25 Thread Jeff Hammond
There's no reason to do anything special for shared memory with a single-process job because MPI_Win_allocate_shared(MPI_COMM_SELF) ~= MPI_Alloc_mem(). However, it would help debugging if MPI implementers at least had an option to take the code path that allocates shared memory even when np=1. Je...

Re: [OMPI users] Issues with Large Window Allocations

2017-08-24 Thread Joseph Schuchart
Gilles, Thanks for your swift response. On this system, /dev/shm only has 256M available, so that is no option, unfortunately. I tried disabling both the vader and sm BTLs via `--mca btl ^vader,sm`, but Open MPI still seems to allocate the shmem backing file under /tmp. From my point of view, missing...

Re: [OMPI users] Issues with Large Window Allocations

2017-08-24 Thread Gilles Gouaillardet
Joseph, the error message suggests that allocating memory with MPI_Win_allocate[_shared] is done by creating a file and then mmap'ing it. How much space do you have in /dev/shm? (This is a tmpfs, i.e. a RAM file system.) There is likely quite some space there, so as a workaround, I suggest you use t...

[OMPI users] Issues with Large Window Allocations

2017-08-24 Thread Joseph Schuchart
All, I have been experimenting with large window allocations recently and have made some interesting observations that I would like to share. The system under test:
- Linux cluster equipped with IB
- Open MPI 2.1.1
- 128 GB main memory per node
- 6 GB /tmp filesystem per node
My obser...