Dear developers of OPENMPI,
There remains a hang observed in MPI_WIN_ALLOCATE_SHARED.
But first:
Thank you for your advice to employ shmem_mmap_relocate_backing_file = 1.
It turned out that the bad (but silent) allocations by MPI_WIN_ALLOCATE_SHARED,
which I had observed in the past after ~140 MB of allocated shared memory,
were indeed caused by too little storage being available for the shared-memory
backing files. Applying that MCA parameter resolved the problem.
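In case it helps other users who hit the same silent failure: the parameter can
be set either on the mpiexec command line or in the per-user MCA parameter file
(the process count and program name below are only placeholders):
    mpiexec --mca shmem_mmap_relocate_backing_file 1 -np 4 ./a.out
or, equivalently, by putting the line
    shmem_mmap_relocate_backing_file = 1
into $HOME/.openmpi/mca-params.conf.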
Now the allocation of shared data windows by MPI_WIN_ALLOCATE_SHARED in the
OPENMPI-1.8.3 release version works on both clusters!
I tested it both with my small shared-memory Fortran test program and with our
Fortran CFD-code.
It worked even when allocating 1000 shared data windows containing a total of
40 GB. Very well.
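For reference, the allocation pattern of the test program (and, in essence, of
the CFD-code) looks roughly like the minimal sketch below. It is only a sketch:
the names and sizes are illustrative, error handling is omitted, and the
per-node communicator is obtained with MPI_COMM_SPLIT_TYPE:

   program shm_alloc_sketch
     use mpi
     use, intrinsic :: iso_c_binding
     implicit none
     integer, parameter :: nwin = 1000, idim_1 = 50000   ! illustrative sizes
     integer :: ierr, nodecomm, noderank, disp_unit, iwin
     integer :: win(nwin)
     integer(MPI_ADDRESS_KIND) :: winsize
     type(c_ptr) :: baseptr
     integer, pointer :: iarr(:)            ! one integer*4 array per window

     call MPI_INIT(ierr)
     ! communicator containing only the processes of the local node
     call MPI_COMM_SPLIT_TYPE(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
                              MPI_INFO_NULL, nodecomm, ierr)
     call MPI_COMM_RANK(nodecomm, noderank, ierr)

     do iwin = 1, nwin
       ! the nodemaster provides the storage, all other processes pass size 0
       winsize = 0
       if (noderank == 0) winsize = int(idim_1, MPI_ADDRESS_KIND) * 4
       call MPI_WIN_ALLOCATE_SHARED(winsize, 4, MPI_INFO_NULL, nodecomm, &
                                    baseptr, win(iwin), ierr)
       ! every process queries the base address of the nodemaster's segment
       call MPI_WIN_SHARED_QUERY(win(iwin), 0, winsize, disp_unit, baseptr, ierr)
       call c_f_pointer(baseptr, iarr, [idim_1])
       if (noderank == 0) iarr = 0          ! first touch of the shared array
       ! the windows are deliberately never freed, so the shared storage grows
     end do

     call MPI_FINALIZE(ierr)
   end program shm_alloc_sketch

None of the windows is freed, so the total amount of shared memory allocated in
this way accumulates over the run, exactly as in the tests described above.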
But now I come to the remaining problem:
Following the attached email of Jeff (see below) of 2014-10-24,
we have additionally installed and tested the bug-fixed OPENMPI nightly tarball
of 2014-10-24 (openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
That version worked well when our CFD-code was running on only 1 node.
But I now observe that, when running the CFD-code on 2 nodes with 2 processes
per node, after a total of 200 MB of data has been allocated in 20 shared
windows, the allocation of the 21st window fails: all 4 processes enter
MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs in that routine,
without any message.
In contrast, that bug does NOT occur with the OPENMPI-1.8.3 release version
with the same program on the same machine.
That means for you:
In openmpi-dev-176-g9334abc.tar.gz the newly introduced bugfix concerning
shared memory allocation may not yet be coded correctly,
or that version contains another new bug in shared memory allocation
compared to the working(!) 1.8.3 release version.
Greetings to you all
Michael Rachner
-----Original Message-----
From: users [mailto:[email protected]] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Friday, 24 October 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
Nathan tells me that this may well be related to a fix that was literally just
pulled into the v1.8 branch today:
https://github.com/open-mpi/ompi-release/pull/56
Would you mind testing any nightly tarball after tonight? (i.e., the v1.8
tarballs generated tonight will be the first ones to contain this fix)
http://www.open-mpi.org/nightly/master/
On Oct 24, 2014, at 11:46 AM, [email protected] wrote:
> Dear developers of OPENMPI,
>
> I am running a small downsized Fortran test program for shared memory
> allocation (using MPI_WIN_ALLOCATE_SHARED and MPI_WIN_SHARED_QUERY)
> on only 1 node of 2 different Linux clusters with OPENMPI-1.8.3 and
> Intel-14.0.4 / Intel-13.0.1, respectively.
>
> The program simply allocates a sequence of shared data windows, each
> consisting of 1 integer*4-array.
> None of the windows is freed, so the amount of data allocated in shared
> windows grows during the course of the execution.
>
> That worked well on the 1st cluster (Laki, having 8 procs per node)
> when allocating even 1000 shared windows each having 50000 integer*4 array
> elements, i.e. a total of 200 MBytes.
> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the
> login node, but it did NOT work on a compute node.
> In that error case, there appears to be an internal storage limit of
> ~140 MB for the total storage allocated in all shared windows.
> When that limit is reached, all later shared memory allocations fail (but
> silently).
> So the first attempt to use such a bad shared data window results in a bus
> error due to the bad storage address encountered.
>
> That strange behavior could be observed in the small testprogram but also
> with my large Fortran CFD-code.
> If the error occurs, then it occurs with both codes, and both at a storage
> limit of ~140 MB.
> I found that this storage limit depends only weakly on the number of
> processes (for np = 2, 4, 8, 16, 24 it is 144.4, 144.0, 141.0, 137.0,
> 132.2 MB).
>
> Note that the shared memory storage available on both clusters was very large
> (many GB of free memory).
>
> Here is the error message when running with np=2 and an array
> dimension of idim_1=50000 for the integer*4 array allocated per shared
> window on the compute node of Cluster5:
> In that case, the error occurred at the 723rd shared window, which is the
> 1st badly allocated window in that run:
> (722 successfully allocated shared windows * 50000 array elements * 4
> Bytes/el. = 144.4 MB)
>
>
> [1,0]<stdout>: ========on nodemaster: iwin= 722 :
> [1,0]<stdout>: total storage [MByte] alloc. in shared windows so far: 144.400000000000
> [1,0]<stdout>: =========== allocation of shared window no. iwin= 723
> [1,0]<stdout>: starting now with idim_1= 50000
> [1,0]<stdout>: ========on nodemaster for iwin= 723 : before writing on shared mem
> [1,0]<stderr>:[r5i5n13:12597] *** Process received signal ***
> [1,0]<stderr>:[r5i5n13:12597] Signal: Bus error (7)
> [1,0]<stderr>:[r5i5n13:12597] Signal code: Non-existant physical address (2)
> [1,0]<stderr>:[r5i5n13:12597] Failing at address: 0x7fffe08da000
> [1,0]<stderr>:[r5i5n13:12597] [ 0] [1,0]<stderr>:/lib64/libpthread.so.0(+0xf800)[0x7ffff6d67800]
> [1,0]<stderr>:[r5i5n13:12597] [ 1] ./a.out[0x408a8b]
> [1,0]<stderr>:[r5i5n13:12597] [ 2] ./a.out[0x40800c]
> [1,0]<stderr>:[r5i5n13:12597] [ 3] [1,0]<stderr>:/lib64/libc.so.6(__libc_start_main+0xe6)[0x7ffff69fec36]
> [1,0]<stderr>:[r5i5n13:12597] [ 4] [1,0]<stderr>:./a.out[0x407f09]
> [1,0]<stderr>:[r5i5n13:12597] *** End of error message ***
> [1,1]<stderr>:forrtl: error (78): process killed (SIGTERM)
> [1,1]<stderr>:Image              PC                Routine     Line     Source
> [1,1]<stderr>:libopen-pal.so.6   00007FFFF4B74580  Unknown     Unknown  Unknown
> [1,1]<stderr>:libmpi.so.1        00007FFFF7267F3E  Unknown     Unknown  Unknown
> [1,1]<stderr>:libmpi.so.1        00007FFFF733B555  Unknown     Unknown  Unknown
> [1,1]<stderr>:libmpi.so.1        00007FFFF727DFFD  Unknown     Unknown  Unknown
> [1,1]<stderr>:libmpi_mpifh.so.2  00007FFFF779BA03  Unknown     Unknown  Unknown
> [1,1]<stderr>:a.out              0000000000408D15  Unknown     Unknown  Unknown
> [1,1]<stderr>:a.out              000000000040800C  Unknown     Unknown  Unknown
> [1,1]<stderr>:libc.so.6          00007FFFF69FEC36  Unknown     Unknown  Unknown
> [1,1]<stderr>:a.out              0000000000407F09  Unknown     Unknown  Unknown
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 12597 on node r5i5n13 exited
> on signal 7 (Bus error).
> --------------------------------------------------------------------------
>
>
> The small Ftn-testprogram was built and then run by
> mpif90 sharedmemtest.f90
> mpiexec -np 2 -bind-to core -tag-output ./a.out
>
> Why does it work on the Laki (both on a login node and on a compute
> node) as well as on the login node of Cluster5, but fails on a compute node
> of Cluster5?
>
> Greetings
> Michael Rachner
>
>
>
> _______________________________________________
> users mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25572.php
--
Jeff Squyres
[email protected]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
_______________________________________________
users mailing list
[email protected]
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/10/25580.php