Gilles,
Thanks for the quick reply and the immediate fix. I can confirm that
allocations from both MPI_Win_allocate_shared and MPI_Win_allocate are
now consistently aligned on 8-byte boundaries, and the application runs
fine.
For the record, allocations from malloc and MPI_Alloc_mem are
consistently aligned on 16 bytes on my machine. I have not investigated
whether the difference in alignment has any impact on performance.
Unfortunately, MPI in general does not seem to offer a means of
controlling the alignment the way posix_memalign does, so we would have
to ensure larger alignments ourselves if that were the case.
Best regards
Joseph
On 02/15/2017 05:45 AM, Gilles Gouaillardet wrote:
Joseph,
Thanks for the report and the test program.
The memory allocated by MPI_Win_allocate_shared() is indeed aligned on
(4 * communicator_size) bytes.
I could not reproduce such a thing with MPI_Win_allocate(), but will
investigate it.
I fixed MPI_Win_allocate_shared() in
https://github.com/open-mpi/ompi/pull/2978;
meanwhile, you can manually download and apply the patch at
https://github.com/open-mpi/ompi/pull/2978.patch
Cheers,
Gilles
On 2/14/2017 11:01 PM, Joseph Schuchart wrote:
Hi,
We have been experiencing strange crashes in our application that
mostly works on memory allocated through MPI_Win_allocate and
MPI_Win_allocate_shared. We eventually realized that the application
crashes if it is compiled with -O3 or -Ofast and run with an odd
number of processors on our x86_64 machines.
After some debugging, we found that the minimum alignment of the
memory returned by MPI_Win_allocate is 4 bytes, which is fine for
32-bit data types but causes problems with 64-bit data types (such as
size_t) and automatic loop vectorization (tested with GCC 5.3.0). Here
the compiler assumes natural alignment, which should be at least 8
bytes on x86_64 and is guaranteed by malloc and new.
Interestingly, the alignment of the returned memory depends on the
number of processes running. I am attaching a small reproducer that
prints the alignments of memory returned by MPI_Win_allocate,
MPI_Win_allocate_shared, and MPI_Alloc_mem (the latter seems to be fine).
Example for 2 processes (correct alignment):
[MPI_Alloc_mem] Alignment of baseptr=0x260ac60: 32
[MPI_Win_allocate] Alignment of baseptr=0x7f94d7aa30a8: 40
[MPI_Win_allocate_shared] Alignment of baseptr=0x7f94d7aa30a8: 40
Example for 3 processes (alignment of 4 bytes even with an 8-byte
displacement unit):
[MPI_Alloc_mem] Alignment of baseptr=0x115e970: 48
[MPI_Win_allocate] Alignment of baseptr=0x7f685f50f0c4: 4
[MPI_Win_allocate_shared] Alignment of baseptr=0x7fec618bc0c4: 4
Is this a known issue? I expect users to rely on the basic alignment
guarantees made by malloc/new to hold for any function providing
malloc-like behavior, even more so as a hint on the alignment
requirements is passed to MPI_Win_allocate in the form of the
disp_unit argument.
I was able to reproduce this issue in both OpenMPI 1.10.5 and 2.0.2.
I also tested with MPICH, which provides correct alignment.
Cheers,
Joseph
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart
Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de