Gilles,

Thanks for the quick reply and the immediate fix. I can confirm that allocations from both MPI_Win_allocate_shared and MPI_Win_allocate are now consistently aligned on 8-byte boundaries and that the application runs fine.

For the record, allocations from malloc and MPI_Alloc_mem are consistently aligned on 16 bytes on my machine. I have not investigated whether the difference in alignment has any impact on performance. Unfortunately, MPI in general does not seem to offer a means of controlling the alignment the way posix_memalign does, so we would have to ensure larger alignments ourselves if it did.
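
To illustrate what "ensuring larger alignments ourselves" could look like, here is a minimal sketch in plain C. The helper names align_up and padded_size are hypothetical and not part of MPI or Open MPI; the idea is simply to request a few extra bytes and round the returned base pointer up to the desired power-of-two boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round ptr up to the next multiple of 'alignment'.
 * 'alignment' must be a non-zero power of two. */
static void *align_up(void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr    = (uintptr_t)ptr;
    uintptr_t aligned = (addr + (alignment - 1)) & ~((uintptr_t)alignment - 1);
    /* Advance the original pointer by the number of padding bytes. */
    return (char *)ptr + (aligned - addr);
}

/* Hypothetical helper: number of bytes to request so that an aligned
 * region of 'size' bytes is guaranteed to fit inside the allocation. */
static size_t padded_size(size_t size, size_t alignment)
{
    return size + alignment - 1;
}
```

One would then request padded_size(size, 64) bytes from the allocating call and work with align_up(baseptr, 64) instead of baseptr. For window memory this is only part of the story: target displacements are relative to the window base, so each process would also have to account for the padding at the target when computing them.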

Best regards
Joseph

On 02/15/2017 05:45 AM, Gilles Gouaillardet wrote:

Joseph,


Thanks for the report and the test program.

The memory allocated by MPI_Win_allocate_shared() is indeed aligned on (4*communicator_size).

I could not reproduce such a thing with MPI_Win_allocate(), but will investigate it.

I fixed MPI_Win_allocate_shared() in https://github.com/open-mpi/ompi/pull/2978.

Meanwhile, you can manually download and apply the patch at https://github.com/open-mpi/ompi/pull/2978.patch


Cheers,


Gilles


On 2/14/2017 11:01 PM, Joseph Schuchart wrote:
Hi,

We have been experiencing strange crashes in our application, which works mostly on memory allocated through MPI_Win_allocate and MPI_Win_allocate_shared. We eventually realized that the application crashes if it is compiled with -O3 or -Ofast and run with an odd number of processes on our x86_64 machines.

After some debugging we found that the minimum alignment of the memory returned by MPI_Win_allocate is 4 bytes, which is fine for 32-bit data types but causes problems with 64-bit data types (such as size_t) and automatic loop vectorization (tested with GCC 5.3.0). Here the compiler assumes natural alignment, which should be at least 8 bytes on x86_64 and is guaranteed by malloc and new.

Interestingly, the alignment of the returned memory depends on the number of processes running. I am attaching a small reproducer that prints the alignments of memory returned by MPI_Win_allocate, MPI_Win_allocate_shared, and MPI_Alloc_mem (the latter seems to be fine).

Example for 2 processes (correct alignment):

[MPI_Alloc_mem] Alignment of baseptr=0x260ac60: 32
[MPI_Win_allocate] Alignment of baseptr=0x7f94d7aa30a8: 40
[MPI_Win_allocate_shared] Alignment of baseptr=0x7f94d7aa30a8: 40

Example for 3 processes (alignment of 4 bytes even with an 8-byte displacement unit):

[MPI_Alloc_mem] Alignment of baseptr=0x115e970: 48
[MPI_Win_allocate] Alignment of baseptr=0x7f685f50f0c4: 4
[MPI_Win_allocate_shared] Alignment of baseptr=0x7fec618bc0c4: 4
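
The attached reproducer is not part of this archive, so the following is only a sketch of how such figures can be derived from a base pointer. The function names are hypothetical. The values printed above are consistent with the address modulo 64 (for example, 0x7f94d7aa30a8 modulo 64 is 40); whether the reproducer computes exactly that is an assumption.

```c
#include <stdint.h>

/* Offset of the address within a block of 'block' bytes (power of two).
 * With block = 64 this matches the figures in the output above. */
static uintptr_t offset_in_block(const void *ptr, uintptr_t block)
{
    return (uintptr_t)ptr & (block - 1);
}

/* Largest power of two that divides the address (0 for a null address). */
static uintptr_t pointer_alignment(const void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    return addr & (~addr + 1);   /* isolate the lowest set bit */
}

/* Non-zero if ptr is a multiple of 'alignment' (power of two). */
static int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

By this measure the 2-process window pointer 0x7f94d7aa30a8 is 8-byte aligned, while the 3-process pointer 0x7f685f50f0c4 is only 4-byte aligned, which is what breaks 64-bit accesses.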

Is this a known issue? I expect users to rely on the basic alignment guarantees of malloc/new holding for any function that provides malloc-like behavior, all the more so because a hint about the alignment requirements is passed to MPI_Win_allocate in the form of the disp_unit argument.

I was able to reproduce this issue in both OpenMPI 1.10.5 and 2.0.2. I also tested with MPICH, which provides correct alignment.

Cheers,
Joseph



_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users




--
Dipl.-Inf. Joseph Schuchart
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19
D-70569 Stuttgart

Tel.: +49(0)711-68565890
Fax: +49(0)711-6856832
E-Mail: schuch...@hlrs.de
