George & Matthieu,

> The Alltoall should only return when all data is sent and received on
> the current rank, so there shouldn't be any race condition.

You're right, this is MPI, not pthreads.  That should never happen. Duh!

> I think the issue is with the way you define the send and receive
> buffer in the MPI_Alltoall. You have to keep in mind that the
> all-to-all pattern will overwrite the entire data in the receive
> buffer. Thus, starting from a relative displacement in the data (in
> this case matrix[wrank*wrows]), begs for troubles, as you will write
> outside the receive buffer.

The submatrix from matrix[wrank*wrows][0] to
matrix[(wrank+1)*wrows-1][N-1] is valid only on rank wrank.  This is a
block distribution of the rows, like what MPI_Scatter would produce.
As wrows is equal to N (the matrix width/height) divided by wsize, the
number of mpi_all_t blocks in each message is equal to wsize.
Therefore, there should be no writing outside the bounds of the
submatrix.
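
To make the counting concrete, here is a stripped-down sketch of the
layout I mean (not the actual code: it uses plain MPI_INT counts in
place of the mpi_all_t derived datatype, and it only shows that the
exchange stays inside the owned row block, not the full transpose):

#include <mpi.h>
#include <string.h>

#define N 8                            /* matrix width/height */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    int wrows = N / wsize;             /* rows owned by this rank */
    int matrix[N][N];                  /* only the owned row block is touched */
    int sendbuf[wrows * N];            /* staging copy of the owned block */

    /* Fill only the locally owned rows. */
    for (int i = wrank * wrows; i < (wrank + 1) * wrows; i++)
        for (int j = 0; j < N; j++)
            matrix[i][j] = i * N + j;

    memcpy(sendbuf, &matrix[wrank * wrows][0], wrows * N * sizeof(int));

    /* Each of the wsize peers contributes wrows*wrows ints, so the total
     * received is wsize * wrows * wrows = wrows * N ints: exactly the
     * size of the owned block, nothing outside it. */
    MPI_Alltoall(sendbuf, wrows * wrows, MPI_INT,
                 &matrix[wrank * wrows][0], wrows * wrows, MPI_INT,
                 MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}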

On another note,
I just ported the example to use dynamic memory and now I'm getting
segfaults when I call MPI_Finalize().  Any idea what in the code could
have caused this?

It's pastebinned here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39
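
In case it helps frame the question: a dynamically allocated matrix
that keeps the same contiguous layout as the static version would look
something like this (a minimal sketch, not the exact code in the gist):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int wrank, wsize;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    int N = (argc > 1) ? atoi(argv[1]) : 8;
    int wrows = N / wsize;

    /* One contiguous block with the same layout as "int matrix[N][N]",
     * so &matrix[wrank * wrows * N] can still be passed to MPI_Alltoall.
     * An array of row pointers would not be contiguous. */
    int *matrix = malloc((size_t)N * N * sizeof(int));
    if (matrix == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* Fill only the locally owned rows (row-major, contiguous). */
    for (int i = wrank * wrows; i < (wrank + 1) * wrows; i++)
        for (int j = 0; j < N; j++)
            matrix[i * N + j] = i * N + j;

    /* ... Alltoall and printing as in the gist ... */

    free(matrix);                      /* free exactly once, before MPI_Finalize */
    MPI_Finalize();
    return 0;
}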

The result is

[sgillila@jarvis src]$ mpirun -npernode 2 transpose2 8
N = 8
Matrix =
 0:     0     1     2     3     4     5     6     7
 0:     8     9    10    11    12    13    14    15
 0:    16    17    18    19    20    21    22    23
 0:    24    25    26    27    28    29    30    31
 1:    32    33    34    35    36    37    38    39
 1:    40    41    42    43    44    45    46    47
 1:    48    49    50    51    52    53    54    55
 1:    56    57    58    59    60    61    62    63
Matrix =
 0:     0     8    16    24    32    40    48    56
 0:     1     9    17    25    33    41    49    57
 0:     2    10    18    26    34    42    50    58
 0:     3    11    19    27    35    43    51    59
 1:     4    12    20    28    36    44    52    60
 1:     5    13    21    29    37    45    53    61
 1:     6    14    22    30    38    46    54    62
 1:     7    15    23    31    39    47    55    63
[jarvis:09314] *** Process received signal ***
[jarvis:09314] Signal: Segmentation fault (11)
[jarvis:09314] Signal code: Address not mapped (1)
[jarvis:09314] Failing at address: 0x21da228
[jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
[jarvis:09314] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75) [0x7f2e85452575]
[jarvis:09314] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3) [0x7f2e85452bc3]
[jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
[jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
[jarvis:09314] [ 5] transpose2() [0x400d49]
[jarvis:09314] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 9314 on node
jarvis.cs.iit.edu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

-- 
Spenser Gilliland
Computer Engineer
Doctoral Candidate
