George & Mattheiu,

> The Alltoall should only return when all data is sent and received on
> the current rank, so there shouldn't be any race condition.

You're right, this is MPI, not pthreads. That should never happen. Duh!

> I think the issue is with the way you define the send and receive
> buffer in the MPI_Alltoall. You have to keep in mind that the
> all-to-all pattern will overwrite the entire data in the receive
> buffer. Thus, starting from a relative displacement in the data (in
> this case matrix[wrank*wrows]), begs for troubles, as you will write
> outside the receive buffer.

The submatrix from matrix[wrank*wrows][0] to matrix[(wrank+1)*wrows-1][:] is valid only on process wrank; this is a block distribution of the rows, like what MPI_Scatter would produce. Since wrows is N (the matrix width/height) divided by wsize, the number of mpi_all_t blocks in each message is equal to wsize. Therefore, there should be no writing outside the bounds of that submatrix.
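To make the layout concrete, here is a minimal sketch of the pattern I mean (the real code is in the gist linked below; the construction of mpi_all_t here, a resized wrows x wrows block, is only my shorthand for it, and MPI_IN_PLACE stands in for passing the row panel as the receive buffer):

#include <mpi.h>

enum { N = 8 };                        /* matrix width/height, as in the 8x8 run below */

int main(int argc, char **argv)
{
    static int matrix[N][N];           /* full N x N array on every rank */
    int wrank, wsize;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    int wrows = N / wsize;             /* rows owned by this rank (assumes wsize divides N) */

    /* Only rows wrank*wrows .. (wrank+1)*wrows-1 are valid locally --
       the block distribution of rows that MPI_Scatter would produce. */
    for (int i = wrank * wrows; i < (wrank + 1) * wrows; i++)
        for (int j = 0; j < N; j++)
            matrix[i][j] = i * N + j;

    /* My guess at mpi_all_t: a wrows x wrows block of the matrix, resized
       so that consecutive blocks in the row panel sit wrows ints apart. */
    MPI_Datatype block, mpi_all_t;
    MPI_Type_vector(wrows, wrows, N, MPI_INT, &block);
    MPI_Type_create_resized(block, 0, wrows * sizeof(int), &mpi_all_t);
    MPI_Type_commit(&mpi_all_t);

    /* One mpi_all_t block goes to and comes from each of the wsize ranks,
       all inside the wrows x N row panel starting at matrix[wrank*wrows],
       so nothing lands outside the locally valid submatrix. */
    MPI_Alltoall(MPI_IN_PLACE, 1, mpi_all_t,
                 matrix[wrank * wrows], 1, mpi_all_t, MPI_COMM_WORLD);

    MPI_Type_free(&mpi_all_t);
    MPI_Type_free(&block);
    MPI_Finalize();
    return 0;
}

(A local transpose of each wrows x wrows block after the Alltoall is still needed to get the fully transposed output shown below; I've left that and the printing out of the sketch.)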
On another note, I just ported the example to use dynamic memory and now I'm getting segfaults when I call MPI_Finalize(). Any idea what in the code could have caused this?

It's pastebinned here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39

The result is:

[sgillila@jarvis src]$ mpirun -npernode 2 transpose2 8
N = 8
Matrix =
0: 0 1 2 3 4 5 6 7
0: 8 9 10 11 12 13 14 15
0: 16 17 18 19 20 21 22 23
0: 24 25 26 27 28 29 30 31
1: 32 33 34 35 36 37 38 39
1: 40 41 42 43 44 45 46 47
1: 48 49 50 51 52 53 54 55
1: 56 57 58 59 60 61 62 63
Matrix =
0: 0 8 16 24 32 40 48 56
0: 1 9 17 25 33 41 49 57
0: 2 10 18 26 34 42 50 58
0: 3 11 19 27 35 43 51 59
1: 4 12 20 28 36 44 52 60
1: 5 13 21 29 37 45 53 61
1: 6 14 22 30 38 46 54 62
1: 7 15 23 31 39 47 55 63
[jarvis:09314] *** Process received signal ***
[jarvis:09314] Signal: Segmentation fault (11)
[jarvis:09314] Signal code: Address not mapped (1)
[jarvis:09314] Failing at address: 0x21da228
[jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
[jarvis:09314] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75) [0x7f2e85452575]
[jarvis:09314] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3) [0x7f2e85452bc3]
[jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
[jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
[jarvis:09314] [ 5] transpose2() [0x400d49]
[jarvis:09314] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 9314 on node jarvis.cs.iit.edu
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

--
Spenser Gilliland
Computer Engineer
Doctoral Candidate