Jun,
A patch is available at
https://github.com/ggouaillardet/ompi-release/commit/f277beace9fbe8dd71f733602b5d4b0344d77a29.patch
It is not bulletproof, but it does fix your problem.
In this case, MPI_Ineighbor_alltoallw is invoked with sendbuf ==
recvbuf, and internally
libnbc considers this an in-place alltoall, and hence allocates a
temporary buffer
(which is now (almost) correctly used with this patch).
This is suboptimal, since even though sendbuf == recvbuf, the displacements
you use ensure there
is no overlap.
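For reference, the pattern in question looks roughly like this (a minimal
sketch with placeholder names for the count/type/displacement arrays, not
your exact code):

    /* sendbuf == recvbuf, but the send displacements (bytes 0-15) and the
     * receive displacements (bytes 16-31) address disjoint parts of the
     * array, so this is not a true in-place operation */
    MPI_Ineighbor_alltoallw(send_number, counts, sdispls /* {0, 8}   */, types,
                            send_number, counts, rdispls /* {16, 24} */, types,
                            neighbor_comm, &request);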
Bottom line, this patch does fix your problem, but because of the libnbc
internals, the second MPI_Ineighbor_alltoallw is suboptimal (assuming
such a call is allowed by the standard).
master does things differently, and there is no such bug there.
George,
Is it valid (per the MPI standard) to invoke MPI_Ineighbor_alltoallw
with sendbuf == recvbuf?
Bonus question:
what if we have sendbuf != recvbuf, but the data overlap because of the
displacements?
For example:

    int buf[1], count[1] = {1};
    MPI_Aint sdispls[1] = {0}, rdispls[1] = {-4};   /* assuming sizeof(int) == 4 */
    MPI_Datatype type[1] = {MPI_INT};
    MPI_Ineighbor_alltoallw(buf, count, sdispls, type, buf + 1, count, rdispls, type,
                            MPI_COMM_WORLD, &request);

Is this allowed per the MPI standard? If yes, then the implementation
should figure this out, and I am pretty sure it does not currently.
Cheers,
Gilles
On 3/8/2016 9:18 AM, Jun Kudo wrote:
Gilles,
Thanks for the small bug fix. It helped clear up that test case, but
I'm now running into another segmentation fault on a more
complicated problem.
I've attached another 'working' example. This time I am using
MPI_Ineighbor_alltoallw on a triangular topology; node 0 communicates
bi-directionally with nodes 1 and 2, node 1 with nodes 0 and 2, and
node 2 with nodes 0 and 1. Each node is sending one double (with value
my_rank) to each of its neighbors.
The code has two different calls to the MPI API that differ only in
the receive buffer arguments. In both versions, I am sending from and
receiving into the same static array. In the working
(non-segfaulting) version, I am receiving into the latter half of the
array by pointing to the start of the second half (&send_number[2])
and specifying displacements of 0 and 8 bytes. In the segfaulting
version, I am again receiving into the latter half of the array by
pointing to the start of the array (send_number) with displacements of
16 and 24 bytes.
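Roughly, the two variants look like this (a sketch from memory; the array
size, counts, and every name other than send_number are my approximations
rather than the attached code, and it assumes sizeof(double) == 8):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int my_rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        /* triangular topology: each rank sends to and receives from the
         * other two ranks */
        int neighbors[2] = {(my_rank + 1) % 3, (my_rank + 2) % 3};
        MPI_Comm neighbor_comm;
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       2, neighbors, MPI_UNWEIGHTED,
                                       2, neighbors, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &neighbor_comm);

        /* one static array: the first half holds the data to send, the
         * second half receives */
        static double send_number[4];
        send_number[0] = send_number[1] = (double) my_rank;

        int counts[2]         = {1, 1};
        MPI_Datatype types[2] = {MPI_DOUBLE, MPI_DOUBLE};
        MPI_Aint sdispls[2]   = {0, 8};
        MPI_Request request;

        /* working variant: recvbuf points at the start of the second half */
        MPI_Aint rdispls_half[2] = {0, 8};
        MPI_Ineighbor_alltoallw(send_number, counts, sdispls, types,
                                &send_number[2], counts, rdispls_half, types,
                                neighbor_comm, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);

        /* segfaulting variant: recvbuf == sendbuf, and the byte displacements
         * 16 and 24 select the same second half of the array */
        MPI_Aint rdispls_full[2] = {16, 24};
        MPI_Ineighbor_alltoallw(send_number, counts, sdispls, types,
                                send_number, counts, rdispls_full, types,
                                neighbor_comm, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);

        MPI_Comm_free(&neighbor_comm);
        MPI_Finalize();
        return 0;
    }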
The program, run with the command 'mpirun -n 3
./simpleneighborhood_multiple' and compiled with the latest OpenMPI
(1.10.2) plus the patch, encounters a segmentation fault when receiving
with the latter variant. The same program compiled with MPICH (3.2) runs
without any problems and with the expected results.
Let me know if I'm screwing anything up. Thanks for the help.
Sincerely,
Jun
On Mon, Feb 29, 2016 at 9:34 PM, Gilles Gouaillardet
<gil...@rist.or.jp> wrote:
Thanks for the report and the test case.
This is a bug and I pushed a commit to master.
For the time being, you can download a patch for v1.10 at
https://github.com/ggouaillardet/ompi-release/commit/4afdab0aa86e5127767c4dfbdb763b4cb641e37a.patch
Cheers,
Gilles
On 3/1/2016 12:17 AM, Jun Kudo wrote:
Hello,
I'm trying to use the neighborhood collective communication
capabilities (MPI_Ineighbor_x) of MPI coupled with the
distributed graph constructor (MPI_Dist_graph_create_adjacent),
but I'm encountering a segmentation fault on a test case.
I have attached a 'working' example where I create an MPI
communicator with a simple distributed graph topology in which Rank
0 contains Node 0 that communicates bi-directionally (receiving
from and sending to) with Node 1 located on Rank 1. I then
attempt to send integer messages using the neighborhood
collective MPI_Ineighbor_alltoall. The program, run with the
command 'mpirun -n 2 ./simpleneighborhood' and compiled with the
latest OpenMPI (1.10.2), encounters a segmentation fault during
the non-blocking call. The same program compiled with MPICH
(3.2) runs without any problems and with the expected results.
To muddy the waters a little more, the same program compiled with
OpenMPI but using the blocking neighborhood collective,
MPI_Neighbor_alltoall, seems to run just fine as well.
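In case it helps, the setup is essentially the following (a minimal sketch
with names of my own choosing, not the attached file verbatim):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* two ranks, one bidirectional edge: 0 <-> 1 */
        int neighbor = (rank == 0) ? 1 : 0;
        MPI_Comm graph_comm;
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       1, &neighbor, MPI_UNWEIGHTED,
                                       1, &neighbor, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &graph_comm);

        int sendval = rank, recvval = -1;
        MPI_Request request;
        /* this non-blocking call is the one that segfaults with OpenMPI
         * 1.10.2; the blocking MPI_Neighbor_alltoall equivalent runs fine */
        MPI_Ineighbor_alltoall(&sendval, 1, MPI_INT, &recvval, 1, MPI_INT,
                               graph_comm, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);

        MPI_Comm_free(&graph_comm);
        MPI_Finalize();
        return 0;
    }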
I'm not really sure at this point if I'm making a simple mistake
in the construction of my test or if something is more
fundamentally wrong. I would appreciate any insight into my
problem!
Thanks ahead of time for the help, and let me know if I can
provide any more information.
Sincerely,
Jun