In the interim, perhaps another way of addressing this would be to ask: what happens when you run your reproducer with MPICH? Does that work?
This would at least tell us how another implementation interpreted that function.

> On Apr 7, 2015, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> I'm afraid we'll have to get someone from the Forum to interpret (Howard is a member as well), but here is what I see just below that, in the description section:
>
> The type signature associated with sendcounts[j], sendtype at process i must be equal to the type signature associated with recvcounts[i], recvtype at process j. This implies that the amount of data sent must be equal to the amount of data received, pairwise between every pair of processes.
>
>> On Apr 7, 2015, at 9:56 AM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>
>> Hello,
>>
>> Thanks for your description.
>> I'm currently doing allToAll() prior to allToAllV() to communicate the lengths of the expected messages.
>>
>> BUT, I still strongly believe that the right implementation of this method is what I expected earlier!
>> If you check the MPI specification here:
>>
>> http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
>> Page 170
>> Line 14
>>
>> it is mentioned that "... the number of elements that CAN be received ...", which implies that the actual received message may be shorter.
>>
>> In cases where it is mandatory to have the same value, the modal "MUST" is used; for example, at page 171, line 1, it is mentioned that "... sendtype at process i MUST be equal to the type signature ...".
>>
>> SO, I would expect any consistent implementation of the MPI specification to handle this message-length matching by itself, as I asked originally.
>>
>> Thanks,
>> -- HR
>>
>> On Tue, Apr 7, 2015 at 6:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> Hi HR,
>>
>> Sorry for not noticing the receive side earlier, but as Ralph implied earlier in this thread, the MPI standard has stricter type matching for collectives than for point-to-point. Namely, the number of bytes the receiver expects to receive from a given sender in the alltoallv must match the number of bytes sent by the sender.
>>
>> You were just getting lucky with the older Open MPI. The error message isn't so great, though. It's likely that the newer Open MPI is using a collective algorithm for alltoallv that assumes your app is obeying the standard.
>>
>> You are correct that if the ranks don't know how much data will be sent to them from each rank prior to the alltoallv op, you will need some mechanism for exchanging this info prior to the alltoallv op.
>>
>> Howard
>>
>> 2015-04-06 23:23 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>> Hello,
>>
>> If I set the size2 values according to your suggestion, i.e. the same values as on the sending nodes, it works fine.
>> But by definition it does not need to be exactly the same as the length of the sent data; it is just a maximum length for the data expected to be received. Otherwise, it is inevitable to run an allToAll() first to communicate the data sizes and then do the main allToAllV(), which is an expensive and unnecessary communication overhead.
>>
>> I just created a reproducer in C++ which gives the error under OpenMPI 1.8.4, but runs correctly under OpenMPI 1.5.4.
>> (I've not included the Java version of this reproducer, which I think is not important, as the current version is enough to reproduce the error. In any case, it is straightforward to convert this code to Java.)
>>
>> Thanks,
>> -- HR
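For reference, the count exchange Howard and HR describe above, an allToAll() of the per-destination counts followed by the allToAllv() itself, looks roughly like the C++ sketch below. It is not the reproducer mentioned in the thread; the program layout, the choice of send counts, and the variable names are illustrative only.

// count_exchange.cpp -- illustrative sketch, not the reproducer from this thread.
// Each rank picks arbitrary per-destination counts, exchanges them with
// MPI_Alltoall, and only then calls MPI_Alltoallv with exactly matching counts.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Arbitrary number of ints this rank will send to each destination.
    std::vector<int> sendcounts(nprocs);
    for (int i = 0; i < nprocs; ++i) sendcounts[i] = (rank + i) % 3;

    // Step 1: every rank tells every other rank how many ints to expect.
    std::vector<int> recvcounts(nprocs);
    MPI_Alltoall(sendcounts.data(), 1, MPI_INT,
                 recvcounts.data(), 1, MPI_INT, MPI_COMM_WORLD);

    // Step 2: build displacements and size the buffers from the counts.
    std::vector<int> sdispls(nprocs, 0), rdispls(nprocs, 0);
    for (int i = 1; i < nprocs; ++i) {
        sdispls[i] = sdispls[i - 1] + sendcounts[i - 1];
        rdispls[i] = rdispls[i - 1] + recvcounts[i - 1];
    }
    std::vector<int> sendbuf(sdispls[nprocs - 1] + sendcounts[nprocs - 1], rank);
    std::vector<int> recvbuf(rdispls[nprocs - 1] + recvcounts[nprocs - 1]);

    // Step 3: recvcounts[i] now equals what rank i actually sends,
    // which is what the standard requires for MPI_Alltoallv.
    MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_INT,
                  recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_INT,
                  MPI_COMM_WORLD);

    std::printf("rank %d received %d ints in total\n",
                rank, rdispls[nprocs - 1] + recvcounts[nprocs - 1]);
    MPI_Finalize();
    return 0;
}

The key point is that the recvcounts handed to MPI_Alltoallv are obtained from the senders rather than guessed as an upper bound.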
>> On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> That would imply that the issue is in the underlying C implementation in OMPI, not the Java bindings. The reproducer would definitely help pin it down.
>>
>> If you change the size2 values to the ones we sent you, does the program by chance work?
>>
>>> On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>
>>> I'll try that as well.
>>> Meanwhile, I found that my C++ code runs fine on a machine running OpenMPI 1.5.4, but I receive the same error under OpenMPI 1.8.4 for both Java and C++.
>>>
>>> On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>> Hello HR,
>>>
>>> Thanks! If you have Java 1.7 installed on your system, would you mind trying to test against that version too?
>>>
>>> Thanks,
>>>
>>> Howard
>>>
>>> 2015-04-06 13:09 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>> Hello,
>>>
>>> 1. I'm using Java/javac version 1.8.0_20 under OS X 10.10.2.
>>>
>>> 2. I used the following configuration for building OpenMPI:
>>> ./configure --enable-mpi-java
>>> --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands"
>>> --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers"
>>> --prefix="/users/hamidreza/openmpi-1.8.4"
>>>
>>> make all install
>>>
>>> 3. From a logical point of view, size2 is the maximum amount of data expected to be received; the data actually received might be less than this maximum.
>>>
>>> 4. I will try to prepare a working reproducer of my error and send it to you.
>>>
>>> Thanks,
>>> -- HR
>>>
>>> On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> I've talked to the folks who wrote the Java bindings. One possibility we identified is that there may be an error in your code when you did the translation:
>>>
>>>> My immediate thought is that each process cannot receive more elements than were sent to it. That's the reason for the truncation error.
>>>>
>>>> These are the correct values:
>>>>
>>>> rank 0 - size2: 2,2,1,1
>>>> rank 1 - size2: 1,1,1,1
>>>> rank 2 - size2: 0,1,1,2
>>>> rank 3 - size2: 2,1,2,1
>>>
>>> Can you check your code to see if perhaps the values you are passing didn't get translated correctly from your C++ version to the Java version?
>>>
>>>> On Apr 6, 2015, at 5:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>
>>>> Hello HR,
>>>>
>>>> It would also be useful to know which Java version you are using, as well as the configure options used when building Open MPI.
>>>>
>>>> Thanks,
>>>>
>>>> Howard
>>>>
>>>> 2015-04-05 19:10 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>>>> If it is not too much trouble, can you extract just the alltoallv portion and provide us with a small reproducer?
>>>>
>>>>> On Apr 5, 2015, at 12:11 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am converting an existing MPI program in C++ to Java using OpenMPI 1.8.4.
>>>>> At some point I have an allToAllv() call which works fine in C++ but produces an error in the Java version:
>>>>>
>>>>> MPI.COMM_WORLD.allToAllv(data, subpartition_size, subpartition_offset, MPI.INT,
>>>>> data2, subpartition_size2, subpartition_offset2, MPI.INT);
>>>>>
>>>>> Error:
>>>>> *** An error occurred in MPI_Alltoallv
>>>>> *** reported by process [3621322753,9223372036854775811]
>>>>> *** on communicator MPI_COMM_WORLD
>>>>> *** MPI_ERR_TRUNCATE: message truncated
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> 3 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
>>>>> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>>>
>>>>> Here are the values of the parameters:
>>>>>
>>>>> data.length = 5
>>>>> data2.length = 20
>>>>>
>>>>> ---------- Rank 0 of 4 ----------
>>>>> subpartition_offset: 0,2,3,3
>>>>> subpartition_size: 2,1,0,2
>>>>> subpartition_offset2: 0,5,10,15
>>>>> subpartition_size2: 5,5,5,5
>>>>> ----------
>>>>> ---------- Rank 1 of 4 ----------
>>>>> subpartition_offset: 0,2,3,4
>>>>> subpartition_size: 2,1,1,1
>>>>> subpartition_offset2: 0,5,10,15
>>>>> subpartition_size2: 5,5,5,5
>>>>> ----------
>>>>> ---------- Rank 2 of 4 ----------
>>>>> subpartition_offset: 0,1,2,3
>>>>> subpartition_size: 1,1,1,2
>>>>> subpartition_offset2: 0,5,10,15
>>>>> subpartition_size2: 5,5,5,5
>>>>> ----------
>>>>> ---------- Rank 3 of 4 ----------
>>>>> subpartition_offset: 0,1,2,4
>>>>> subpartition_size: 1,1,2,1
>>>>> subpartition_offset2: 0,5,10,15
>>>>> subpartition_size2: 5,5,5,5
>>>>> ----------
>>>>>
>>>>> Again, this is code that works in the C++ version.
>>>>>
>>>>> Any help or advice is greatly appreciated.
>>>>>
>>>>> Thanks,
>>>>> -- HR
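A note for readers comparing these numbers with the "correct values" quoted further up the thread: for MPI_Alltoallv, recvcounts[i] on rank j must equal sendcounts[j] on rank i (this is the pairwise matching rule Ralph quoted from the standard). So each rank's legal size2 array is simply a column of the send-count matrix above, rather than the uniform 5,5,5,5 upper bound. A minimal sketch, with the matrix copied from this post and no MPI calls needed:

// size2_from_sendcounts.cpp -- illustrative sketch derived from the values above.
#include <cstdio>

int main()
{
    // sendcounts[i][j]: number of ints rank i sends to rank j
    // (the subpartition_size arrays from the post).
    const int sendcounts[4][4] = {
        {2, 1, 0, 2},   // rank 0
        {2, 1, 1, 1},   // rank 1
        {1, 1, 1, 2},   // rank 2
        {1, 1, 2, 1},   // rank 3
    };

    // The recvcounts (size2) each rank must pass to alltoallv is the
    // corresponding column of that matrix.
    for (int j = 0; j < 4; ++j) {
        std::printf("rank %d - size2: ", j);
        for (int i = 0; i < 4; ++i)
            std::printf("%d%s", sendcounts[i][j], i == 3 ? "\n" : ",");
    }
    return 0;
}

The printed rows are 2,2,1,1 / 1,1,1,1 / 0,1,1,2 / 2,1,2,1, matching the size2 values Ralph relayed from the Java-binding developers earlier in the thread.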