In the interim, perhaps another way of addressing this would be to ask: what 
happens when you run your reproducer with MPICH? Does that work?

This would at least tell us how another implementation interprets that 
function.


> On Apr 7, 2015, at 10:18 AM, Ralph Castain <r...@open-mpi.org> wrote:
> 
> I’m afraid we’ll have to get someone from the Forum to interpret this (Howard is a 
> member as well), but here is what I see just below that, in the description 
> section:
> 
> The type signature associated with sendcounts[j], sendtype at process i must 
> be equal to the type signature associated with recvcounts[i], recvtype at 
> process j. This implies that the amount of data sent must be equal to the 
> amount of data received, pairwise between every pair of processes
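
In other words, each rank's receive counts must equal, element for element, what
the other ranks actually send to it. A minimal standalone sketch (hypothetical
variable names, using the subpartition_size values from the parameter dump further
down in this thread) shows that the required size2 values are the "transpose
column" of all ranks' send counts, which is exactly the list Ralph gives later:

#include <cstdio>

int main() {
    const int n = 4;
    // subpartition_size (send counts) as reported by ranks 0..3 in the dump below:
    const int send[n][n] = {{2, 1, 0, 2},
                            {2, 1, 1, 1},
                            {1, 1, 1, 2},
                            {1, 1, 2, 1}};
    // Rank j must post recvcounts[i] == send[i][j], i.e. column j of the matrix.
    for (int j = 0; j < n; ++j) {
        std::printf("rank %d - size2:", j);
        for (int i = 0; i < n; ++i)
            std::printf(" %d", send[i][j]);   // 2,2,1,1 / 1,1,1,1 / 0,1,1,2 / 2,1,2,1
        std::printf("\n");
    }
    return 0;
}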
> 
> 
>> On Apr 7, 2015, at 9:56 AM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>> 
>> Hello,
>> 
>> Thanks for your description.
>> I'm currently doing an allToAll() prior to the allToAllV() to communicate the 
>> lengths of the expected messages.
>> But I still strongly believe that the right implementation of this method is 
>> the one I expected earlier.
>> If you check the MPI specification here:
>> 
>> http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
>> Page 170
>> Line 14
>> 
>> It says "... the number of elements that CAN be received ...", which implies 
>> that the actual received message may be shorter than that.
>> 
>> In cases where the values must match exactly, the modal "MUST" is used; for 
>> example, page 171, line 1 says "... sendtype at process i MUST be equal to the 
>> type signature ...".
>> 
>> So I would expect any conforming implementation of the MPI specification to 
>> handle this message-length matching by itself, as I asked originally.
>> 
>> Thanks,
>> -- HR
>> 
>> On Tue, Apr 7, 2015 at 6:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> Hi HR,
>> 
>> Sorry for not noticing the receive side earlier but, as Ralph implied earlier
>> in this thread, the MPI standard has stricter type matching for collectives
>> than for point-to-point.  Namely, the number of bytes a receiver expects
>> to receive from a given sender in the alltoallv must match the number of bytes
>> sent by that sender.
>> 
>> You were just getting lucky with the older Open MPI.  The error message
>> isn't so great, though.  It's likely that the newer Open MPI you are using
>> selects a collective algorithm for alltoallv that assumes your app is obeying
>> the standard.
>> 
>> You are correct that if the ranks don't know how much data each of the other
>> ranks will send them, you will need some mechanism for exchanging this
>> information prior to the alltoallv op.
>> 
>> Howard
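
A minimal sketch of one such mechanism, assuming int data and hypothetical
variable names (not taken from HR's code): each rank first announces its per-peer
send counts with an MPI_Alltoall, then sizes the receive side of the MPI_Alltoallv
from the counts that were actually announced, so the pairwise matching rule holds
by construction.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Hypothetical per-destination send counts; in a real application these
    // come from the partitioning step.
    std::vector<int> sendcounts(nprocs, rank + 1);
    std::vector<int> sdispls(nprocs, 0);
    for (int i = 1; i < nprocs; ++i)
        sdispls[i] = sdispls[i - 1] + sendcounts[i - 1];

    // Step 1: tell every peer how many elements it will receive from this rank.
    std::vector<int> recvcounts(nprocs);
    MPI_Alltoall(sendcounts.data(), 1, MPI_INT,
                 recvcounts.data(), 1, MPI_INT, MPI_COMM_WORLD);

    std::vector<int> rdispls(nprocs, 0);
    for (int i = 1; i < nprocs; ++i)
        rdispls[i] = rdispls[i - 1] + recvcounts[i - 1];

    // Step 2: the real exchange; recvcounts now match what each peer sends.
    std::vector<int> sendbuf(sdispls.back() + sendcounts.back(), rank);
    std::vector<int> recvbuf(rdispls.back() + recvcounts.back());
    MPI_Alltoallv(sendbuf.data(), sendcounts.data(), sdispls.data(), MPI_INT,
                  recvbuf.data(), recvcounts.data(), rdispls.data(), MPI_INT,
                  MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

The extra MPI_Alltoall exchanges just one int per peer, although it is the
additional round of communication HR would prefer to avoid.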
>> 
>> 
>> 2015-04-06 23:23 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>> Hello,
>> 
>> If I set the size2 values according to your suggestion, i.e. the same values 
>> as on the sending ranks, it works fine.
>> But by definition it does not need to be exactly the length of the data sent; 
>> it is just the maximum length of data expected to be received. If not, it is 
>> unavoidable to run an allToAll() first to communicate the data sizes and then 
>> do the main allToAllV(), which is an expensive and unnecessary communication 
>> overhead.
>> 
>> I just created a reproducer in C++ which gives the error under Open MPI 1.8.4 
>> but runs correctly under Open MPI 1.5.4.
>> (I have not included the Java version of this reproducer, which I think is not 
>> needed since the C++ version is enough to reproduce the error; in any case, it 
>> is straightforward to convert this code to Java.)
>> 
>> Thanks,
>> -- HR
>> 
>> On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> That would imply that the issue is in the underlying C implementation in 
>> OMPI, not the Java bindings. The reproducer would definitely help pin it 
>> down.
>> 
>> If you change the size2 values to the ones we sent you, does the program by 
>> chance work?
>> 
>> 
>>> On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>> 
>>> I'll try that as well.
>>> Meanwhile, I found that my C++ code runs fine on a machine running 
>>> Open MPI 1.5.4, but I receive the same error under Open MPI 1.8.4 for both 
>>> Java and C++.
>>> 
>>> On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>> Hello HR,
>>> 
>>> Thanks!  If you have Java 1.7 installed on your system, would you mind 
>>> testing against that version too?
>>> 
>>> Thanks,
>>> 
>>> Howard
>>> 
>>> 
>>> 2015-04-06 13:09 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>> Hello,
>>> 
>>> 1. I'm using Java/Javac version 1.8.0_20 under OS X 10.10.2.
>>> 
>>> 2. I used the following configuration for building Open MPI:
>>> 
>>> ./configure --enable-mpi-java \
>>>   --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands" \
>>>   --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers" \
>>>   --prefix="/users/hamidreza/openmpi-1.8.4"
>>> 
>>> make all install
>>> 
>>> 3. From a logical point of view, size2 is the maximum amount of data expected 
>>> to be received; the data actually received might be less than this maximum.
>>> 
>>> 4. I will try to prepare a working reproducer of my error and send it to 
>>> you.
>>> 
>>> Thanks,
>>> -- HR
>>> 
>>> On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> I’ve talked to the folks who wrote the Java bindings. One possibility we 
>>> identified is that an error may have crept into your code when you did the 
>>> translation:
>>> 
>>>> My immediate thought is that each process cannot receive more elements 
>>>> than were sent to it. That's the reason for the truncation error.
>>>> 
>>>> These are the correct values:
>>>> 
>>>> rank 0 - size2: 2,2,1,1
>>>> rank 1 - size2: 1,1,1,1
>>>> rank 2 - size2: 0,1,1,2
>>>> rank 3 - size2: 2,1,2,1
>>> 
>>> Can you check your code to see if perhaps the values you are passing didn’t 
>>> get translated correctly from your C++ version to the Java version?
>>> 
>>> 
>>> 
>>>> On Apr 6, 2015, at 5:03 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>> 
>>>> Hello HR,
>>>> 
>>>> It would also be useful to know which Java version you are using, as well
>>>> as the configure options used when building Open MPI.
>>>> 
>>>> Thanks,
>>>> 
>>>> Howard
>>>> 
>>>> 
>>>> 
>>>> 2015-04-05 19:10 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>>>> If it's not too much trouble, can you extract just the alltoallv portion and 
>>>> provide us with a small reproducer?
>>>> 
>>>> 
>>>>> On Apr 5, 2015, at 12:11 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am converting an existing MPI program from C++ to Java using Open MPI 1.8.4.
>>>>> At some point I have an allToAllv() call which works fine in C++ but produces 
>>>>> an error in the Java version:
>>>>> 
>>>>> MPI.COMM_WORLD.allToAllv(data, subpartition_size, subpartition_offset, MPI.INT,
>>>>>     data2, subpartition_size2, subpartition_offset2, MPI.INT);
>>>>> 
>>>>> Error:
>>>>> *** An error occurred in MPI_Alltoallv
>>>>> *** reported by process [3621322753,9223372036854775811]
>>>>> *** on communicator MPI_COMM_WORLD
>>>>> *** MPI_ERR_TRUNCATE: message truncated
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> ***    and potentially your MPI job)
>>>>> 3 more processes have sent help message help-mpi-errors.txt / 
>>>>> mpi_errors_are_fatal
>>>>> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error 
>>>>> messages
>>>>> 
>>>>> Here are the values for parameters:
>>>>> 
>>>>> data.length = 5
>>>>> data2.length = 20
>>>>> 
>>>>> ---------- Rank 0 of 4 ----------
>>>>> subpartition_offset:0,2,3,3,
>>>>> subpartition_size:2,1,0,2,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>> ---------- Rank 1 of 4 ----------
>>>>> subpartition_offset:0,2,3,4,
>>>>> subpartition_size:2,1,1,1,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>> ---------- Rank 2 of 4 ----------
>>>>> subpartition_offset:0,1,2,3,
>>>>> subpartition_size:1,1,1,2,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>> ---------- Rank 3 of 4 ----------
>>>>> subpartition_offset:0,1,2,4,
>>>>> subpartition_size:1,1,2,1,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
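
A minimal C++ sketch that plugs these values into the underlying C API (a
hypothetical reconstruction, not HR's actual reproducer; the payload contents and
variable names are assumed) looks like the following. Per this thread, the mismatch
between the posted receive counts of 5 and the counts actually sent is what
Open MPI 1.8.4 reports as MPI_ERR_TRUNCATE:

#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs != 4)                      // the dumped values assume exactly 4 ranks
        MPI_Abort(MPI_COMM_WORLD, 1);

    // Values copied from the dump above.
    int sizes[4][4]   = {{2, 1, 0, 2}, {2, 1, 1, 1}, {1, 1, 1, 2}, {1, 1, 2, 1}};
    int offsets[4][4] = {{0, 2, 3, 3}, {0, 2, 3, 4}, {0, 1, 2, 3}, {0, 1, 2, 4}};
    int sizes2[4]     = {5, 5, 5, 5};
    int offsets2[4]   = {0, 5, 10, 15};

    int data[5];                          // data.length = 5; contents are arbitrary here
    for (int i = 0; i < 5; ++i) data[i] = rank;
    int data2[20];                        // data2.length = 20

    MPI_Alltoallv(data,  sizes[rank], offsets[rank], MPI_INT,
                  data2, sizes2,      offsets2,      MPI_INT, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}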
>>>>> 
>>>>> Again, this is code which works in the C++ version.
>>>>> 
>>>>> Any help or advice is greatly appreciated.
>>>>> 
>>>>> Thanks,
>>>>> -- HR