Hello,

If I set the size2 values according to your suggestion, i.e. to the same values as the send sizes on the sending nodes, it works fine. But as I understand the definition, size2 does not need to match the length of the sent data exactly; it is just the maximum length of data expected to be received. Otherwise, it would be unavoidable to run an allToAll() first to communicate the data sizes and only then do the main allToAllv(), which is an expensive and unnecessary communication overhead.

I just created a reproducer in C++ which gives the error under OpenMPI 1.8.4, but runs correctly under OpenMPI 1.5.4. (I have not included the Java version of this reproducer, which I think is not important, as the current version is enough to reproduce the error. If it is needed, it is straightforward to convert this code to Java.)

Thanks,
-- HR

On Mon, Apr 6, 2015 at 3:03 PM, Ralph Castain <r...@open-mpi.org> wrote:

> That would imply that the issue is in the underlying C implementation in
> OMPI, not the Java bindings. The reproducer would definitely help pin it
> down.
>
> If you change the size2 values to the ones we sent you, does the program
> by chance work?
>
>
> On Apr 6, 2015, at 1:44 PM, Hamidreza Anvari <hr.anv...@gmail.com> wrote:
>
> I'll try that as well.
> Meanwhile, I found that my c++ code is running fine on a machine running
> OpenMPI 1.5.4, but I receive the same error under OpenMPI 1.8.4 for both
> Java and C++.
>
> On Mon, Apr 6, 2015 at 2:21 PM, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Hello HR,
>>
>> Thanks! If you have Java 1.7 installed on your system would you mind
>> trying to test against that version too?
>>
>> Thanks,
>>
>> Howard
>>
>>
>> 2015-04-06 13:09 GMT-06:00 Hamidreza Anvari <hr.anv...@gmail.com>:
>>
>>> Hello,
>>>
>>> 1. I'm using Java/Javac version 1.8.0_20 under OS X 10.10.2.
>>>
>>> 2. I have used the following configuration for making OpenMPI:
>>> ./configure --enable-mpi-java
>>> --with-jdk-bindir="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands"
>>> --with-jdk-headers="/System/Library/Frameworks/JavaVM.framework/Versions/Current/Headers"
>>> --prefix="/users/hamidreza/openmpi-1.8.4"
>>>
>>> make all install
>>>
>>> 3. As a logical point of view, size2 is the maximum expected data to
>>> receive, which in turn might be less than this maximum.
>>>
>>> 4. I will try to prepare a working reproducer of my error and send it to
>>> you.
>>>
>>> Thanks,
>>> -- HR
>>>
>>> On Mon, Apr 6, 2015 at 10:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> I've talked to the folks who wrote the Java bindings. One possibility
>>>> we identified is that there may be an error in your code when you did the
>>>> translation
>>>>
>>>> My immediate thought is that each process can not receive more elements
>>>> than it was sent to them. That's the reason of truncation error.
>>>>
>>>> These are the correct values:
>>>>
>>>> rank 0 - size2: 2,2,1,1
>>>> rank 1 - size2: 1,1,1,1
>>>> rank 2 - size2: 0,1,1,2
>>>> rank 3 - size2: 2,1,2,1
>>>>
>>>>
>>>> Can you check your code to see if perhaps the values you are passing
>>>> didn't get translated correctly from your C++ version to the Java version?
>>>>
>>>>
>>>>
>>>> On Apr 6, 2015, at 5:03 AM, Howard Pritchard <hpprit...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello HR,
>>>>
>>>> It would also be useful to know which java version you are using, as
>>>> well as the configure options used when building open mpi.
>>>>
>>>> Thanks,
>>>>
>>>> Howard
>>>>
>>>>
>>>>
>>>> 2015-04-05 19:10 GMT-06:00 Ralph Castain <r...@open-mpi.org>:
>>>>
>>>>> If not too much trouble, can you extract just the alltoallv portion
>>>>> and provide us with a small reproducer?
>>>>>
>>>>>
>>>>> On Apr 5, 2015, at 12:11 PM, Hamidreza Anvari <hr.anv...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I am converting an existing MPI program in C++ to Java using OpenMPI
>>>>> 1.8.4,
>>>>> At some point I have a allToAllv() code which works fine in C++ but
>>>>> receives error in Java version:
>>>>>
>>>>> MPI.COMM_WORLD.allToAllv(data, subpartition_size, subpartition_offset,
>>>>> MPI.INT,
>>>>> data2,subpartition_size2,subpartition_offset2,MPI.INT);
>>>>>
>>>>> Error:
>>>>> *** An error occurred in MPI_Alltoallv
>>>>> *** reported by process [3621322753,9223372036854775811]
>>>>> *** on communicator MPI_COMM_WORLD
>>>>> *** MPI_ERR_TRUNCATE: message truncated
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> 3 more processes have sent help message help-mpi-errors.txt /
>>>>> mpi_errors_are_fatal
>>>>> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help /
>>>>> error messages
>>>>>
>>>>> Here are the values for parameters:
>>>>>
>>>>> data.length = 5
>>>>> data2.length = 20
>>>>>
>>>>> ---------- Rank 0 of 4 ----------
>>>>> subpartition_offset:0,2,3,3,
>>>>> subpartition_size:2,1,0,2,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>> ---------- Rank 1 of 4 ----------
>>>>> subpartition_offset:0,2,3,4,
>>>>> subpartition_size:2,1,1,1,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>> ---------- Rank 2 of 4 ----------
>>>>> subpartition_offset:0,1,2,3,
>>>>> subpartition_size:1,1,1,2,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>> ---------- Rank 3 of 4 ----------
>>>>> subpartition_offset:0,1,2,4,
>>>>> subpartition_size:1,1,2,1,
>>>>> subpartition_offset2:0,5,10,15,
>>>>> subpartition_size2:5,5,5,5,
>>>>> ----------
>>>>>
>>>>> Again, this is a code which works in C++ version.
>>>>>
>>>>> Any help or advice is greatly appreciated.
>>>>>
>>>>> Thanks,
>>>>> -- HR
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post:
>>>>> http://www.open-mpi.org/community/lists/users/2015/04/26610.php

// this code is organized for running with "-np 4" in mpirun
#include <mpi.h>
#include <iostream>

#define ROOT_NODE 0

using namespace std;

int main(int argc, char *argv[]) {
    int data[20] = {0};
    int data2[50] = {0};
    int data_db[4][20] = {
        {9394287,15910968,38127303,84350675,91341460,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
        {3938568,23514575,30049874,66951201,75970968,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
        {26064364,28684381,43722826,73412832,88193851,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0},
        {27148880,41379192,61636168,67630198,95203523,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
    };
    int subpartition_offset_db[4][4] = {
        {0,2,3,3},
        {0,2,3,4},
        {0,1,2,3},
        {0,1,2,4}
    };
    int subpartition_size_db[4][4] = {
        {2,1,0,2},
        {2,1,1,1},
        {1,1,1,2},
        {1,1,2,1}
    };
    int subpartition_offset[4] = {};
    int subpartition_size[4] = {};
    int subpartition_offset2[4] = {0,5,10,15};
    int subpartition_size2[4] = {5,5,5,5};

    MPI::Init(argc, argv);
    int cluster_size = MPI::COMM_WORLD.Get_size();
    int my_rank = MPI::COMM_WORLD.Get_rank();

    // initializing local variables
    for (int i = 0; i < 4; i++) {
        subpartition_size[i] = subpartition_size_db[my_rank][i];
        subpartition_offset[i] = subpartition_offset_db[my_rank][i];
        for (int j = 0; j < 5; j++) {
            data[j] = data_db[my_rank][j];
        }
    }

    MPI::COMM_WORLD.Alltoallv(data, subpartition_size, subpartition_offset, MPI::INT,
                              data2, subpartition_size2, subpartition_offset2, MPI::INT);

    //cout << my_rank*100 << endl;

    // print out results
    for (int i = 0; i < 4; i++) {
        if (my_rank == i) {
            cout << endl << "---------- Rank " << my_rank << " of " << cluster_size << " ----------" << endl;
            cout << "Data Received:" << endl;
            for (int j = 0; j < 20; j++)
                cout << data2[j] << ",";
            cout << endl << "----------" << endl;
        }
        MPI::COMM_WORLD.Barrier();
    }

    MPI::Finalize();
    return 0;
}