Re: [OMPI users] Parallel MPI broadcasts (parameterized)
OK, I started implementing the above Allgather() idea without success (segmentation fault), so I will post the problematic lines here:

    comm.Allgather(&(endata.size), 1, MPI::UNSIGNED_LONG_LONG, &(endata_rcv.size), 1, MPI::UNSIGNED_LONG_LONG);
    endata_rcv.data = new unsigned char[endata_rcv.size*lineSize];
    comm.Allgather(&(endata.data), endata.size*lineSize, MPI::UNSIGNED_CHAR, &(endata_rcv.data), endata_rcv.size*lineSize, MPI::UNSIGNED_CHAR);
    delete [] endata.data;

The idea (as it was also for the broadcasts) is first to transmit the data size as an unsigned long long integer, so that the receivers can reserve the required memory for the actual data to be transmitted after that. To my understanding, the problem is that each broadcast's data, say D(s,G), as I explained in the previous email, is not only different but in general also has a different size. I say that because if I replace the 3rd line with

    comm.Allgather(&(endata.data), 1, MPI::UNSIGNED_CHAR, &(endata_rcv.data), 1, MPI::UNSIGNED_CHAR);

it seems to work without a segmentation fault, but this is pointless for me since I don't want only 1 char to be transmitted. So, looking at the previous image I posted, imagine that the red, green and blue squares differ in size. Can Allgather() even work then? If not, do you suggest anything else, or am I stuck using MPI_Bcast() as shown in Option 1?

On Mon, Nov 6, 2017 at 8:58 AM, George Bosilca wrote:

> On Sun, Nov 5, 2017 at 10:23 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>
>> Hi George,
>>
>> First, let me note that the cost of q^(k-1)*(q-1) communicators was fine for the values of the parameters q, k I am working with. Also, the whole point of speeding up the shuffling phase is trying to reduce this number even more (compared to already known implementations), which is a major concern of my project. But thanks for pointing that out. Btw, do you know what the maximum such number in MPI is?
>
> Last time I ran into such troubles these limits were: 2k for MVAPICH, 16k for MPICH and 2^30-1 for OMPI (all positive signed 32-bit integers). It might have changed meanwhile.
>
>> Now to the main part of the question, let me clarify that I have 1 process per machine. I don't know if this is important here, but my way of thinking is that we have a big text file and each process will have to work on some chunks of it (like chapters of a book). But each process resides on a machine with some RAM which is able to handle a specific amount of work, so if you generate many processes per machine you must have fewer book chapters per process than before. Thus, I wanted to avoid thinking at the process level rather than the machine level, with the RAM limitations in mind.
>>
>> Now to the actual shuffling, here is what I am currently doing (Option 1).
>>
>> Let's denote the data that slave s has to send to the slaves in group G as D(s,G).
>>
>>   for each slave s in 1,2,...,K{
>>     for each group G that s participates in{
>>       if (my rank is s){
>>         MPI_Bcast(send data D(s,G))
>>       }else if(my rank is in group G){
>>         MPI_Bcast(get data D(s,G))
>>       }else{
>>         Do nothing
>>       }
>>     }
>>     MPI::COMM_WORLD.Barrier();
>>   }
>>
>> What I suggested before to speed things up (Option 2) is:
>>
>>   for each set {G(1),G(2),...,G(q-1)} of q-1 disjoint groups{
>>
>>     for each slave s in G(1){
>>       if (my rank is s){
>>         MPI_Bcast(send data D(s,G(1)))
>>       }else if(my rank is in group G(1)){
>>         MPI_Bcast(get data D(s,G(1)))
>>       }else{
>>         Do nothing
>>       }
>>     }
>>
>>     for each slave s in G(2){
>>       if (my rank is s){
>>         MPI_Bcast(send data D(s,G(2)))
>>       }else if(my rank is in G(2)){
>>         MPI_Bcast(get data D(s,G(2)))
>>       }else{
>>         Do nothing
>>       }
>>     }
>>
>>     ...
>>
>>     for each slave s in G(q-1){
>>       if (my rank is s){
>>         MPI_Bcast(send data D(s,G(q-1)))
>>       }else if(my rank is in G(q-1)){
>>         MPI_Bcast(get data D(s,G(q-1)))
>>       }else{
>>         Do nothing
>>       }
>>     }
>>
>>     MPI::COMM_WORLD.Barrier();
>>   }
>>
>> My hope was that I could implement Option 2 (in some way without copying and pasting the same code q-1 times every time I change q) and that this could bring a speedup of q-1 compared to Option 1 by having these groups communicate in parallel. Right now I am trying to find a way to identify these sets of groups based on my implementation, which involves some abstract algebra, but for now let's assume that I can find them in an efficient manner.
>>
>> Let me emphasize that each broadcast sends different actual data. There are no two broadcasts that send the same D(s,G).
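[Editorial note] The Option 1 pattern quoted above can be captured once as a helper instead of being written out per group. The following is a minimal sketch, assuming each group G already has its own communicator (for example one created with MPI_Comm_create); groupComm, rootRankInGroup and payload are illustrative names, not identifiers from the thread's code:

    #include <mpi.h>

    // Sketch of one Option 1 step: slave s broadcasts D(s,G) inside group G.
    // Assumes groupComm contains exactly the ranks of G (and is MPI_COMM_NULL
    // on everyone else), and rootRankInGroup is s's rank within groupComm.
    void bcast_in_group(MPI_Comm groupComm, int rootRankInGroup,
                        unsigned char** payload, unsigned long long* payloadSize)
    {
        if (groupComm == MPI_COMM_NULL)          // not a member of G: do nothing
            return;

        // Step 1: broadcast the size so receivers can allocate memory first.
        MPI_Bcast(payloadSize, 1, MPI_UNSIGNED_LONG_LONG, rootRankInGroup, groupComm);

        int myRank;
        MPI_Comm_rank(groupComm, &myRank);
        if (myRank != rootRankInGroup)           // the root already owns its buffer
            *payload = new unsigned char[*payloadSize];

        // Step 2: broadcast the actual data D(s,G) (the count must fit in an int).
        MPI_Bcast(*payload, static_cast<int>(*payloadSize), MPI_UNSIGNED_CHAR,
                  rootRankInGroup, groupComm);
    }

Option 2 would then amount to calling this helper for the q-1 disjoint groups whose communicators can progress concurrently.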
Re: [OMPI users] Parallel MPI broadcasts (parameterized)
If each process sends a different amount of data, then the operation should be an allgatherv. This also requires that you know the amount each process will send, so you will need an allgather. Schematically the code should look like the following:

    long bytes_send_count = endata.size * sizeof(long);          // compute the amount of data sent by this process
    long* recv_counts = (long*)malloc(comm_size * sizeof(long)); // allocate buffer to receive the amounts from all peers
    int* displs = (int*)malloc(comm_size * sizeof(int));         // allocate buffer to compute the displacements for each peer
    MPI_Allgather( &bytes_send_count, 1, MPI_LONG, recv_counts, 1, MPI_LONG, comm); // exchange the amount of sent data
    long total = 0;                                              // we need a total amount of data to be received
    for( int i = 0; i < comm_size; i++) {
        displs[i] = total;        // update the displacements
        total += recv_counts[i];  // and the total count
    }
    char* recv_buf = (char*)malloc(total * sizeof(char));        // prepare buffer for the allgatherv
    MPI_Allgatherv( &(endata.data), endata.size*sizeof(char), MPI_UNSIGNED_CHAR, recv_buf, recv_counts, displs, MPI_UNSIGNED_CHAR, comm);

George.

On Tue, Nov 7, 2017 at 4:23 AM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:

> OK, I started implementing the above Allgather() idea without success (segmentation fault), so I will post the problematic lines here:
>
>     comm.Allgather(&(endata.size), 1, MPI::UNSIGNED_LONG_LONG, &(endata_rcv.size), 1, MPI::UNSIGNED_LONG_LONG);
>     endata_rcv.data = new unsigned char[endata_rcv.size*lineSize];
>     comm.Allgather(&(endata.data), endata.size*lineSize, MPI::UNSIGNED_CHAR, &(endata_rcv.data), endata_rcv.size*lineSize, MPI::UNSIGNED_CHAR);
>     delete [] endata.data;
>
> The idea (as it was also for the broadcasts) is first to transmit the data size as an unsigned long long integer, so that the receivers can reserve the required memory for the actual data to be transmitted after that. To my understanding, the problem is that each broadcast's data, say D(s,G), is not only different but in general also has a different size. I say that because if I replace the 3rd line with
>
>     comm.Allgather(&(endata.data), 1, MPI::UNSIGNED_CHAR, &(endata_rcv.data), 1, MPI::UNSIGNED_CHAR);
>
> it seems to work without a segmentation fault, but this is pointless for me since I don't want only 1 char to be transmitted. So, looking at the previous image I posted, imagine that the red, green and blue squares differ in size. Can Allgather() even work then? If not, do you suggest anything else, or am I stuck using MPI_Bcast() as shown in Option 1?
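[Editorial note] The schematic above conveys the idea; as a compilable variant (a sketch, not the code from the thread), keep in mind that MPI_Allgatherv takes its receive counts and displacements as int arrays, and that the payload pointer itself (endata.data, not &endata.data) must be passed as the send buffer. The EnData struct below is assumed to mirror the thread's endata:

    #include <mpi.h>
    #include <vector>

    // Assumed shape of the per-process payload, mirroring "endata" in the thread.
    struct EnData {
        unsigned long long size;   // number of lines this process sends
        unsigned char*     data;   // size*lineSize bytes of payload
    };

    // Returns the concatenated payloads of all peers; recv_counts/displs are
    // filled with per-peer byte counts and byte offsets into the result.
    unsigned char* gather_all(MPI_Comm comm, const EnData& endata,
                              unsigned long long lineSize,
                              std::vector<int>& recv_counts,
                              std::vector<int>& displs)
    {
        int comm_size;
        MPI_Comm_size(comm, &comm_size);
        recv_counts.resize(comm_size);
        displs.resize(comm_size);

        // Step 1: everybody learns how many bytes everybody else will send.
        int send_bytes = static_cast<int>(endata.size * lineSize);
        MPI_Allgather(&send_bytes, 1, MPI_INT, recv_counts.data(), 1, MPI_INT, comm);

        // Step 2: prefix-sum the counts into displacements and a total.
        int total = 0;
        for (int i = 0; i < comm_size; ++i) {
            displs[i] = total;
            total += recv_counts[i];
        }

        // Step 3: gather the payloads. Note endata.data (the buffer itself),
        // not &endata.data (the address of the pointer).
        unsigned char* recv_buf = new unsigned char[total];
        MPI_Allgatherv(endata.data, send_bytes, MPI_UNSIGNED_CHAR,
                       recv_buf, recv_counts.data(), displs.data(),
                       MPI_UNSIGNED_CHAR, comm);
        return recv_buf;
    }

A worker would call gather_all once per group communicator; the per-peer counts and displacements are exactly what is needed to slice the result afterwards.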
[OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls
Hello,

In debugging a test of an application, I recently came across odd behavior for simultaneous MPI_Abort calls. Namely, while the MPI_Abort was acknowledged by the process output, the mpirun process failed to exit. I was able to duplicate this behavior on multiple machines with OpenMPI versions 1.10.2, 1.10.5, and 1.10.6 with the following simple program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        printf("I am process number %d\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 3);
        return 0;
    }

Is this a bug or a feature? Does this behavior exist in OpenMPI versions 2.0 and 3.0?

Best,
Nik

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
Re: [OMPI users] Parallel MPI broadcasts (parameterized)
OK, I will try to explain a few more things about the shuffling, and I have attached only specific excerpts of the code to avoid confusion. I have added many comments.

First, let me note that this project is an implementation of the Terasort benchmark with a master node which assigns jobs to the slaves and communicates with them after each phase to get measurements.

The file shuffle_before.cc shows how I am doing the shuffling up to now, and shuffle_after.cc the progress I have made so far switching to Allgatherv(). I have also included the code that measures time and data size, since it's crucial for me to check whether I get a rate speedup.

Some questions I have are:

1. At shuffle_after.cc:61, why do we reserve comm.Get_size() entries for recv_counts and not comm.Get_size()-1? For example, if I am rank k, what is the point of recv_counts[k-1]? I guess that rank k also receives data from himself, but we can ignore it, right?

2. My next concern is about the structure of the buffer recv_buf[]. The documentation says that the data is stored there ordered. So I assume that it's stored as segments of char* ordered by rank, and the way to distinguish them is to chop the whole data based on recv_counts[]. So let G = {g1, g2, ..., gN} be a group that exchanges data, and take slave g2: then segment recv_buf[0 until recv_counts[0]-1] is what g2 received from g1, recv_buf[recv_counts[0] until recv_counts[1]-1] is what g2 received from himself (ignore it), and so on... Is this idea correct?

So I have written a sketch of the code in shuffle_after.cc, where I also try to explain how the master will compute the rate, but for now I just want to make it work. I know that this discussion is getting long, but if you have some free time can you take a look at it?

Thanks,
Kostas

On Tue, Nov 7, 2017 at 9:34 AM, George Bosilca wrote:

> If each process sends a different amount of data, then the operation should be an allgatherv. This also requires that you know the amount each process will send, so you will need an allgather. Schematically the code should look like the following:
>
> [...]
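[Editorial note] On question 2 above: with the displacement array from the allgather sketch earlier in the thread, the segment contributed by rank i begins at recv_buf + displs[i] and spans recv_counts[i] bytes. A small illustrative sketch (names are not from shuffle_after.cc):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Split the Allgatherv result into per-peer segments. Peer i's data occupies
    // recv_buf[displs[i]] .. recv_buf[displs[i] + recv_counts[i] - 1]; the
    // boundaries come from displs, not from chopping by recv_counts alone.
    std::vector<std::pair<const unsigned char*, int> >
    split_by_peer(const unsigned char* recv_buf,
                  const std::vector<int>& recv_counts,
                  const std::vector<int>& displs,
                  int my_rank)
    {
        std::vector<std::pair<const unsigned char*, int> > segments;
        for (std::size_t i = 0; i < recv_counts.size(); ++i) {
            if (static_cast<int>(i) == my_rank)
                continue;                 // skip this rank's own contribution
            segments.push_back(std::make_pair(recv_buf + displs[i], recv_counts[i]));
        }
        return segments;
    }

This is only bookkeeping on top of the displacements; it does not move any data.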
Re: [OMPI users] OpenMPI 1.10.x handling of simultaneous MPI_Abort calls
Hi,

On Tue, Nov 07, 2017 at 02:05:20PM -0700, Nikolas Antolin wrote:
> Hello,
>
> In debugging a test of an application, I recently came across odd behavior for simultaneous MPI_Abort calls. Namely, while the MPI_Abort was acknowledged by the process output, the mpirun process failed to exit. I was able to duplicate this behavior on multiple machines with OpenMPI versions 1.10.2, 1.10.5, and 1.10.6 with the following simple program:
>
> [...]
>
> Is this a bug or a feature? Does this behavior exist in OpenMPI versions 2.0 and 3.0?

I compiled your test case on CentOS-7 with openmpi 1.10.7/2.1.2 and 3.0.0 and the program seems to run fine.

[tru@borma openmpi-test-abort]$ for i in 1.10.7 2.1.2 3.0.0; do module purge && module add openmpi/$i && mpicc aa.c -o aa-$i && ldd aa-$i; mpirun -n 2 ./aa-$i ; done

    linux-vdso.so.1 => (0x7ffe115bd000)
    libmpi.so.12 => /c7/shared/openmpi/1.10.7/lib/libmpi.so.12 (0x7f40d7b4a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x7f40d78f7000)
    libc.so.6 => /lib64/libc.so.6 (0x7f40d7534000)
    libopen-rte.so.12 => /c7/shared/openmpi/1.10.7/lib/libopen-rte.so.12 (0x7f40d72b8000)
    libopen-pal.so.13 => /c7/shared/openmpi/1.10.7/lib/libopen-pal.so.13 (0x7f40d6fd9000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x7f40d6dcd000)
    libdl.so.2 => /lib64/libdl.so.2 (0x7f40d6bc9000)
    librt.so.1 => /lib64/librt.so.1 (0x7f40d69c)
    libm.so.6 => /lib64/libm.so.6 (0x7f40d66be000)
    libutil.so.1 => /lib64/libutil.so.1 (0x7f40d64bb000)
    /lib64/ld-linux-x86-64.so.2 (0x55f6d96c4000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f40d62a4000)

I am process number 1
I am process number 0
--
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 3.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--
[borma.bis.pasteur.fr:08511] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[borma.bis.pasteur.fr:08511] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

    linux-vdso.so.1 => (0x7fffaabcd000)
    libmpi.so.20 => /c7/shared/openmpi/2.1.2/lib/libmpi.so.20 (0x7f5bcee39000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x7f5bcebe6000)
    libc.so.6 => /lib64/libc.so.6 (0x7f5bce823000)
    libopen-rte.so.20 => /c7/shared/openmpi/2.1.2/lib/libopen-rte.so.20 (0x7f5bce5a)
    libopen-pal.so.20 => /c7/shared/openmpi/2.1.2/lib/libopen-pal.so.20 (0x7f5bce2a7000)
    libdl.so.2 => /lib64/libdl.so.2 (0x7f5bce0a3000)
    libnuma.so.1 => /lib64/libnuma.so.1 (0x7f5bcde97000)
    libudev.so.1 => /lib64/libudev.so.1 (0x7f5bcde81000)
    librt.so.1 => /lib64/librt.so.1 (0x7f5bcdc79000)
    libm.so.6 => /lib64/libm.so.6 (0x7f5bcd977000)
    libutil.so.1 => /lib64/libutil.so.1 (0x7f5bcd773000)
    /lib64/ld-linux-x86-64.so.2 (0x55718df01000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7f5bcd55d000)
    libcap.so.2 => /lib64/libcap.so.2 (0x7f5bcd357000)
    libdw.so.1 => /lib64/libdw.so.1 (0x7f5bcd11)
    libattr.so.1 => /lib64/libattr.so.1 (0x7f5bccf0b000)
    libelf.so.1 => /lib64/libelf.so.1 (0x7f5bcccf2000)
    libz.so.1 => /lib64/libz.so.1 (0x7f5bccadc000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x7f5bcc8b6000)
    libbz2.so.1 => /lib64/libbz2.so.1 (0x7f5bcc6a5000)

I am process number 1
I am process number 0
--
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 3.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. You may or may not see output from other processes, depending on exactly when Open MPI kills them.
--
[borma.bis.pasteur.fr:08534] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[borma.bis.pasteur.fr:08534] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Re: [OMPI users] Parallel MPI broadcasts (parameterized)
On Tue, Nov 7, 2017 at 6:09 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:

> OK, I will try to explain a few more things about the shuffling, and I have attached only specific excerpts of the code to avoid confusion. I have added many comments.
>
> First, let me note that this project is an implementation of the Terasort benchmark with a master node which assigns jobs to the slaves and communicates with them after each phase to get measurements.
>
> The file shuffle_before.cc shows how I am doing the shuffling up to now, and shuffle_after.cc the progress I have made so far switching to Allgatherv(). I have also included the code that measures time and data size, since it's crucial for me to check whether I get a rate speedup.
>
> Some questions I have are:
>
> 1. At shuffle_after.cc:61, why do we reserve comm.Get_size() entries for recv_counts and not comm.Get_size()-1? For example, if I am rank k, what is the point of recv_counts[k-1]? I guess that rank k also receives data from himself, but we can ignore it, right?

No, you can't simply ignore it ;) allgather copies the same amount of data to all processes in the communicator ... including itself. If you want to argue about this, reach out to the MPI standardization body ;)

> 2. My next concern is about the structure of the buffer recv_buf[]. The documentation says that the data is stored there ordered. So I assume that it's stored as segments of char* ordered by rank, and the way to distinguish them is to chop the whole data based on recv_counts[]. So let G = {g1, g2, ..., gN} be a group that exchanges data, and take slave g2: then segment recv_buf[0 until recv_counts[0]-1] is what g2 received from g1, recv_buf[recv_counts[0] until recv_counts[1]-1] is what g2 received from himself (ignore it), and so on... Is this idea correct?

I don't know what documentation says "ordered"; there is no such wording in the MPI standard. By carefully playing with the receive datatype I can do anything I want, including interleaving data from the different peers. But this is not what you are trying to do here. The placement in memory you describe is true if you use the displacement array as crafted in my example. The entry i in the displacement array specifies the displacement (relative to recvbuf) at which to place the incoming data from process i, so where you receive data has nothing to do with the amount you receive but with what you have in the displacement array.

> So I have written a sketch of the code in shuffle_after.cc, where I also try to explain how the master will compute the rate, but for now I just want to make it work.

This code looks OK to me. I would however:

1. Remove the barriers on the workerComm. If the order of the communicators in the multicastGroupMap is identical on all processes (including communicators they do not belong to), then the barriers are superfluous. However, if you are trying to protect your processes from starting the allgather collective too early, then you can replace the barrier on workerComm with one on mcComm.

2. The check "ns.find(rank) != ns.end()" should be equivalent to "mcComm == MPI_COMM_NULL" if I understand your code correctly.

3. This is an optimization. Move all time exchanges outside the main loop.
Instead of sending them one-by-one, keep them in an array and send the entire array once per CodedWorker::execShuffle, possibly via an MPI_Allgatherv toward the master process in MPI_COMM_WORLD (in this case you can convert the "long long" into a double to facilitate the collective).

George.

> I know that this discussion is getting long, but if you have some free time can you take a look at it?
>
> Thanks,
> Kostas
>
> [...]
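[Editorial note] A sketch of suggestion 3: George mentions an MPI_Allgatherv toward the master; the sketch below uses MPI_Gatherv instead, on the assumption that only the master needs the samples. Apart from the MPI calls, the function and variable names are illustrative rather than taken from shuffle_after.cc:

    #include <mpi.h>
    #include <vector>

    // Instead of sending one timing value per shuffle iteration, accumulate
    // them locally and ship the whole array to the master once at the end.
    void report_times_once(MPI_Comm world,
                           const std::vector<long long>& shuffle_times_usec,
                           int master_rank)
    {
        // Convert to double so a single MPI_DOUBLE collective can carry them.
        std::vector<double> as_double(shuffle_times_usec.begin(),
                                      shuffle_times_usec.end());

        int rank, size;
        MPI_Comm_rank(world, &rank);
        MPI_Comm_size(world, &size);

        int my_count = static_cast<int>(as_double.size());
        std::vector<int> counts(size), displs(size);

        // The master learns how many samples each worker will send.
        MPI_Gather(&my_count, 1, MPI_INT,
                   counts.data(), 1, MPI_INT, master_rank, world);

        std::vector<double> all_times;
        if (rank == master_rank) {
            int total = 0;
            for (int i = 0; i < size; ++i) { displs[i] = total; total += counts[i]; }
            all_times.resize(total);
        }

        // One variable-size gather replaces many small per-iteration messages.
        MPI_Gatherv(as_double.data(), my_count, MPI_DOUBLE,
                    all_times.data(), counts.data(), displs.data(), MPI_DOUBLE,
                    master_rank, world);
        // On the master, all_times now holds every worker's samples, ordered by rank.
    }

Each worker would push one duration per shuffle iteration into shuffle_times_usec and call report_times_once exactly once at the end of CodedWorker::execShuffle.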