Hi all. (Sorry for duplication, if any.) I have to parallelize a CFD code using domain/grid/mesh partitioning among the processes. Before running, we do not know: (i) how many processes we will use (np is unknown); (ii) how many neighbouring processes a given process will have (my_nbrs = ?); (iii) how many entries a process needs to send to a particular neighbouring process. But once the code runs, I can calculate all of this information easily.
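Since all of these sizes are known only at run time, I assume they would be held in allocatable arrays sized once the partitioning step has determined my_nbrs and the per-neighbour counts; a minimal sketch (the count arrays send_counts and recv_counts are just placeholder names, the other names match the code below):

      ! Size the neighbour lists, counts and request handles only after
      ! the partitioning step has determined my_nbrs.
      INTEGER, ALLOCATABLE :: dest(:), source(:)             ! neighbour ranks
      INTEGER, ALLOCATABLE :: send_counts(:), recv_counts(:) ! entries per neighbour
      INTEGER, ALLOCATABLE :: request1(:), request2(:)       ! MPI request handles

      ALLOCATE(dest(my_nbrs), source(my_nbrs))
      ALLOCATE(send_counts(my_nbrs), recv_counts(my_nbrs))
      ALLOCATE(request1(my_nbrs), request2(my_nbrs))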
The problem is to copy a number of entries into an array and then send that array to a destination process. The same sender has to repeat this work for each of its neighbouring processes. Is the following code fine?

      DO i = 1, my_nbrs
         k = few_entries_for_this_neighbour
         DO j = 1, k
            send_array(j) = my_array(jth_particular_entry)
         ENDDO
         CALL MPI_ISEND(send_array(1:k), k, MPI_REAL8, dest(i), tag, MPI_COMM_WORLD, request1(i), ierr)
      ENDDO

And the corresponding receives, at each process:

      DO i = 1, my_nbrs
         k = few_entries_from_this_neighbour
         CALL MPI_IRECV(recv_array(1:k), k, MPI_REAL8, source(i), tag, MPI_COMM_WORLD, request2(i), ierr)
         DO j = 1, k
            received_data(j) = recv_array(j)
         ENDDO
      ENDDO

After the above, MPI_WAITALL is called.

I think this code will not work, both for the sends and for the receives. For the non-blocking sends we cannot reuse send_array for one neighbour after another as above, because the application buffer is not guaranteed to be available for reuse until each send completes. Am I right? There is a similar problem with recv_array: data from multiple processes cannot be received into the same array as above. Am I right?

The target is to hide communication behind computation, so non-blocking communication is needed. Since we do not know the value of np, or the value of my_nbrs on each process, before the run, we cannot simply declare a separate array for every neighbour. Please suggest a solution.

===================

A more subtle solution that I can think of is the following:

      cc = 0
      DO i = 1, my_nbrs
         k = few_entries_for_this_neighbour
         DO j = 1, k
            send_array(cc+j) = my_array(jth_particular_entry)
         ENDDO
         CALL MPI_ISEND(send_array(cc+1:cc+k), k, MPI_REAL8, dest(i), tag, MPI_COMM_WORLD, request1(i), ierr)
         cc = cc + k
      ENDDO

And the corresponding receives, at each process:

      cc = 0
      DO i = 1, my_nbrs
         k = few_entries_from_this_neighbour
         CALL MPI_IRECV(recv_array(cc+1:cc+k), k, MPI_REAL8, source(i), tag, MPI_COMM_WORLD, request2(i), ierr)
         DO j = 1, k
            received_data(j) = recv_array(cc+j)
         ENDDO
         cc = cc + k
      ENDDO

After the above, MPI_WAITALL is called.

This means that send_array holds the entries for all neighbours in one collected shape:

      send_array = [... entries for nbr 1 ..., ... entries for nbr 2 ..., ..., ... entries for last nbr ...]

and the respective entries are sent to the respective neighbours as above. Likewise, recv_array holds the entries from all neighbours in one collected shape:

      recv_array = [... entries from nbr 1 ..., ... entries from nbr 2 ..., ..., ... entries from last nbr ...]

and the entries from each process are received into its own portion of recv_array.

Is this scheme correct? I am looking for an efficient one. I would appreciate any help.

With best regards,
Amjad Ali.
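P.S. For concreteness, here is a minimal sketch of the packed-buffer exchange I have in mind, written as a self-contained subroutine. The routine name exchange_halo and the arrays send_counts, recv_counts, send_offset, recv_offset and pack_index are only placeholders for whatever my partitioning step actually produces; the point is that each neighbour owns a disjoint slice of one send buffer and one receive buffer, and that recv_array is read only after MPI_WAITALL.

      SUBROUTINE exchange_halo(my_array, received_data, my_nbrs, dest, source, &
                               send_counts, recv_counts, send_offset, recv_offset, pack_index)
      USE mpi
      IMPLICIT NONE
      REAL(8), INTENT(IN)  :: my_array(:)       ! local values to be sent
      REAL(8), INTENT(OUT) :: received_data(:)  ! halo values, laid out neighbour by neighbour
      INTEGER, INTENT(IN)  :: my_nbrs
      INTEGER, INTENT(IN)  :: dest(:), source(:)              ! neighbour ranks
      INTEGER, INTENT(IN)  :: send_counts(:), recv_counts(:)  ! entries per neighbour
      INTEGER, INTENT(IN)  :: send_offset(:), recv_offset(:)  ! entries before each neighbour's slice
      INTEGER, INTENT(IN)  :: pack_index(:)                   ! which entry of my_array goes where
      REAL(8), ALLOCATABLE :: send_array(:), recv_array(:)
      INTEGER, ALLOCATABLE :: requests(:)
      INTEGER :: i, j, k, cc, tag, ierr

      tag = 100
      ALLOCATE(send_array(SUM(send_counts)), recv_array(SUM(recv_counts)))
      ALLOCATE(requests(2*my_nbrs))

      ! Post all receives first, each into its own slice of recv_array.
      DO i = 1, my_nbrs
         k  = recv_counts(i)
         cc = recv_offset(i)
         CALL MPI_IRECV(recv_array(cc+1), k, MPI_REAL8, source(i), tag, MPI_COMM_WORLD, requests(i), ierr)
      ENDDO

      ! Pack and send, each neighbour using its own slice of send_array.
      DO i = 1, my_nbrs
         k  = send_counts(i)
         cc = send_offset(i)
         DO j = 1, k
            send_array(cc+j) = my_array(pack_index(cc+j))
         ENDDO
         CALL MPI_ISEND(send_array(cc+1), k, MPI_REAL8, dest(i), tag, MPI_COMM_WORLD, requests(my_nbrs+i), ierr)
      ENDDO

      ! (Computation on interior cells that does not need the halo could go here.)

      CALL MPI_WAITALL(2*my_nbrs, requests, MPI_STATUSES_IGNORE, ierr)

      ! Only now is it safe to read recv_array; in the real code this loop
      ! would scatter into the proper halo locations.
      DO i = 1, my_nbrs
         cc = recv_offset(i)
         DO j = 1, recv_counts(i)
            received_data(cc+j) = recv_array(cc+j)
         ENDDO
      ENDDO

      DEALLOCATE(send_array, recv_array, requests)
      END SUBROUTINE exchange_halo

Note that the buffers are passed as the first element of each slice (e.g. send_array(cc+1)) rather than as array sections, so that the compiler cannot make a temporary copy of a section for a non-blocking call that completes only later.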