Dear All, In my code I implement mpi_send/mpi_receive for an three dimensional real array, and process is as follows:
all other processors send the array to rank 0 and then rank 0 receives the array and put these arrays into a complete array. Then mpi_bcast is called to send the complete array from rank 0 to all others. This is very basic usage of mpi_send and mpi_receive. In my fortran code I found that if I added call mpi_barrier(...) before the mpi_send and mpi_reive statements the wall time (60s) for this sending and receiving will be much lower than that if mpi_barrier is not called (2s). I used mpi_wtime to count the time. I think mpi_send and mpi_recv are blocking subroutines and thus no additional mpi_barrier is needed. Can anybody tell me what is the reason for this phenomena? Thank you very much. best regards, Huangwei