amjad ali wrote:
and it's conceivable that you might have better performance with

CALL MPI_ISEND()
DO I = 1, N
   call do_a_little_of_my_work()  ! no MPI progress is being made here
   CALL MPI_TEST()                ! enough MPI progress is being made here
                                  ! that the receiver has something to do
END DO
CALL MPI_WAIT()
Whether performance improves or not is not guaranteed by the MPI
standard.
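To make that concrete, here is a minimal, self-contained sketch of the pattern, assuming a single rank-0-to-rank-1 transfer. The ranks, tag, buffer size, and the do_a_little_of_my_work placeholder are only illustrative, not taken from your code:

program isend_test_sketch
  include "mpif.h"
  integer, parameter :: n = 8, nchunks = 100
  real(8) sendbuf(n), recvbuf(n)
  integer req, ier, i, me
  logical done
  call MPI_Init(ier)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)
  if ( me == 0 ) then
    sendbuf = 1.0d0
    call MPI_Isend(sendbuf, n, MPI_REAL8, 1, 99, MPI_COMM_WORLD, req, ier)
    do i = 1, nchunks
      call do_a_little_of_my_work()                     ! local work between polls
      call MPI_Test(req, done, MPI_STATUS_IGNORE, ier)  ! lets the library make progress
    end do
    call MPI_Wait(req, MPI_STATUS_IGNORE, ier)          ! make sure the send completed
  else if ( me == 1 ) then
    call MPI_Recv(recvbuf, n, MPI_REAL8, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE, ier)
  end if
  call MPI_Finalize(ier)
end program isend_test_sketch

subroutine do_a_little_of_my_work()
  ! placeholder for a small piece of the real computation
end subroutine do_a_little_of_my_work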
And the SECOND desire is to use persistent communication for even better speedup.
Right. That's a separate issue.
So at this time I am focusing on persistent communication.
Based on your suggestions, I developed the following: the send buffers,
receive buffers, and the request array are declared in a global module,
and their sizes are allocated in the main program. But the following is
not working: I get segmentation fault messages right at the underlined
(blue) line.
Well, the problem must be in the details of how you're implementing
this. I've attached a program that works for me.
Main program starts------@@@@@@@@@@@@@@@@@@@@@@@
    CALL MPI_RECV_INIT for each neighboring process
    CALL MPI_SEND_INIT for each neighboring process
    Loop calling subroutine1--------------------(10000 times in the main program)
        Call subroutine1
        Subroutine1 starts===================================
            Loop A starts here>>>>>>>>>>>>>>>>>>>> (three passes)
                Call subroutine2
                Subroutine2 starts----------------------------
                    Pick local data from array U in separate arrays for each neighboring processor
                    CALL MPI_STARTALL
                    -------perform work that could be done with local data
                    CALL MPI_WAITALL( )
                    -------perform work using the received data
                Subroutine2 ends----------------------------
                -------perform work to update array U
            Loop A ends here>>>>>>>>>>>>>>>>>>>>
        Subroutine1 ends====================================
    Loop calling subroutine1 ends------------(10000 times in the main program)
    CALL MPI_Request_free( )
Main program ends------@@@@@@@@@@@@@@@@@@@@@@@
How should I tackle all this?
module my_mpi_stuff
integer, parameter :: nmsgs = 1, nwords = 8
integer me, np
integer reqs(nmsgs,2)        ! (...,1) are for sends and (...,2) are for receives
real(8) bufs(nwords,nmsgs,2) ! (...,1) are for sends and (...,2) are for receives
end module my_mpi_stuff
program main
use my_mpi_stuff
include "mpif.h"
call MPI_Init(ier)
call MPI_Comm_size(MPI_COMM_WORLD,np,ier)
call MPI_Comm_rank(MPI_COMM_WORLD,me,ier)
! set up individual sends and receives
if ( np /= 2 ) stop "np is not 2" ! this simple example works only for np==2
call MPI_Recv_init(bufs(1,1,2), nwords, MPI_REAL8, 1-me, 300, MPI_COMM_WORLD, &
                   reqs(1,2), ier)
call MPI_Send_init(bufs(1,1,1), nwords, MPI_REAL8, 1-me, 300, MPI_COMM_WORLD, &
                   reqs(1,1), ier)
do i = 1, 10000
call sub1()
end do
! dump out buffers
do imsg = 1, nmsgs
write(6,'(3i5,8f8.1)') me, imsg, 1, bufs(:,imsg,1)
write(6,'(3i5,8f8.1)') me, imsg, 2, bufs(:,imsg,2)
end do
do imsg = 1, nmsgs
call MPI_Request_free(reqs(imsg,1), ier)
call MPI_Request_free(reqs(imsg,2), ier)
end do
call MPI_Finalize(ier)
end program main
subroutine sub1()
do i = 1, 3
call sub2()
! call update(u)
end do
end subroutine sub1
subroutine sub2()
use my_mpi_stuff
include "mpif.h"
! Pick local data from array U in separate arrays for each neighboring processor
do imsg = 1, nmsgs
do iword = 1, nwords
bufs(iword,imsg,1) = 10000 * me + 100 * imsg + iword
end do
end do
call MPI_Startall(2*nmsgs,reqs,ier)
! -------perform work that could be done with local data
call MPI_Waitall (2*nmsgs,reqs,MPI_STATUSES_IGNORE,ier)
! -------perform work using the received data
end subroutine sub2
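If you want the buffers and the request array to be allocatable module arrays whose sizes are set in the main program, as in your description, one way that can look is sketched below. The module and array names are illustrative, not taken from your code. The points that matter are that the arrays are allocated BEFORE the MPI_Recv_init/MPI_Send_init calls and are not deallocated or reallocated while the persistent requests still exist; starting requests on buffers that were never allocated, or that have moved, is a frequent cause of segmentation faults with this arrangement.

module halo_bufs
  integer :: nmsgs, nwords                      ! sizes set in the main program
  integer :: me, np
  real(8), allocatable :: sbuf(:,:), rbuf(:,:)  ! (nwords, nmsgs)
  integer, allocatable :: reqs(:)               ! sends in 1..nmsgs, receives after that
end module halo_bufs

program alloc_version
  use halo_bufs
  include "mpif.h"
  call MPI_Init(ier)
  call MPI_Comm_size(MPI_COMM_WORLD, np, ier)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ier)
  if ( np /= 2 ) stop "np is not 2"
  nmsgs  = 1
  nwords = 8
  ! allocate BEFORE the *_init calls; the buffers must not move while the requests exist
  allocate( sbuf(nwords,nmsgs), rbuf(nwords,nmsgs), reqs(2*nmsgs) )
  call MPI_Send_init(sbuf(1,1), nwords, MPI_REAL8, 1-me, 300, MPI_COMM_WORLD, &
                     reqs(1), ier)
  call MPI_Recv_init(rbuf(1,1), nwords, MPI_REAL8, 1-me, 300, MPI_COMM_WORLD, &
                     reqs(2), ier)
  do i = 1, 10000
    sbuf(:,1) = 10000*me + i          ! fill the send buffer before starting the requests
    call MPI_Startall(2*nmsgs, reqs, ier)
    call MPI_Waitall(2*nmsgs, reqs, MPI_STATUSES_IGNORE, ier)
  end do
  write(6,'(i5,8f10.1)') me, rbuf(:,1)
  call MPI_Request_free(reqs(1), ier)
  call MPI_Request_free(reqs(2), ier)
  deallocate( sbuf, rbuf, reqs )      ! only after the requests have been freed
  call MPI_Finalize(ier)
end program alloc_version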
#!/bin/csh
setenv OPAL_PREFIX .....
set path = ( $OPAL_PREFIX/bin $path )
mpif90 a.f90
mpirun -n 2 ./a.out
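If everything is wired up correctly, each of the two ranks should print its own send buffer (entries 10000*me + 100*imsg + iword) followed by a receive buffer holding the other rank's values; the output lines of the two ranks may appear interleaved in either order.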