George is right -- you *can* do this, but it is *not advised* (you'll
likely run out of memory or other resources pretty quickly -- if you
can run at all!). :-)
Try mpi_leave_pinned, and check out those FAQ sections that I sent,
particularly the OpenFabrics section, for how to specifically tune
various behaviors of the openib BTL.
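(For example, add "--mca mpi_leave_pinned 1" to your mpirun command line.)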
On Nov 6, 2008, at 1:52 PM, George Bosilca wrote:
In order to get good performance out of your test application, the
whole message has to be sent in just one fragment. The reason is
that, as long as there is no progress thread internal to the MPI
library, there is no way for the library to make progress on the
message while your code is busy computing.
Now, I can explain how to do this, but trust me, it is an ugly
hack that makes your application MPI-implementation specific, i.e.
not portable in terms of performance. But I guess this decision is
up to you. The really bad thing that might happen is that, if the
receiver is slower than the sender, you will buffer all of these
eager messages in the receiver's memory (what a waste), you will
use many more memory copies, and you give up the possibility of
using the RMA features available on your network. So yes, your
specific code may eventually run faster, but the price to pay is
way too expensive [from my perspective].
Here is how you can do this: based on the network you use (openib
in this case), the parameter selecting the first fragment size is
called *_eager_limit. Run "ompi_info --param btl openib" and grep
for eager_limit to figure out the exact name of the parameter, then
set it with "--mca <name> <value>" to the value that you want. As
an example, I think this will work for openib: "--mca
btl_openib_eager_limit 8388648" (8388608 + 40 for internal headers).
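For concreteness, a rough sketch of this pattern: raise the eager
limit so an 8 MB message fits in a single eager fragment, then
overlap the send with local computation. The buffer size and the
do_computation() stub are made up for illustration, and the mpirun
line in the comment assumes the btl_openib_eager_limit name above
is right for your Open MPI version.

/* Run with something like:
 *   mpirun --mca btl_openib_eager_limit 8388648 -np 2 ./a.out
 */
#include <mpi.h>
#include <stdlib.h>

#define MSG_BYTES 8388608   /* 8 MB payload; +40 header bytes = 8388648 */

static void do_computation(void) { /* placeholder for real work */ }

int main(int argc, char **argv)
{
    int rank;
    char *buf;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(MSG_BYTES);

    if (rank == 0) {
        /* With a large enough eager limit, the whole message can leave
         * in one fragment during this call, so no later MPI calls are
         * needed to push out additional fragments. */
        MPI_Isend(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
        do_computation();
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* cleanup only */
    } else if (rank == 1) {
        MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}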
george.
On Nov 6, 2008, at 12:52 PM, Eugene Loh wrote:
vladimir marjanovic wrote:
In order to overlap communication and computation I don't want to
use MPI_Wait.
Right. One thing to keep in mind is that there are two ways of
overlapping communication and computation. One is you start a send
(MPI_Isend), you do a bunch of computation while the message is
being sent, and then after the message has been sent you call
MPI_Wait just to clean up. This assumes that the MPI
implementation can send a message while control of the program has
been returned to you. The experts can give you the fine print, but
my simple assertion is, "This doesn't usually happen."
Rather, the MPI implementation typically will send data only when
your code is in some MPI call. That's why you have to call
MPI_Test periodically... or some other MPI function.
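A rough sketch of that pattern, with the computation broken into
slices and an MPI_Test between them (NUM_STEPS and
compute_one_step() are invented here just for illustration):

#include <mpi.h>

#define NUM_STEPS 100   /* made-up slice count, for illustration only */

static void compute_one_step(int step) { (void)step; /* real work here */ }

void overlapped_send(double *buf, int count, int dest, MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;

    MPI_Isend(buf, count, MPI_DOUBLE, dest, 0, comm, &req);

    for (int step = 0; step < NUM_STEPS; step++) {
        compute_one_step(step);          /* one slice of the computation */
        if (!done) {
            /* Each MPI_Test gives the library a chance to make
             * progress, i.e., to push out the next fragment. */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        }
    }
    if (!done) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* just cleanup at this point */
    }
}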
For sure the message is being decomposed into chunks, and the
chunk size is probably defined by an environment variable.
Do you know how I can control the chunk size?
I don't. Try running "ompi_info -a" and looking through the
parameters. For the shared-memory BTL, it's
mca_btl_sm_max_frag_size. I also see something like
coll_sm_fragment_size. Maybe look at the parameters that have
"btl_openib_max" in their names.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems