George is right -- you *can* do this, but it is *not advised* (you'll
likely run out of memory or other resources pretty quickly -- if you
can run at all!). :-)
Try mpi_leave_pinned, and check out those FAQ sections that I sent,
particularly the OpenFabrics section, for how to specifically tune
various behaviors of the openib BTL.
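(For example, add "--mca mpi_leave_pinned 1" to your mpirun command line.)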
On Nov 6, 2008, at 1:52 PM, George Bosilca wrote:
In order to get good performance out of your test application, the
whole message has to be sent in just one fragment. The reason is
that, as long as there is no progress thread internal to the MPI
library, there is no way for the library to make progress on the
message while your code is busy computing.
Now, I can explain how to do this, but trust me, it is an ugly
hack that makes your application MPI-implementation specific, i.e.
not portable in terms of performance. But I guess this decision is
up to you. The really bad thing that might happen is that, if the
receiver is slower than the sender, you will buffer all of these
eager messages in the receiver's memory (what a waste), you will
use many more memory copies, and you give up the possibility of
using the RMA features available on your network. So yes, your
specific code may eventually run faster, but the price to pay is
way too expensive [from my perspective].
Here is how you can do this: based on the network you use (openib
in this case), the parameter selecting the first fragment size is
called *_eager_limit. Run "ompi_info --param btl openib" and grep
for eager_limit to figure out the exact name of the parameter, then
set it with "--mca <name> <value>" to the value that you want. As
an example, I think this will work for openib: "--mca
btl_openib_eager_limit 8388648" (8388608 + 40 for internal headers).
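For concreteness, a rough sketch of this pattern: raise the eager
limit so an 8 MB message fits in a single eager fragment, then
overlap the send with local computation. The buffer size and the
do_computation() stub are made up for illustration, and the mpirun
line in the comment assumes the btl_openib_eager_limit name above
is right for your Open MPI version.

/* Run with something like:
 *   mpirun --mca btl_openib_eager_limit 8388648 -np 2 ./a.out
 */
#include <mpi.h>
#include <stdlib.h>

#define MSG_BYTES 8388608   /* 8 MB payload; +40 header bytes = 8388648 */

static void do_computation(void) { /* placeholder for real work */ }

int main(int argc, char **argv)
{
    int rank;
    char *buf;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(MSG_BYTES);

    if (rank == 0) {
        /* With a large enough eager limit, the whole message can leave
         * in one fragment during this call, so no later MPI calls are
         * needed to push out additional fragments. */
        MPI_Isend(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req);
        do_computation();
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* cleanup only */
    } else if (rank == 1) {
        MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}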
george.
On Nov 6, 2008, at 12:52 PM, Eugene Loh wrote:
vladimir marjanovic wrote:
In order to overlap communication and computation I don't want to
use MPI_Wait.
Right. One thing to keep in mind is that there are two ways of
overlapping communication and computation. One is you start a send
(MPI_Isend), you do a bunch of computation while the message is
being sent, and then after the message has been sent you call
MPI_Wait just to clean up. This assumes that the MPI
implementation can send a message while control of the program has
been returned to you. The experts can give you the fine print, but
my simple assertion is, "This doesn't usually happen."
Rather, the MPI implementation typically will send data only when
your code is in some MPI call. That's why you have to call
MPI_Test periodically... or some other MPI function.
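A rough sketch of that pattern, with the computation broken into
slices and an MPI_Test between them (NUM_STEPS and
compute_one_step() are invented here just for illustration):

#include <mpi.h>

#define NUM_STEPS 100   /* made-up slice count, for illustration only */

static void compute_one_step(int step) { (void)step; /* real work here */ }

void overlapped_send(double *buf, int count, int dest, MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;

    MPI_Isend(buf, count, MPI_DOUBLE, dest, 0, comm, &req);

    for (int step = 0; step < NUM_STEPS; step++) {
        compute_one_step(step);          /* one slice of the computation */
        if (!done) {
            /* Each MPI_Test gives the library a chance to make
             * progress, i.e., to push out the next fragment. */
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);
        }
    }
    if (!done) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* just cleanup at this point */
    }
}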
For sure the message is being decomposed into chunks, and the
chunk size is probably defined by an environment variable.
Do you know how I can control the chunk size?
I don't. Try running "ompi_info -a" and looking through the
parameters. For the shared-memory BTL, it's
mca_btl_sm_max_frag_size. I also see something like
coll_sm_fragment_size. Maybe look at the parameters that have
"btl_openib_max" in their names.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems