What I missed in this whole conversation is that the pieces of text
that Ron and Dick are citing are *on the same page* in the MPI spec;
they're not disparate parts of the spec that accidentally overlap in
discussion scope.
Specifically, it says:
Resource limitations

Any pending communication operation consumes system resources that
are limited. Errors may occur when lack of resources prevent the
execution of an MPI call. A quality implementation will use a (small)
fixed amount of resources for each pending send in the ready or
synchronous mode and for each pending receive. However, buffer space
may be consumed to store messages sent in standard mode, and must be
consumed to store messages sent in buffered mode, when no matching
receive is available. The amount of space available for buffering
will be much smaller than program data memory on many systems. Then,
it will be easy to write programs that overrun available buffer space.
...12 lines down on that page, on the same page, in the same section...
Consider a situation where a producer repeatedly produces new values
and sends them to a consumer. Assume that the producer produces new
values faster than the consumer can consume them.
...skip 2 sentences about buffered sends...
If standard sends are used, then the producer will be automatically
throttled, as its send operations will block when buffer space is
unavailable.
I find that to be unambiguous.
1. A loop of MPI_ISENDs on a producer can cause a malloc failure
(can't malloc a new MPI_Request), and that's an error. Tough luck.
2. A loop of MPI_SENDs on a producer can run a slow-but-MPI-active
consumer out of buffer space if all the incoming messages are queued
up (e.g., in the unexpected queue). The language above is pretty
clear about this: MPI_SEND on the producer is supposed to block at
this point. Both cases are sketched below.
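
To make those two cases concrete, here is a minimal sketch (the
message count, tag, and sleep are invented for illustration; this is
not code from anyone's application). Run it with 2 processes: rank 0
produces much faster than rank 1 consumes. Per the text above, the
MPI_SEND loop should eventually block rather than fail; replace the
MPI_Send with an MPI_Isend that allocates a fresh MPI_Request each
iteration and you have case 1, where the standard permits a
resource-exhaustion error instead:

    /* producer_consumer.c -- illustrative sketch only */
    #include <mpi.h>
    #include <unistd.h>

    #define NMSG 1000000

    int main(int argc, char **argv)
    {
        int rank, i, val = 42;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Fast producer: once the consumer's buffering is
               exhausted, each standard-mode send must block until a
               matching receive frees space -- throttled, not failed. */
            for (i = 0; i < NMSG; ++i)
                MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Slow consumer: still making MPI progress, just not
               fast enough to keep up with the producer. */
            int recvd;
            for (i = 0; i < NMSG; ++i) {
                sleep(1);
                MPI_Recv(&recvd, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }
        }

        MPI_Finalize();
        return 0;
    }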
FWIW: Open MPI does support this mode of operation, as George and
Gleb noted: setting the eager size to 0 forces *all* sends to be
synchronous, so a producer cannot "run ahead" for a while and
eventually be throttled when receive buffering is exhausted. But a)
it's not the default, and b) it's not documented this way.
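
For illustration only (MCA parameter names vary by BTL and by Open
MPI release -- treat the exact name below as an assumption and check
ompi_info): over the TCP BTL, forcing every send down the
no-eager/synchronous path would look something like

    mpirun --mca btl_tcp_eager_limit 0 -np 2 ./producer_consumer

i.e., you give up eager "run ahead" entirely in exchange for hard
throttling on every send.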
On Feb 4, 2008, at 1:29 PM, Richard Treumann wrote:
Hi Ron -
I am well aware of the scaling problems related to the standard send
requirements in MPI. It is a very difficult issue.
However, here is what the standard says: MPI 1.2, page 32 lines 29-37
=======
a standard send operation that cannot complete because of lack of
buffer space will merely block, waiting for buffer space to become
available or for a matching receive to be posted. This behavior is
preferable in many situations. Consider a situation where a producer
repeatedly produces new values and sends them to a consumer. Assume
that the producer produces new values faster than the consumer can
consume them. If buffered sends are used, then a buffer overflow will
result. Additional synchronization has to be added to the program so
as to prevent this from occurring. If standard sends are used, then
the producer will be automatically throttled, as its send operations
will block when buffer space is unavailable.
========
If there are people who want to argue that this is unclear or that
it should be changed, the MPI Forum can and should take up the
discussion. I think this particular wording is pretty clear.
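
To put the buffered-send half of that quoted passage in concrete
terms, here is a hypothetical sketch (buffer and message counts
invented): in buffered mode the application attaches its own buffer
space, and once a backlog of unmatched sends fills it, MPI_BSEND
reports an error rather than throttling -- the "buffer overflow" the
passage warns about:

    /* bsend_overflow.c -- illustrative sketch only; a matching (slow)
       consumer on rank 1 is assumed and omitted for brevity. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, i, val = 42, bufsize;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Attach room for 100 pending int messages (plus the
               per-message overhead).  This is the only buffering MPI
               must provide for buffered-mode sends. */
            bufsize = 100 * (sizeof(int) + MPI_BSEND_OVERHEAD);
            buf = malloc(bufsize);
            MPI_Buffer_attach(buf, bufsize);

            /* If the consumer falls ~100 messages behind, the attached
               buffer is full and the next MPI_Bsend raises an error
               instead of blocking. */
            for (i = 0; i < 1000; ++i)
                MPI_Bsend(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

            MPI_Buffer_detach(&buf, &bufsize);
        }

        MPI_Finalize();
        return 0;
    }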
The piece of MPI standard wording you quote is somewhat ambiguous:
============
The amount of space available for buffering will be much smaller
than program data memory on many systems. Then, it will be easy to
write programs that overrun available buffer space.
============
But note that this wording mentions a problem that an application
can create but does not say the MPI implementation can fail the job.
The language I have pointed to is where the standard says what the
MPI implementation must do.
The "lack of resource" statement is more about send and receive
descriptors than buffer space. If I write a program with an infinite
loop of MPI_IRECV postings the standard allows that to fail.
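
For instance, a loop like this hypothetical one (illustrative only):

    /* irecv_flood.c -- illustrative sketch only */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int buf;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        /* Each MPI_Irecv consumes a request/descriptor that is not
           released until the operation completes.  Overwriting req
           every iteration leaks those resources, and nothing obliges
           this loop to block, so the standard lets it fail with a
           resource error once descriptors run out. */
        for (;;)
            MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                      MPI_COMM_WORLD, &req);
        /* not reached */
    }

is allowed to fail rather than block.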
Dick
Dick Treumann - MPI Team/TCEM
IBM Systems & Technology Group
Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
users-boun...@open-mpi.org wrote on 02/04/2008 12:24:11 PM:
>
> > Is what George says accurate? If so, it sounds to me like OpenMPI
> > does not comply with the MPI standard on the behavior of eager
> > protocol. MPICH is getting dinged in this discussion because they
> > have complied with the requirements of the MPI standard. IBM MPI
> > also complies with the standard.
> >
> > If there is any debate about whether the MPI standard does (or
> > should) require the behavior I describe below then we should move
> > the discussion to the MPI 2.1 Forum and get a clarification.
> > [...]
>
> The MPI Standard also says the following about resource limitations:
>
> Any pending communication operation consumes system resources that
> are limited. Errors may occur when lack of resources prevent the
> execution of an MPI call. A quality implementation will use a
> (small) fixed amount of resources for each pending send in the ready
> or synchronous mode and for each pending receive. However, buffer
> space may be consumed to store messages sent in standard mode, and
> must be consumed to store messages sent in buffered mode, when no
> matching receive is available. The amount of space available for
> buffering will be much smaller than program data memory on many
> systems. Then, it will be easy to write programs that overrun
> available buffer space.
>
> Since I work on MPI implementations that are expected to allow
> applications to scale to tens of thousands of processes, I don't
> want the overhead of a user-level flow control protocol that
> penalizes scalable applications in favor of non-scalable ones. I
> also don't want an MPI implementation that hides such non-scalable
> application behavior, but rather exposes it at lower processor
> counts -- preferably in a way that makes the application developer
> aware of the resource requirements of their code and allows them to
> make the appropriate choice regarding the structure of their code,
> the underlying protocols, and the amount of buffer resources.
>
> But I work in a place where codes are expected to scale and not
> just work. Most of the vendors aren't allowed to have this
> perspective....
>
> -Ron
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems