Adding Craig Rasmussen from LANL into the CC list...

On Oct 31, 2006, at 10:26 AM, Michael Kluskens wrote:

OpenMPI tickets 39 & 55 deal with problems with the Fortran 90 large interface with regard to:

#39: MPI_IN_PLACE in MPI_REDUCE <https://svn.open-mpi.org/trac/ompi/ticket/39>
#55: MPI_GATHER with arrays of different dimensions <https://svn.open-mpi.org/trac/ompi/ticket/55>

Attached is a patch to deal with these two issues as applied against OpenMPI-1.3a1r12364.

Thanks for the patch! Before committing it, though, I think more needs to be done, and I want to understand it before doing so (part of this is me thinking it out while I write this e-mail...). Also, be aware that SC is 1.5 weeks away, so I may not be able to address this issue before then (SC tends to be all-consuming).

1. The "same type" heuristic for the "large" F90 module was not intended to cover all possible scenarios. You're absolutely right that assuming the same type makes no sense for some of the interfaces. The problem is that the obvious alternative (covering all possible scenarios) creates an exponential number of interfaces (in the millions). So "large" was an attempt to provide *some* of the interfaces -- but your experience has shown that this can do more harm than good (i.e., it makes some legal MPI applications uncompilable because we provide *some* interfaces to MPI_GATHER, but not all).
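To make this concrete, here's roughly the shape of what the "same type" heuristic generates (a hand-written sketch with made-up procedure names, not the actual script-generated code): one specific procedure per type/kind/rank combination, where sendbuf and recvbuf always share the same type and rank, all overloaded under one generic name.

  module mpi_large_sketch
    implicit none
    interface MPI_Gather
       ! Specific for INTEGER, rank-1 send / rank-1 recv; the real "large"
       ! module repeats this pattern for every type/kind/rank combination.
       subroutine MPI_Gather_int_1d(sendbuf, sendcount, sendtype, &
                                    recvbuf, recvcount, recvtype, &
                                    root, comm, ierr)
         integer, dimension(:), intent(in)    :: sendbuf
         integer, dimension(:), intent(inout) :: recvbuf
         integer, intent(in)  :: sendcount, sendtype, recvcount, recvtype
         integer, intent(in)  :: root, comm
         integer, intent(out) :: ierr
       end subroutine MPI_Gather_int_1d
       ! ...and so on: integer 2d/2d, real 1d/1d, complex 3d/3d, etc.
    end interface
  end module mpi_large_sketch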

1a. It gets worse because of MPI's semantics for MPI_GATHER. You pointed out one scenario -- it doesn't make sense to require the same type and dimension for both the sendbuf and recvbuf (e.g., a scalar "integer" for both), because the root will need an integer *array* to receive all the values (similar logic applies to MPI_SCATTER and other collectives -- so what you did for MPI_GATHER would need to be applied to several others as well).

1b. But even worse than that is the fact that, for MPI_GATHER, the receive buffer is not relevant on non-root processes. So it's valid for *any* type to be passed for non-root processes (leading to the exponential interface explosion described above).

So having *some* interfaces for MPI_GATHER can be a problem for both 1a and 1b -- perfectly valid/legal MPI apps will fail to compile.
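As a concrete (hypothetical) example, the following program is legal MPI, but a "large" module that only provides same-type/same-rank interfaces will refuse to compile both MPI_GATHER calls -- the first because sendbuf is a scalar and recvbuf is a rank-1 array (1a), the second because the non-root recvbuf isn't even an integer (1b):

  program gather_legal
    use mpi
    implicit none
    integer :: me, ierr
    integer :: myval         ! scalar sendbuf
    integer :: allvals(16)   ! rank-1 recvbuf; only significant at the root
    real    :: unused        ! non-roots may legally pass anything as recvbuf
    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)
    myval = me
    if (me == 0) then        ! assumes <= 16 processes for this sketch
       call MPI_GATHER(myval, 1, MPI_INTEGER, allvals, 1, MPI_INTEGER, &
                       0, MPI_COMM_WORLD, ierr)
    else
       call MPI_GATHER(myval, 1, MPI_INTEGER, unused, 1, MPI_INTEGER, &
                       0, MPI_COMM_WORLD, ierr)
    end if
    call MPI_FINALIZE(ierr)
  end program gather_legal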

I'm not sure what the right balance is here -- how do we allow for both 1a and 1b without creating millions of interfaces? Your patch created MPI_GATHER interfaces for all the same types, but allowing any dimension mix. With the default max dimension level of 4 in OMPI's interfaces, this created 90 new interfaces for MPI_GATHER, calculated (and verified with some grep/wc'ing):

For src buffer of dimension:    0   1   2   3   4
Create this many recvbuf types: 4 + 4 + 3 + 2 + 1 = 14

(The recvbuf is always an array -- rank 1 or higher -- whose rank is at least the sendbuf's rank, hence the counts above.)

For each src/recvbuf combination, create this many interfaces:

(char + logical + (integer * 4) + (real * 2) + (complex * 2)) = 10

Where 4, 2, and 2 are the number of integer, real, and complex types supported by the compiler on my machines (e.g., gfortran on OSX/intel and Linux/EM64T).

So this created 14 * 10 = 140 interfaces, as opposed to the 50 that were there before the patch (5 dimensions of src/recvbuf * 10 types = 50), resulting in 90 new interfaces.

This effort will need to be duplicated for several other collectives:

- allgather, allgatherv
- alltoall, alltoallv, alltoallw
- gather, gatherv
- scatter, scatterv

So an increase of 9 * 90 = 810 new interfaces. Not too bad, considering the alternative (exponential). But consider that the "large" interface only has (by my count via egrep/wc) 4013 interfaces. This would be increasing its size by about 20%. This is certainly not a show-stopper, but something to consider.

Note that if you go higher than OMPI's default of 4 dimensions, the number of new interfaces gets considerably larger: e.g., for 7 dimensions you get 35 send/recv dimension combinations instead of 14, so 35 * 10 * 9 = 3150 total interfaces (just for the collectives), if I did my math right.
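For anyone who wants to double-check that arithmetic, here's a throwaway sketch (my own, not anything in the tree) that reproduces the counts for both the 4- and 7-dimension cases:

  program count_interfaces
    implicit none
    integer, parameter :: ntypes = 10  ! char + logical + 4 integer + 2 real + 2 complex
    integer, parameter :: ncolls = 9   ! the nine collectives listed above
    integer :: d, sdim, rdim, combos
    do d = 4, 7, 3                     ! OMPI's default (4), then the 7-dim case
       combos = 0
       do sdim = 0, d
          do rdim = 1, d               ! recvbuf is always an array
             if (rdim >= sdim) combos = combos + 1
          end do
       end do
       print *, 'max dim', d, ': dim combos =', combos, &
                ', per collective =', combos * ntypes, &
                ', all collectives =', combos * ntypes * ncolls
    end do
  end program count_interfaces

This prints 14 / 140 / 1260 for 4 dimensions and 35 / 350 / 3150 for 7.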

2. You also identified another scenario that needs to be fixed -- support for MPI_IN_PLACE in certain collectives (MPI_REDUCE is not the only collective that supports it). It doesn't seem to be a Good Idea to allow the INTEGER type to be mixed with any other type for send/recvbuf combinations just to allow MPI_IN_PLACE. This potentially adds send/recvbuf signatures that we want to disallow (even though they are potentially valid MPI applications!) -- e.g., an INTEGER sendbuf with a REAL recvbuf. What if a user accidentally supplied an INTEGER for the sendbuf that wasn't MPI_IN_PLACE? That's exactly what the type system is supposed to prevent.

I don't know enough about the type system of F90, but it strikes me that we should be able to create a unique type for MPI_IN_PLACE (don't know why I didn't think of this before for some of the MPI sentinel values... :-\ ) and therefore have a safe mechanism for this sentinel value.
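I haven't actually tried this, but the idea would look something like the sketch below (all names hypothetical -- this is not what's in the tree). The derived type exists solely so that the MPI_IN_PLACE constant has a type that no user buffer can accidentally match:

  module mpi_in_place_type
    implicit none
    type mpi_in_place_t
       integer :: dummy            ! F90 derived types need at least one component
    end type mpi_in_place_t
    type(mpi_in_place_t), save :: MPI_IN_PLACE
  end module mpi_in_place_type

  module mpi_reduce_sketch
    implicit none
    interface MPI_Reduce
       ! One extra specific per recvbuf type/kind; only the sentinel's type
       ! matches this sendbuf, so a real buffer can't be passed by accident.
       subroutine MPI_Reduce_in_place_int(sendbuf, recvbuf, count, datatype, &
                                          op, root, comm, ierr)
         use mpi_in_place_type, only: mpi_in_place_t
         type(mpi_in_place_t), intent(in)     :: sendbuf
         integer, dimension(:), intent(inout) :: recvbuf
         integer, intent(in)  :: count, datatype, op, root, comm
         integer, intent(out) :: ierr
       end subroutine MPI_Reduce_in_place_int
    end interface
  end module mpi_reduce_sketch

A call like MPI_REDUCE(MPI_IN_PLACE, recvbuf, ...) would then resolve to that specific, while an ordinary INTEGER sendbuf would not match it -- which is exactly the type safety we want.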

This would add 10 interfaces for every function that supports MPI_IN_PLACE; a pretty small increase.

This same technique should probably be applied to some of the other sentinel values, such as MPI_ARGVS_NULL and MPI_STATUSES_IGNORE.

---------------

All that being said, what does it mean?

I think #2 is easily enough fixed (it just requires the time to do so), and has minimal impact on the number of interfaces. Implementing MPI sentinel values with unique types also makes user apps that much safer (i.e., they won't accidentally pass in an incorrect type that would be mistaken -- by the interface -- for a valid signature).

#1 is still a problem. No matter how we slice it, we're going to leave out valid combinations of send/recv buffers, which will prevent potentially legal MPI applications from compiling. This is as opposed to not having F90 interfaces for the 2-choice-buffer functions at all, which would mean that F90 apps using MPI_GATHER (for example) would simply fall back to the F77 interfaces, where no type checking is done. End result: all MPI F90 apps can compile.
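To illustrate what "no type checking" means in practice (a deliberately-wrong, hypothetical snippet): with only the F77 bindings in scope there is no explicit interface, so even this call compiles cleanly, however badly it behaves at run time:

  program f77_fallback
    implicit none
    include 'mpif.h'        ! F77 bindings: constants only, no interfaces
    real    :: sendbuf(4)
    logical :: recvbuf(2)   ! wrong type *and* too small -- compiles anyway
    integer :: ierr
    call MPI_INIT(ierr)
    sendbuf = 0.0
    call MPI_GATHER(sendbuf, 4, MPI_REAL, recvbuf, 4, MPI_INTEGER, &
                    0, MPI_COMM_WORLD, ierr)
    call MPI_FINALIZE(ierr)
  end program f77_fallback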

Simply put, with the trivial, small, and medium module sizes, all valid MPI F90 applications can compile and run. With the large size, unless we do the exponential interface explosion, we will be potentially excluding some legal MPI F90 applications -- they *will not be able to compile* (without workarounds). This is what I meant by ticket 55's title "F90 "large" interface may not entirely make sense".

So there are multiple options here:

1. Keep chasing a "good" definition of "large" such that most/all current MPI F90 apps can compile. The problem is that this target can change over time and will keep requiring maintenance.

2. Stop pursuing "large" because of the problems mentioned above. This has the potential problem of not providing type safety to F90 MPI apps for the MPI collective interfaces, but at least all apps can compile, and there's only a small number of 2-choice-buffer functions that do not get the type safety from F90 (i.e., several MPI collective functions).

3. Start implementing the proposed F03 MPI interfaces that don't have the same problems as the F90 MPI interfaces.

I have to admit that I'm leaning more towards #2 (and I wish that someone who has the time would do #3!) and discarding #1...

Comments?

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
