Re: [O-MPI users] Fwd: Thoughts on an MPI ABI

Toon Knapen Tue, 15 Mar 2005 02:33:47 -0500

Jeff Squyres wrote:

Greetings. I loosely watched the MPI ABI discussions on the Beowulf list but refrained from commenting (I stopped checking -- is it still going on?). Now that the discussion has come to my project's list, I guess I should speak up. :)
Since I've been "saving up" for a while, this post is a bit lengthy. I apologize.



Thanks for the effort that you've put into this reply.

First, let me ask a question: what does an MPI ABI *really* get for you?
The obvious answer is that you don't have to recompile. Your app runs anywhere with any MPI on any system. Well, that is, unless you want to run on a different architecture (32/64 bit, different CPU, different platform, etc.). Or if you want to use a different compiler on the same system (let's not forget C++ and F90 name mangling issues). Or if you want to use different system or compiler flags (e.g., threading / no threading, largefile support on Linux, optimization and debugging support, etc.).
So -- hmm. You can run your MPI app on any MPI implementation that is on exactly the same platform, architecture, uses the same compilers, and uses the same system and compiler flags that you want. So an MPI ABI does not enable the "compile once, run anywhere" scheme -- it really is much narrower than the casual observer might expect.



Then how do you explain the effort that went into the C++ ABI?

What about the ISV?
Again, on the surface, this looks great -- an ISV can ship *one* executable and have it work "anywhere". Er, well, anywhere "similar" (so let's not forget that the ISV will still end up shipping a lot of executables -- they may be shipping *fewer* executables than before, but there will still be [far?] more than one).
But does an ISV really want that? Suddenly their app can [potentially] run in a lot of scenarios that they have not verified through their QA process. How do you know that you'll get the right answers? How do you know it won't crap out in the middle of the run because of a missing symbol (not involving MPI)? The fact is that the app can now run in a lot of unsupported places, whereas today, the possibilities of this happening are *much* more limited. ISVs generally choose which MPI implementations to support, but then their apps *only* run on those implementations (there are exceptions to this rule, I know).



This all depends on how detailed one's platform specification is, and very
few ISVs specify every little detail.

For instance, for the Linux platform we specify which glibc we support.
We support one specific version _and_ all higher versions. We do this
because we rely on the backward compatibility of glibc. The drawback is
that if backward compatibility in glibc is broken, it will probably
show up in our code and our clients will contact us about it.
Additionally, we will have to spend time tracking down the error, only to
eventually find out that it is a backward compatibility problem in glibc.
However, we cannot afford to tell our customers to use one specific
version of glibc, or to test a whole range of glibc versions, so we have
to take our chances and rely on the software our software relies on.
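
To illustrate, the "one specific version and all higher versions" policy boils down to a check like the one below. This is only a minimal Python sketch: the minimum version number and the helper names are hypothetical and not taken from any real product, and it assumes plain dotted numeric version strings.

```python
import platform

# Hypothetical minimum supported glibc version (illustration only).
MINIMUM_GLIBC = "2.3.2"


def parse_version(text):
    """Turn '2.3.2' into (2, 3, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_supported(found, minimum=MINIMUM_GLIBC):
    """True if `found` is the minimum version or any higher one."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    # platform.libc_ver() returns (lib, version); both may be empty
    # strings when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(version, "supported" if is_supported(version) else "too old")
    else:
        print("not a glibc system; nothing to check")
```

Note that such a check only says the version is in the accepted range. It cannot tell whether backward compatibility actually holds for that version, which is exactly the chance being taken.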

Big companies, however, have the power to specify their platforms in every little detail (this includes not only the OS and the version of the OS but also the version of every module in the OS), but 95% of the ISVs do not have this power ;-(

This is quite an important point, and is something that several others have brought up in other mails: all MPI implementations are not created equal. Take any two production-quality MPI implementations and they'll have their own quirks and differences. They'll behave and perform differently. So even though your application is source code portable, it may not be performance / behavioral portable. This has been a well-known fact for years (as someone said -- it's an artifact of using a standard with multiple implementations). This is why ISVs QA test their applications with different MPI implementations, and only certify specific ones. More specifically, if your application works on one MPI implementation, you can't guarantee that it will work on another. It *probably* will, but customers don't pay for "probably" (e.g., you can't know if you're accidentally relying on a quirk of one [or more!] implementation[s] without testing on exactly the ones that you plan to support).

If our client has a cluster with InfiniBand and we do not, we will try to make an executable for him. We cannot test this executable ourselves, but because it runs on 5 different platforms we _suppose_ that if the MPI implementation is correct, our app will run correctly on that switch too. If the customer nevertheless has a problem, we log in remotely or go on-site to evaluate the problem. However, we cannot afford to say 'buy another switch', because this would mean that our customer goes somewhere else and we lose the customer.

Again for comparison, we specify neither which BIOSes we support nor which brands of Ethernet cards, because we suppose they all work as expected. Without any such assumptions, you would just have to ship hardware together with your software.



toon
