Since I wrote and/or have supported most of the OMPI DRM interface code at one time or another, I guess I'll add my $0.02 here. :-)
There is no simple or obvious "winning" answer here. There really aren't all that many DRMs out there once you filter the list by the number of places that actually use them; only a very few see enough usage to merit a lot of support. We chose to support a broader set of DRMs solely because (a) it wasn't all that hard to do, and (b) we wanted to make OMPI available to as wide an audience as we could.

Launching MPI jobs directly from the scheduler is not only possible, but available today with most (if not all) MPI implementations. Not all DRMs support it, but some do. To understand why some choose not to support that mode, you have to understand that startup of an MPI job consists of two very distinct phases:

1. mapping of the processes to the allocated nodes (defining which ranks go where), and the subsequent spawning of those procs; and

2. wireup of the MPI interconnects across the processes.

All of the DRMs/schedulers can do step 1. Doing step 2 in a fast, scalable way is non-trivial. Some vendors provide interconnects so tightly coupled to the DRM that step 2 can be done without exchanging messages to pass contact info - but that constrains the portability of the DRM itself, requires developing specialized interconnects for a limited market, etc. Other DRMs provide software support for step 2 - with the attendant investment in development and maintenance.

You are correct that it raises a question of return on investment, but I don't find that many DRM "vendors" are motivated by such things. Instead, they appear to be motivated primarily by ego ("we can build it better than anyone else") and competition (in many cases, the DRM is developed under a funding grant that continues only so long as the developing organization can keep winning grants). There is, therefore, little motivation to standardize DRM interfaces or support, so I very much doubt you'll see a consolidation of DRM interfaces any time soon.

Of course, the various DRMs do provide differing levels of support (e.g., fault tolerance). We at OMPI made the decision to expend the effort to provide an even user-level experience by filling in any differences in DRM capability from within OMPI itself. So there is a -lot- of code within OMPI's RTE dedicated to providing capabilities found in one environment that might be missing in another. We do that within our modular architecture, though: where a capability is available via the DRM, we exploit it - and where it isn't, we implement it ourselves. Some DRM providers wonder at times why we do that - after all, if we only used what the DRM provided, our lives would be easier. But we believe the user would not benefit from that approach, and so we continue to make the effort. Our "reward" is that users can run an OMPI program on nearly every system we know about and have it behave exactly the same way (some setting of MCA params may be required).

Long-winded answer - hope it provides some insight into the decisions we make.

Ralph
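P.S. To make the two startup phases concrete, here is a toy, self-contained illustration of the step 2 wireup exchange - the publish/fence/lookup pattern a launcher-backed key-value store provides. The kvs_put()/kvs_get() helpers and the endpoint addresses below are made-up stand-ins, not OMPI's actual internals, and the whole exchange is simulated inside a single process:

/* Toy, single-process illustration of the "step 2" wireup exchange.
 * kvs_put()/kvs_get() are hypothetical stand-ins for the key-value
 * store a launcher can provide over its out-of-band channel; this is
 * NOT OMPI's actual internal API. */
#include <stdio.h>
#include <string.h>

#define NPROCS 4

/* In-memory stand-in for the launcher's key-value store. */
static char keys[NPROCS][32];
static char vals[NPROCS][32];
static int  nkeys = 0;

static void kvs_put(const char *key, const char *val)
{
    snprintf(keys[nkeys], sizeof(keys[0]), "%s", key);
    snprintf(vals[nkeys], sizeof(vals[0]), "%s", val);
    nkeys++;
}

static const char *kvs_get(const char *key)
{
    for (int i = 0; i < nkeys; i++)
        if (strcmp(keys[i], key) == 0)
            return vals[i];
    return "(unknown)";
}

int main(void)
{
    char key[32], endpoint[32];

    /* Step 1 already happened: the DRM mapped ranks to nodes and
     * spawned them. Step 2 starts with every rank publishing the
     * contact info for its interconnect (addresses are made up). */
    for (int rank = 0; rank < NPROCS; rank++) {
        snprintf(key, sizeof(key), "endpoint-%d", rank);
        snprintf(endpoint, sizeof(endpoint), "10.0.0.%d:7000", rank + 1);
        kvs_put(key, endpoint);
    }

    /* In a real launcher, a collective fence/barrier sits here so no
     * rank looks up a peer before that peer has published. */

    /* Then each rank looks up the peers it needs to connect to. */
    for (int rank = 0; rank < NPROCS; rank++) {
        snprintf(key, sizeof(key), "endpoint-%d", rank);
        printf("rank %d reachable at %s\n", rank, kvs_get(key));
    }
    return 0;
}

The fence in the middle is the expensive part: at scale, every rank must wait until every other rank's contact info is visible before it can connect, which is exactly why doing step 2 quickly is non-trivial.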
On Mar 10, 2010, at 7:03 PM, Brian Smith wrote:

> Hi, All,
>
> This may seem like an odd query (or not; perhaps it has been brought
> up before). My work recently involves HPC usability, i.e. making
> things easier for new users by abstracting away the scheduler. I've
> been working with DRMAA for interfacing with DRMs and it occurred to
> me: what would be the advantage to letting the scheduler itself
> handle farming out MPI processes as individual tasks rather than
> having a wrapper like mpirun handle this task via ssh/rsh/etc.?
>
> I thought about MPI2's ability to do dynamic process management and
> how scheduling environments tend to allocate static pools of
> resources for parallel tasks. A DRMAA-driven MPI would be able to
> request that the scheduler launch these tasks as resources become
> available, enabling scheduled MPI jobs to dynamically add and remove
> processors during execution. Several applications that I have worked
> with come to mind, where pre-processing and other tasks are
> non-parallel whereas the various solvers are. Being able to
> dynamically spawn processes based on where you are in this work-flow
> could be very useful here.
>
> It also occurred to me that commercial application vendors tend to
> roll their own when it comes to integrating their applications with
> an MPI library. I've seen applications use HP-MPI, MPICH, MPICH2,
> Intel-MPI, (and thankfully, recently) OpenMPI and then proceed to
> butcher the execution mechanisms to such an extent that it makes
> integration with common DRM systems quite a task. With the exception
> of OpenMPI, none of these libraries provides turn-key compatibility
> with most of the major DRMs, and each requires some degree of manual
> integration and testing for use in a multi-user production
> environment. I would think that vendors would be falling over
> themselves to integrate OpenMPI with their applications for this
> very reason alone. Instead, some opt to develop their own scheduling
> environments! Don't they have bean counters that sit around and
> gripe about duplicated work?
>
> Then it occurred to me: with the exception of being able to easily
> launch an MPI job with OpenMPI, the ability to monitor it from
> within the application is still dependent on the vendor integrating
> with various DRMs! This is another area where a DRMAA RAS can come
> in handy. There are nice bindings for monitoring tasks and getting
> an idea of where you are in execution without having to resort to
> kludgey shell-script wrappers tailing output files.
>
> Anyway, it's been a frustrating couple of weeks dealing with several
> commercial vendors and integrating their applications with our DRM,
> and my mind has been trying to think of a solution that could save
> all of us a lot of work (though, at the same time, raise job-security
> concerns in such turbulent times ;-/ ). What say you, MPI experts?
>
> Many thanks for your thoughts!
> -Brian
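Two sketches to anchor Brian's points. First, the dynamic process management he mentions: in MPI-2 terms, a serial front end calls MPI_Comm_spawn when the workflow reaches its parallel phase. A minimal sketch - "./solver" is a hypothetical worker executable that would call MPI_Init, MPI_Comm_get_parent, and the matching collectives - and whether the DRM will actually grow the allocation on demand is exactly the open question:

/* Serial front end that spawns parallel solver workers only when the
 * workflow reaches the parallel phase. "./solver" is hypothetical. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm solvers;      /* intercommunicator to the spawned workers */
    int errcodes[4];
    int work = 42, result = 0;

    MPI_Init(&argc, &argv);

    /* ... serial pre-processing happens here ... */

    /* Ask the runtime - and, through it, the DRM - for 4 new procs. */
    MPI_Comm_spawn("./solver", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0 /* root */, MPI_COMM_SELF, &solvers, errcodes);

    /* Hand the workers their input and collect a result; the workers
     * make the matching calls on the communicator returned by
     * MPI_Comm_get_parent(). */
    MPI_Bcast(&work, 1, MPI_INT, MPI_ROOT, solvers);
    MPI_Reduce(NULL, &result, 1, MPI_INT, MPI_SUM, MPI_ROOT, solvers);
    printf("solvers returned %d\n", result);

    MPI_Comm_disconnect(&solvers);
    MPI_Finalize();
    return 0;
}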
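Second, the monitoring he describes: with the DRMAA 1.0 C binding, job state becomes a simple poll against the DRM rather than a shell wrapper tailing output files. Another minimal sketch - "./run_solver.sh" is a made-up script name, and error-code checking is omitted for brevity:

/* Submit a job through DRMAA and poll its state. Sketch against the
 * DRMAA 1.0 C binding; "./run_solver.sh" is hypothetical and the
 * return codes of the drmaa_* calls should be checked in real code. */
#include <stdio.h>
#include <unistd.h>
#include "drmaa.h"

int main(void)
{
    char err[DRMAA_ERROR_STRING_BUFFER];
    char jobid[DRMAA_JOBNAME_BUFFER];
    drmaa_job_template_t *jt = NULL;
    int state = DRMAA_PS_UNDETERMINED;

    drmaa_init(NULL, err, sizeof(err));          /* default session */

    drmaa_allocate_job_template(&jt, err, sizeof(err));
    drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "./run_solver.sh",
                        err, sizeof(err));

    drmaa_run_job(jobid, sizeof(jobid), jt, err, sizeof(err));
    drmaa_delete_job_template(jt, err, sizeof(err));

    /* Poll the DRM for job state - no output-file tailing needed. */
    do {
        drmaa_job_ps(jobid, &state, err, sizeof(err));
        printf("job %s state: 0x%x\n", jobid, state);
        sleep(5);
    } while (state != DRMAA_PS_DONE && state != DRMAA_PS_FAILED);

    drmaa_exit(err, sizeof(err));
    return 0;
}

The mpirun (or DRM-native) launch of the parallel solver would then live inside that remote command - which is precisely the integration point the commercial vendors keep reinventing.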