Just curious: I thought ULFM dealt with recovering an MPI job where one or more 
processes fail. Is this correct?

HLA/RTI consists of processes that start at random times, run to completion, 
and then exit normally. While a failure could occur, most process terminations 
are normal and there is no need/intent to revive them. So it's mostly a case of 
massively exercising MPI's dynamic connect/accept/disconnect functions.

Do ULFM's structures have some utility for that purpose?


On Apr 16, 2013, at 3:20 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> There is an ongoing effort to address the potential volatility of processes 
> in MPI called ULFM. There is a working version available at 
> http://fault-tolerance.org. It supports TCP, sm and IB (mostly). You will 
> find some examples, and the document explaining the additional constructs 
> needed in MPI to achieve this.
> 
>   George.
> 
> On Apr 15, 2013, at 17:29 , John Chludzinski <john.chludzin...@gmail.com> 
> wrote:
> 
>> That would seem to preclude its use for an RTI.  Unless you have a card up 
>> your sleeve?
>>  
>> ---John
>> 
>> 
>> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> It isn't the fact that there are multiple programs being used - we support 
>> that just fine. The problem with HLA/RTI is that it allows programs to 
>> come/go at will - i.e., not every program has to start at the same time, nor 
>> complete at the same time. MPI requires that all programs be executing at 
>> the beginning, and that all call finalize prior to anyone exiting.
>> 
>> 
>> On Apr 15, 2013, at 8:14 AM, John Chludzinski <john.chludzin...@gmail.com> 
>> wrote:
>> 
>>> I just received an e-mail notifying me that MPI-2 supports MPMD.  This 
>>> would seem to be just what the doctor ordered?
>>>  
>>> ---John
>>> 
>>> 
>>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> FWIW: some of us are working on a variant of MPI that would indeed support 
>>> what you describe - it would support send/recv (i.e., MPI-1), but not 
>>> collectives, and so would allow communication between arbitrary programs.
>>> 
>>> Not specifically targeting HLA/RTI, though I suppose a wrapper that 
>>> conformed to that standard could be created.
>>> 
>>> On Apr 15, 2013, at 7:50 AM, John Chludzinski <john.chludzin...@gmail.com> 
>>> wrote:
>>> 
>>> > This would be a departure from the SPMD paradigm that seems central to
>>> > MPI's design. Each process would be a completely different program
>>> > (piece of code) and I'm not sure how well that would work using
>>> > MPI?
>>> >
>>> > BTW, MPI is commonly used in the parallel discrete event world for
>>> > communication between LPs (federates in HLA). But these LPs are
>>> > usually the same program.
>>> >
>>> > ---John
>>> >
>>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>>> > <john.chludzin...@gmail.com> wrote:
>>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>>> >> (HLA) / Runtime Infrastructure)?
>>> >>
>>> >> ---John
>>> > _______________________________________________
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>>> 