On Apr 19, 2013, at 8:00 AM, John Chludzinski <john.chludzin...@gmail.com> 
wrote:

> So the apparent conclusion to this thread is that an (Open)MPI based RTI is 
> very doable - if we allow for the future development of dynamic joining and 
> leaving of the MPI collective?

Yes - like Brian said, it was done once before, though we don't know the state 
of that code and it wasn't done with OMPI (which didn't exist at that time). 
Mostly a question of getting support for a "rolling start/stop" in place and 
writing the RTI wrapper.

The first is being worked - the second would require either motivation or 
(better yet) an interested/willing third party :-)

>  
> ---John
> 
> 
> On Wed, Apr 17, 2013 at 12:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Thanks for the clarification - very interesting indeed! I'll look at it more 
> closely.
> 
> 
> On Apr 17, 2013, at 9:20 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
>> On Apr 16, 2013, at 15:51 , Ralph Castain <r...@open-mpi.org> wrote:
>> 
>>> Just curious: I thought ULFM dealt with recovering an MPI job where one or 
>>> more processes fail. Is this correct?
>> 
>> It depends on which definition of "recovering" you take. ULFM is about 
>> leaving the processes that remain (after a fault or a disconnect) in a 
>> state that allows them to continue to make progress. It is not about 
>> recovering processes or user data, but it does provide the minimal set 
>> of functionality to allow an application to do this, if needed (revoke, 
>> agreement and shrink).
>> 
>>> HLA/RTI consists of processes that start at random times, run to 
>>> completion, and then exit normally. While a failure could occur, most 
>>> process terminations are normal and there is no need/intent to revive them.
>> 
>> As I said above, there is no revival of processes in ULFM, and it was never 
>> our intent to have such a feature. The dynamic world is to be constructed 
>> using MPI-2 constructs (MPI_Comm_spawn, MPI_Comm_connect/MPI_Comm_accept, or 
>> even MPI_Comm_join).
>> 
>>> So it's mostly a case of massively exercising MPI's dynamic 
>>> connect/accept/disconnect functions.
>>> 
>>> Do ULFM's structures have some utility for that purpose?
>> 
>> Absolutely. If the process that leaves calls exit() instead of calling 
>> MPI_Finalize, the ULFM version of the runtime will interpret this as an 
>> event triggering a report. All the ensuing mechanisms are then activated and 
>> the application can react to this event with the most meaningful approach it 
>> can envision.
>> 
>>   George.
>> 
>>> 
>>> 
>>> On Apr 16, 2013, at 3:20 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>> 
>>>> There is an ongoing effort, called ULFM, to address the potential 
>>>> volatility of processes in MPI. There is a working version available at 
>>>> http://fault-tolerance.org. It supports TCP, sm and IB (mostly). You will 
>>>> find some examples, and a document explaining the additional constructs 
>>>> needed in MPI to achieve this.
>>>> 
>>>>   George.
>>>> 
>>>> On Apr 15, 2013, at 17:29 , John Chludzinski <john.chludzin...@gmail.com> 
>>>> wrote:
>>>> 
>>>>> That would seem to preclude its use for an RTI.  Unless you have a card 
>>>>> up your sleeve?
>>>>>  
>>>>> ---John
>>>>> 
>>>>> 
>>>>> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> It isn't the fact that there are multiple programs being used - we 
>>>>> support that just fine. The problem with HLA/RTI is that it allows 
>>>>> programs to come/go at will - i.e., not every program has to start at the 
>>>>> same time, nor complete at the same time. MPI requires that all programs 
>>>>> be executing at the beginning, and that all call finalize prior to anyone 
>>>>> exiting.
>>>>> 
>>>>> 
>>>>> On Apr 15, 2013, at 8:14 AM, John Chludzinski 
>>>>> <john.chludzin...@gmail.com> wrote:
>>>>> 
>>>>>> I just received an e-mail notifying me that MPI-2 supports MPMD.  This 
>>>>>> would seem to be just what the doctor ordered?
>>>>>>  
>>>>>> ---John
>>>>>> 
>>>>>> 
>>>>>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain <r...@open-mpi.org> 
>>>>>> wrote:
>>>>>> FWIW: some of us are working on a variant of MPI that would indeed 
>>>>>> support what you describe - it would support send/recv (i.e., MPI-1), 
>>>>>> but not collectives, and so would allow communication between arbitrary 
>>>>>> programs.
>>>>>> 
>>>>>> Not specifically targeting HLA/RTI, though I suppose a wrapper that 
>>>>>> conformed to that standard could be created.
>>>>>> 
>>>>>> On Apr 15, 2013, at 7:50 AM, John Chludzinski 
>>>>>> <john.chludzin...@gmail.com> wrote:
>>>>>> 
>>>>>> > This would be a departure from the SPMD paradigm that seems central to
>>>>>> > MPI's design. Each process would be a completely different program
>>>>>> > (piece of code) and I'm not sure how well that would work using
>>>>>> > MPI?
>>>>>> >
>>>>>> > BTW, MPI is commonly used in the parallel discrete event world for
>>>>>> > communication between LPs (federates in HLA). But these LPs are
>>>>>> > usually the same program.
>>>>>> >
>>>>>> > ---John
>>>>>> >
>>>>>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>>>>>> > <john.chludzin...@gmail.com> wrote:
>>>>>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>>>>>> >> (HLA) / Runtime Infrastructure)?
>>>>>> >>
>>>>>> >> ---John
>>>>>> > _______________________________________________
>>>>>> > users mailing list
>>>>>> > us...@open-mpi.org
>>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
