On Apr 19, 2013, at 8:00 AM, John Chludzinski <john.chludzin...@gmail.com> wrote:
> So the apparent conclusion to this thread is that an (Open)MPI based RTI is
> very doable - if we allow for the future development of dynamic joining and
> leaving of the MPI collective?

Yes - like Brian said, it was done once before, though we don't know the
state of that code and it wasn't done with OMPI (which didn't exist at that
time). Mostly a question of getting support for a "rolling start/stop" in
place and writing the RTI wrapper. The first is being worked on - the second
would require either motivation or (better yet) an interested/willing third
party :-)

>
> ---John
>
>
> On Wed, Apr 17, 2013 at 12:45 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Thanks for the clarification - very interesting indeed! I'll look at it more
> closely.
>
>
> On Apr 17, 2013, at 9:20 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> On Apr 16, 2013, at 15:51 , Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Just curious: I thought ULFM dealt with recovering an MPI job where one or
>>> more processes fail. Is this correct?
>>
>> It depends on which definition of "recovering" you take. ULFM is about
>> leaving the processes that remain (after a fault or a disconnect) in a
>> state that allows them to continue to make progress. It is not about
>> recovering processes, or user data, but it does provide the minimal set
>> of functionality to allow an application to do this, if needed (revoke,
>> agreement and shrink).
>>
>>> HLA/RTI consists of processes that start at random times, run to
>>> completion, and then exit normally. While a failure could occur, most
>>> process terminations are normal and there is no need/intent to revive them.
>>
>> As I said above, there is no revival of processes in ULFM, and it was never
>> our intent to have such a feature. The dynamic world is to be constructed
>> using MPI-2 constructs (MPI_Comm_spawn or MPI_Comm_connect/MPI_Comm_accept
>> or even MPI_Comm_join).
>>
>>> So it's mostly a case of massively exercising MPI's dynamic
>>> connect/accept/disconnect functions.
>>>
>>> Do ULFM's structures have some utility for that purpose?
>>
>> Absolutely. If the process that leaves calls exit() instead of
>> MPI_Finalize, this will be interpreted by the version of the runtime in
>> ULFM as an event triggering a report. All the ensuing mechanisms are then
>> activated and the application can react to this event with the most
>> meaningful approach it can envision.
>>
>> George.
>>
>>>
>>>
>>> On Apr 16, 2013, at 3:20 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>>> There is an ongoing effort to address the potential volatility of
>>>> processes in MPI called ULFM. There is a working version available at
>>>> http://fault-tolerance.org. It supports TCP, sm and IB (mostly). You will
>>>> find some examples, and the document explaining the additional constructs
>>>> needed in MPI to achieve this.
>>>>
>>>> George.
>>>>
>>>> On Apr 15, 2013, at 17:29 , John Chludzinski <john.chludzin...@gmail.com>
>>>> wrote:
>>>>
>>>>> That would seem to preclude its use for an RTI. Unless you have a card
>>>>> up your sleeve?
>>>>>
>>>>> ---John
>>>>>
>>>>>
>>>>> On Mon, Apr 15, 2013 at 11:23 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>> It isn't the fact that there are multiple programs being used - we
>>>>> support that just fine. The problem with HLA/RTI is that it allows
>>>>> programs to come/go at will - i.e., not every program has to start at the
>>>>> same time, nor complete at the same time. MPI requires that all programs
>>>>> be executing at the beginning, and that all call finalize prior to anyone
>>>>> exiting.
>>>>>
>>>>>
>>>>> On Apr 15, 2013, at 8:14 AM, John Chludzinski
>>>>> <john.chludzin...@gmail.com> wrote:
>>>>>
>>>>>> I just received an e-mail notifying me that MPI-2 supports MPMD. This
>>>>>> would seem to be just what the doctor ordered?
>>>>>>
>>>>>> ---John
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 15, 2013 at 11:10 AM, Ralph Castain <r...@open-mpi.org>
>>>>>> wrote:
>>>>>> FWIW: some of us are working on a variant of MPI that would indeed
>>>>>> support what you describe - it would support send/recv (i.e., MPI-1),
>>>>>> but not collectives, and so would allow communication between arbitrary
>>>>>> programs.
>>>>>>
>>>>>> Not specifically targeting HLA/RTI, though I suppose a wrapper that
>>>>>> conformed to that standard could be created.
>>>>>>
>>>>>> On Apr 15, 2013, at 7:50 AM, John Chludzinski
>>>>>> <john.chludzin...@gmail.com> wrote:
>>>>>>
>>>>>> > This would be a departure from the SPMD paradigm that seems central to
>>>>>> > MPI's design. Each process would be a completely different program
>>>>>> > (piece of code) and I'm not sure how well that would work using
>>>>>> > MPI?
>>>>>> >
>>>>>> > BTW, MPI is commonly used in the parallel discrete event world for
>>>>>> > communication between LPs (federates in HLA). But these LPs are
>>>>>> > usually the same program.
>>>>>> >
>>>>>> > ---John
>>>>>> >
>>>>>> > On Mon, Apr 15, 2013 at 10:22 AM, John Chludzinski
>>>>>> > <john.chludzin...@gmail.com> wrote:
>>>>>> >> Is anyone aware of an MPI based HLA/RTI (DoD High Level Architecture
>>>>>> >> (HLA) / Runtime Infrastructure)?
>>>>>> >>
>>>>>> >> ---John
>>>>>> > _______________________________________________
>>>>>> > users mailing list
>>>>>> > us...@open-mpi.org
>>>>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
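
For anyone skimming the thread, the "rolling start/stop" requirement can be made concrete with a toy, purely in-memory Python model. To be clear, this is not MPI and not an RTI: the `Federation` class, its methods, and the member names below are hypothetical, invented only for illustration. It just sketches the membership pattern being discussed: members join and leave at will, and point-to-point send/recv keeps working for whoever is present, with no collective step that needs everyone.

```python
from collections import deque


class Federation:
    """Toy in-memory registry (hypothetical): members join and leave at will."""

    def __init__(self):
        # member name -> queue of pending (sender, payload) messages
        self._mailboxes = {}

    def join(self, name):
        if name in self._mailboxes:
            raise ValueError(f"{name} has already joined")
        self._mailboxes[name] = deque()

    def leave(self, name):
        # Normal exit: drop the member's mailbox; other members are unaffected.
        self._mailboxes.pop(name, None)

    def members(self):
        return sorted(self._mailboxes)

    def send(self, sender, dest, payload):
        """Point-to-point send; returns False if dest is not currently joined."""
        if sender not in self._mailboxes:
            raise ValueError(f"{sender} has not joined")
        box = self._mailboxes.get(dest)
        if box is None:
            return False
        box.append((sender, payload))
        return True

    def recv(self, name):
        """Return the oldest pending (sender, payload), or None if none waits."""
        box = self._mailboxes[name]
        return box.popleft() if box else None


def demo():
    fed = Federation()
    log = []
    fed.join("radar")
    fed.join("tracker")
    fed.send("radar", "tracker", "contact at t=1")
    fed.leave("radar")                                  # radar finishes early
    fed.join("display")                                 # display starts late
    log.append(fed.recv("tracker"))                     # message outlives sender
    log.append(fed.send("tracker", "radar", "ack"))     # radar is gone
    log.append(fed.send("tracker", "display", "track 7"))
    log.append(fed.recv("display"))
    log.append(fed.members())
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

The send to a departed member simply reports failure rather than blocking or aborting everyone else, which is the behaviour an RTI wrapper would need from the layer underneath it. A real implementation would have to get the same effect from MPI's dynamic connect/accept/disconnect calls or from the send/recv-only variant mentioned earlier in the thread.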