It sounds like, with the fault tolerance features specifically mentioned by Vasiliy, MPI in its current form may not be the simplest choice.
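
For what it's worth, the failover part of the requirement (another server takes over when the master dies) can be sketched without MPI at all. The snippet below is only an illustration under my own assumptions, not anything Open MPI provides: each node is assumed to record the time it last heard a heartbeat from every peer over plain TCP, and the rule "lowest-numbered live node becomes master" is a hypothetical choice, as are the names elect_master, live_nodes and HEARTBEAT_TIMEOUT.

```python
import time

HEARTBEAT_TIMEOUT = 5.0  # seconds; assumed value, tune for your network


def live_nodes(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the sorted IDs of nodes whose last heartbeat is recent enough."""
    return sorted(n for n, t in last_seen.items() if now - t <= timeout)


def elect_master(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    """Pick the lowest-numbered live node as master, or None if none are live."""
    alive = live_nodes(last_seen, now, timeout)
    return alive[0] if alive else None


if __name__ == "__main__":
    now = time.time()
    # Node 0 (the old master) last reported 30 s ago; nodes 1 and 2 are fresh.
    last_seen = {0: now - 30.0, 1: now - 1.0, 2: now - 0.5}
    print(elect_master(last_seen, now))
```

Every node would have to run the same rule on roughly the same heartbeat data for them to agree on the new master; a network partition can still leave two masters, so a real deployment would need something stronger than this sketch.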

On Tue, 2010-03-09 at 18:55 -0700, Ralph Castain wrote:
> Running an orted directly won't work - it is intended solely to be launched
> when running a job with "mpirun".
>
> You application doesn't immediately sounds like it -needs- MPI, though you
> could always use it anyway. The MPI messaging system is fast, but it isn't
> clear if your application will necessarily benefit from that speed. It
> depends upon how much communication is going on vs computation and idle time.
>
> If you are more familiar with the non-MPI methods, I would personally do it
> that way unless I found a need for MPI - for example, a place where MPI
> collectives such as MPI_Allgather would be helpful.
>
>
> On Mar 9, 2010, at 12:10 PM, Vasiliy G Tolstov wrote:
>
> > Hello.
> > Some times ago i run study MPI (openmpi).
> > I need to write application (client/server) runs on 50 servers in
> > parallel. Each application can communicate with others by tcp/ip (send
> > commands, doing some parallel computations).
> >
> > Master - controls all clients - slaves (send control commands, if needed
> > restart clients). If master machine with server application die, some
> > other server need to recive master role and controls other slaves.
> >
> > Can i do this things with openmpi? Or i need to write standart tcp/ip
> > client/server application?
> >
> > I'm try to read some search results in google like this -
> > http://docs.sun.com/source/819-7480-11/ExecutingPrograms.htmlaopenmpi%20orted%20persistent%20daemon
> > but orted return error:
> >
> > orted --daemonize
> > [mobile:24107] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> > runtime/orte_init.c at line 125
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > orte_ess_base_select failed
> > --> Returned value Not found (-13) instead of ORTE_SUCCESS
> >
> >
> > Thank You. Sorry for my poor english.
> >
> >
> > --
> > Vasiliy G Tolstov <v.tols...@selfip.ru>
> > Selfip.Ru
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users