It certainly does make sense to use MPI for such a setup. But there are some important things to consider:

1. MPI, at its heart, is a communications system. There are lots of other bells and whistles (e.g., starting up a whole bunch of processes in tandem), but at the core, it's all about passing messages.

2. MPI tends to lend itself to fairly tightly coupled systems. The usual model is that you start all of your parallel processes at the same time (e.g., "mpirun -np 32 my_application"). The current state of technology is *not* good in terms of fault tolerance -- most MPI implementations (Open MPI included) will kill the entire job if any one of those processes dies. This is an important factor for running for weeks, months, or years.

(Lots of good research is ongoing about fault tolerance and MPI, but the existing solutions still emphasize tightly-coupled applications or require a fair amount of involvement from the application. A sketch of roughly the best a portable application can do today follows this list.)

3. MPI also emphasizes performance: low latency, high bandwidth, good concurrency, etc.
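
To make the fault-tolerance point in #2 concrete, here is a minimal, untested sketch of about all a portable application can do with today's standard API: replace MPI_COMM_WORLD's default error handler so that a failed communication at least surfaces as a catchable exception rather than immediately aborting. The destination rank and payload below are made up for illustration:

#include <mpi.h>
#include <iostream>

int main(int argc, char **argv)
{
    MPI::Init(argc, argv);

    // The default handler, MPI::ERRORS_ARE_FATAL, aborts the whole job on
    // any error.  Requesting exceptions at least gives the application a
    // chance to react.
    MPI::COMM_WORLD.Set_errhandler(MPI::ERRORS_THROW_EXCEPTIONS);

    try {
        int payload = 42;                                   // made-up payload
        MPI::COMM_WORLD.Send(&payload, 1, MPI::INT, 1, 0);  // dest rank 1, tag 0
    } catch (MPI::Exception &e) {
        // Whether a dead peer is actually reported here (rather than the
        // whole job being killed) depends on the MPI implementation.
        std::cerr << "MPI error: " << e.Get_error_string() << std::endl;
        // ...application-level recovery, e.g., re-route the work...
    }

    MPI::Finalize();
    return 0;
}

(If you run that with "mpirun -np 1 a.out", the invalid destination rank should trip the handler -- an easy way to see the exception path. But note the caveat in the catch block: this does not make a process-death survivable on most of today's implementations.)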

If you don't need these things -- for example, if communication between the manager and workers is infrequent, and/or overall application time is not dominated by communication time -- then for [extremely] long-running applications you might be better served by a simple (but resilient) sockets-based communication layer instead of MPI. I say this mainly because of the fault tolerance issues described above and the natural hardware MTBF (mean time between failures) values that we see on today's hardware: over months or years, *something* will fail. A rough sketch of what I mean by "resilient" follows.
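
For illustration only -- an untested sketch of a client that survives its peer disappearing by reconnecting with a delay. The host, port, and retry policy are placeholders, and all error handling beyond the reconnect is omitted:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Keep trying to connect to host:port, sleeping between attempts, so a
// restarted or temporarily unreachable peer does not kill this process.
static int connect_with_retry(const char *host, int port)
{
    for (;;) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr;
        std::memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, host, &addr.sin_addr);
        if (connect(fd, (sockaddr *) &addr, sizeof addr) == 0)
            return fd;                            // connected
        close(fd);
        std::fprintf(stderr, "connect failed; retrying in 5s\n");
        sleep(5);                                 // crude fixed backoff
    }
}

int main()
{
    const char *host = "10.0.0.1";                // placeholder peer address
    int fd = connect_with_retry(host, 9000);
    char buf[64];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) {                             // peer died or link dropped
            close(fd);
            fd = connect_with_retry(host, 9000);  // resume instead of aborting
            continue;
        }
        // ...process n bytes of streamed readings...
    }
}

The point isn't the details; it's that with raw sockets *you* decide what happens when a peer goes away, whereas with today's MPI that decision is usually made for you.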

Hope that helps.


On Dec 4, 2007, at 1:15 PM, doktora v wrote:

Hi, although I did my due diligence in searching for an answer to this question, I apologise if this is a repeat.

From an architectural point of view, does it make sense to use MPI in the following scenario (for the purposes of resilience as much as parallelization):

Each process is a long-running process (it runs uninterrupted for weeks, months, or even years) that collects and crunches some streaming data, for example temperature readings, and the data is replicated to R nodes.

Because this is a departure from the normal modus operandi (i.e., all data immediately available), are there any obvious MPI issues that I am not considering in designing such an application?

Here is a more detailed description of the app:

A master receives the data and dispatches it according to some function such that each tuple is replicated to R of the N nodes (with R<=N). Suppose that there are K regions from which temperature readings stream in, in the form of <k,T>, where k is the region id and T is the temperature reading. The master sends each <k,T> to R of the N nodes. These nodes maintain a long-term state of, say, the min/max readings. If R=N=2, the system is basically duplicated: if one of the two nodes dies inadvertently, the other has still accounted for all the data.

Here is some pseudo-code:

#include <mpi.h>

struct Reading { int k; double t; };       // <k,T>: region id + temperature (sent as raw bytes; assumes homogeneous nodes)
bool read_from_socket(Reading &r);         // placeholder: next <k,T> off the stream
void process_message(const Reading &r);    // placeholder: update long-term min/max state

int main(int argc, char **argv)
{
    const int N = 10, R = 3, tag = 0;      // N workers (ranks 1..N), R <= N replicas; K=200 regions
    MPI::Init(argc, argv);
    int rank = MPI::COMM_WORLD.Get_rank();
    if (rank == 0) {                       // master: replicate each tuple to R workers
        int lastnode = 0;
        Reading r;
        while (read_from_socket(r))
            for (int i = 0; i < R; ++i)    // round-robin over ranks 1..N, never back to 0
                MPI::COMM_WORLD.Send(&r, sizeof r, MPI::BYTE, 1 + (lastnode++ % N), tag);
    } else {                               // worker: accumulate state indefinitely
        Reading r;
        for (;;) {
            MPI::COMM_WORLD.Recv(&r, sizeof r, MPI::BYTE, MPI::ANY_SOURCE, tag);
            process_message(r);
        }
    }
    MPI::Finalize();                       // workers never reach this as written
    return 0;
}
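
(The idea is to launch this as, e.g., "mpirun -np 11 ./app": one master at rank 0 plus N=10 workers.)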

Many thanks for your time!
Regards
Dok


--
Jeff Squyres
Cisco Systems
