Hi!
I'm currently looking into Open MPI to see if there's a way to use the
framework for writing persistent services. By services I mean services
published with MPI_Publish_name, connected to from clients with
MPI_Lookup_name / MPI_Comm_connect (then doing simple Send/Receive for
now).
Getting the Publish/Lookup/Connect part working isn't that hard. What
I'm wondering is:
- Is it possible to structure a server program using Open MPI to accept
an arbitrary number of client connections? If so, how? Using threads?
- How can you deal with unexpected terminations of clients and/or
servers? If a server program crashes, is it possible to start a new
instance of it connecting to the same ompi-server URI, and make it take
over the current published service? Or is this a task for some kind of
checkpoint/restore technique...? (I haven't managed to take over a
published service from a crashed server program in any way yet; there
seems to be some permission issue when you publish the same service twice.)
- Is it possible for a server to handle unclean client disconnects in
any way? (Clients not calling MPI_Finalize, lost connectivity, etc.) Maybe
by registering an error handler somehow?
If anyone has any input on these issues, it would be greatly
appreciated :) I'm not asking for source code examples, but maybe some
pseudo code and/or an explanation of which techniques you could use to
achieve something like a persistent service - if it's possible at all :)
I found this interesting paper on the subject at
www.mcs.anl.gov/~thakur/papers/mpi-servers.pdf, but it only talks about
the possibilities, not so much implementation details.
Best regards,
Mads Lønsethagen