I'm afraid not - once started, the orted must stay alive until mpirun
terminates. The problem is that the orteds are used to route messages,
and there is currently no way to remove an orted without breaking this
network.
I know people are investigating this possibility in support of fault
tolerance for when an orted unexpectedly fails, but there is nothing
currently in the code base for this capability.
Ralph
On Mar 9, 2009, at 6:28 AM, Marcia Cristina Cera wrote:
Hi,
Can I signal an orted daemon to terminate on the fly?
Context: I intend to use OpenMPI in a dynamic resource environment,
as I did with LAM/MPI with the help of the lamgrow and lamshrink
commands.
To perform grow operations (increasing the number of nodes/resources
on the fly), OpenMPI allows incremental resource utilization. All
nodes that can be used are listed in the hostfile (passed as an
mpirun parameter), and as each node is first used through
MPI_Comm_spawn, one orted daemon is created on it. According to
some initial tests, this feature is enough to satisfy our goals.
On the other hand, to perform shrink operations we need to release
nodes so that other applications/jobs can eventually use them. In
other words, we must terminate all application processes and also the
orted daemon. On the application side, the solution is easy, since we
can notify the processes (by a message or signal) to safely finish
their execution and call MPI_Finalize. On the orted side, we must
finish its execution on the target node and also update its status
to 'INVALID'.
How can I terminate the orted in this way? Is there a specific signal
or procedure for this?
Thank you in advance!
márcia.