On Apr 3, 2011, at 3:22 PM, Reuti wrote:

> On 03.04.2011, at 22:57, Ralph Castain wrote:
>
>> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>>
>>>>> I am not using that computer. A scenario that I have come across is
>>>>> that when an msub job is killed because it has exceeded its walltime,
>>>>> mpi tasks spawned by ssh may not be terminated because (so I am told)
>>>>> Torque does not know about them.
>>>>
>>>> Not true with OMPI. Torque will kill mpirun, which will in turn cause all
>>>> MPI procs to die. Yes, it's true that Torque won't know about the MPI
>>>> procs itself. However, OMPI is designed such that termination of mpirun by
>>>> the resource manager will cause all apps to die.
>>>
>>> How does Torque on NodeA know that an mpi job launched on NodeB by ssh
>>> should be killed?
>>
>> Torque works at the job level. So if you get an interactive Torque session,
>> Torque can only kill your session - which means it automatically kills
>> everything started within that session, regardless of where it resides.
>>
>> Perhaps you don't fully understand how Torque works? As a brief recap,
>> Torque allocates the requested number of nodes. On one of the nodes, it
>> starts a "sister mom" that is responsible for that job. It also wires the
>> Torque daemons on each of the other nodes to the "sister mom" to create, in
>> effect, a virtual machine.
>>
>> When the Torque session is completed, the "sister mom" notifies all the
>> other Torque daemons in the VM that the session is to be terminated. At that
>> time, all local procs belonging to that session are terminated. It doesn't
>> matter how those procs got there - by ssh, mpirun, whatever. They -all- are
>> killed.
>
> Is this a new feature? In the Torque clusters I have seen, cron jobs run on
> all nodes to remove processes that were not invoked through Torque's TM
> interface, e.g. because they were started by ssh.
>
> If I understand you correctly, you are saying that even with an ssh to a
> node you will still get correct accounting.
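An easy way to test the behavior under discussion on a given cluster - a
minimal sketch, assuming interactive jobs (qsub -I) are enabled and using
"node02" as a placeholder for the second allocated node:

    # Request an interactive two-node Torque session (standard qsub syntax).
    qsub -I -l nodes=2,walltime=00:05:00

    # Inside the session: find the sister node in $PBS_NODEFILE, then leave
    # a detached process there via plain ssh, bypassing the TM interface.
    # Detaching it means any cleanup must come from Torque itself, not from
    # ssh hangup propagation. "node02" stands in for the real hostname.
    cat $PBS_NODEFILE
    ssh node02 'nohup sleep 600 >/dev/null 2>&1 &'

    # End the session (or let the walltime expire).
    exit

    # From a login node: if the moms kill everything belonging to the job's
    # session, the sleep is gone; if they only track TM-spawned tasks, it
    # may still be running - which is exactly the point in question.
    ssh node02 ps -ef | grep '[s]leep 600'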
Well, all I can say is it works perfectly for me. I ran things this way on
Torque clusters for years at sites where they are used.

>> What Torque cannot do is kill the actual mpi processes started by mpirun.
>> See below.
>>
>>> OMPI is designed (from what I can see) for all
>>> mpirun to be started from the same node, not distributed mpi launched
>>> independently from multiple nodes.
>>
>> Remember, mpirun launches its own set of daemons on each node. Each daemon
>> then locally spawns its set of mpi processes. So mpirun knows where
>> everything is and can kill it.
>>
>> To further ensure cleanup, each daemon monitors mpirun's existence. So
>> Torque only knows about mpirun, and Torque kills mpirun when (e.g.) walltime
>> is reached. OMPI's daemons see that mpirun has died and terminate their
>> local processes prior to terminating themselves.
>
> I thought Open MPI has a tight integration with Torque via the TM
> interface? Hence Torque provides correct accounting and can also kill all
> started orteds, as it knows about them.
>
> http://www.open-mpi.org/faq/?category=tm

That is correct, but not necessary. Torque only needs to kill mpirun - the
daemons have a mechanism by which they automatically die when mpirun
disappears.

> -- Reuti
>
>> Torque cannot directly kill the mpi processes because it has no knowledge of
>> their existence and relationship to the job session. Instead, since Torque
>> knows about the ssh that started mpirun (since you executed it
>> interactively), it kills the ssh - which causes mpirun to die, which then
>> causes the mpi apps to die.
>>
>>> I am not certain that killing the
>>> ssh on NodeA will in fact terminate an mpi job launched on NodeB (i.e. by
>>> ssh NodeB mpirun AAA...) with OMPI.
>>
>> It most certainly will! That mpirun on NodeB is executing under the ssh from
>> NodeA, so when that ssh session is killed, it automatically kills everything
>> run underneath it. And when mpirun dies, so does the job it was running, as
>> per above.
>>
>> You can prove this to yourself rather easily. Just ssh to a remote node and
>> execute any command that lingers for a while - say something simple like
>> "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee
>> that the command will have died.
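The experiment described in that last paragraph takes only a few shell
commands - a minimal sketch, with "NodeB" as a placeholder for any host you
can ssh to:

    # On NodeA: start a lingering command on NodeB. -tt forces a remote
    # pseudo-tty, so the remote session reliably gets SIGHUP when the local
    # ssh client dies.
    ssh -tt NodeB sleep 600 &
    SSH_PID=$!

    # Kill the local ssh client, analogous to Torque killing the ssh that
    # started mpirun when walltime is reached.
    kill $SSH_PID

    # Check NodeB: the sleep should have died along with its ssh session.
    ssh NodeB ps -ef | grep '[s]leep 600' || echo "sleep is gone"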