On Apr 3, 2011, at 3:22 PM, Reuti wrote:

> Am 03.04.2011 um 22:57 schrieb Ralph Castain:
> 
>> On Apr 3, 2011, at 2:00 PM, Laurence Marks wrote:
>> 
>>>>> 
>>>>> I am not using that computer. A scenario that I have come across is
>>>>> that when an msub job is killed because it has exceeded its walltime,
>>>>> MPI tasks spawned by ssh may not be terminated because (so I am told)
>>>>> Torque does not know about them.
>>>> 
>>>> Not true with OMPI. Torque will kill mpirun, which will in turn cause all 
>>>> MPI procs to die. Yes, it's true that Torque itself won't know about the 
>>>> MPI procs. However, OMPI is designed such that termination of mpirun by 
>>>> the resource manager will cause all apps to die.
>>> 
>>> How does Torque on NodeA know that an MPI process launched on NodeB by ssh
>>> should be killed?
>> 
>> Torque works at the job level. So if you get an interactive Torque session, 
>> Torque can only kill your session - which means it automatically kills 
>> everything started within that session, regardless of where it resides.
>> 
>> Perhaps you don't fully understand how Torque works? As a brief recap, 
>> Torque allocates the requested number of nodes. On one of the nodes, it 
>> starts a "sister mom" that is responsible for that job. It also wires Torque 
>> daemons on each of the other nodes to the "sister mom" to create, in effect, 
>> a virtual machine.
>> 
>> When the Torque session is completed, the "sister mom" notifies all the 
>> other Torque daemons in the VM that the session shall be terminated. At that 
>> time, all local procs belonging to that session are terminated. It doesn't 
>> matter how those procs got there - by ssh, mpirun, whatever. They -all- are 
>> killed.
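>> 
>> As a concrete illustration (the node names and two-node request are just 
>> examples), the cleanup can be observed from an interactive session:
>> 
>>    $ qsub -I -l nodes=2                  # interactive Torque job on two nodes
>>    $ ssh node2 sleep 3600 &              # start a stray proc on the other node via ssh
>>    $ exit                                # end the Torque session
>>    $ ssh node2 ps -ef | grep '[s]leep'   # per the above, the sleep is gone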
> 
> Is this a new feature? On the Torque clusters I have seen, cron jobs run 
> on all nodes to remove processes that were not invoked through Torque's TM 
> interface, e.g. because they were started by ssh.
> 
> If I understand you correctly, you are saying that even with an ssh to a 
> node you will still get correct accounting.

Well, all I can say is that it works perfectly for me. I ran things this way 
for years on the Torque clusters at the sites where I worked.


> 
> 
>> What Torque cannot do is kill the actual mpi processes started by mpirun. 
>> See below.
>> 
>>> OMPI is designed (from what I can see) for
>>> mpirun to be started from a single node, not for distributed MPI jobs
>>> launched independently from multiple nodes.
>> 
>> Remember, mpirun launches its own set of daemons on each node. Each daemon 
>> then locally spawns its set of mpi processes. So mpirun knows where 
>> everything is and can kill it.
>> 
>> To further ensure cleanup, each daemon monitors mpirun's existence. So 
>> Torque only knows about mpirun, and Torque kills mpirun when (e.g.) walltime 
>> is reached. OMPI's daemons see that mpirun has died and terminate their 
>> local processes prior to terminating themselves.
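>> 
>> A quick way to watch this mechanism (host names are hypothetical):
>> 
>>    $ mpirun -np 2 --host nodeA,nodeB sleep 600 &
>>    $ kill -9 $!                          # kill mpirun so hard it cannot clean up itself
>>    $ ssh nodeB ps -ef | grep -e '[o]rted' -e '[s]leep'   # daemon noticed and exited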
> 
> I thought Open MPI had a tight integration with Torque via the TM 
> interface? In that case Torque provides correct accounting and can also 
> kill all the orteds it started, since it knows about them.
> 
> http://www.open-mpi.org/faq/?category=tm

That is correct, but not necessary. Torque only needs to kill mpirun - the 
daemons have a mechanism by which they automatically die when mpirun disappears.
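
For reference, you can check whether a given OMPI build has that TM support 
with ompi_info; a build configured with --with-tm should list "tm" plm/ras 
components (exact versions vary by install):

   $ ompi_info | grep -i ' tm '

Without them, mpirun simply falls back to launching its daemons via ssh.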


> 
> -- Reuti
> 
> 
>> Torque cannot directly kill the mpi processes because it has no knowledge of 
>> their existence and relationship to the job session. Instead, since Torque 
>> knows about the ssh that started mpirun (you executed it interactively), it 
>> kills the ssh - which causes mpirun to die, which in turn causes the mpi 
>> apps to die.
>>> I am not certain that killing the
>>> ssh on NodeA will in fact terminate an MPI job launched on NodeB (i.e. by
>>> ssh NodeB mpirun AAA...) with OMPI.
>>> 
>> 
>> It most certainly will! That mpirun on nodeB is executing under the ssh from 
>> nodeA, so when that ssh session is killed, it automatically kills everything 
>> run underneath it. And when mpirun dies, so does the job it was running, as 
>> per above.
>> 
>> You can prove this to yourself rather easily. Just ssh to a remote node and 
>> execute any command that lingers for a while - say something simple like 
>> "sleep". Then kill the ssh and do a "ps" on the remote node. I guarantee 
>> that the command will have died.
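>> 
>> Spelled out (nodeB is just an example name; -t allocates a pty so the 
>> hang-up actually reaches the remote process group):
>> 
>>    $ ssh -t nodeB sleep 600
>>    $ pkill -f 'ssh -t nodeB'             # from another terminal: kill the ssh client
>>    $ ssh nodeB ps -ef | grep '[s]leep'   # nothing - the sleep died with the session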