[SGE-discuss] ghost jobs in GE

2017-01-06 Thread Ondrej Valousek
Hi List,

I have a problem with ghost jobs - these are (as per qstat) usually in state 
'r' or 'dr' - so running, but in fact nothing is running on the specified node.
I can even reboot the node, but they do not vanish - still there.

They can be deleted, but it's still worrying as they are blocking slots 
unnecessarily.
I am using cgroups (can this be related?) with GE 8.1.8.

Thanks,
Ondrej



-

The information contained in this e-mail and in any attachments is confidential 
and is designated solely for the attention of the intended recipient(s). If you 
are not an intended recipient, you must not use, disclose, copy, distribute or 
retain this e-mail or any part thereof. If you have received this e-mail in 
error, please notify the sender by return e-mail and delete all copies of this 
e-mail from your computer system(s). Please direct any additional queries to: 
communicati...@s3group.com. Thank You. Silicon and Software Systems Limited (S3 
Group). Registered in Ireland no. 378073. Registered Office: South County 
Business Park, Leopardstown, Dublin 18.
___
SGE-discuss mailing list
SGE-discuss@liv.ac.uk
https://arc.liv.ac.uk/mailman/listinfo/sge-discuss


Re: [SGE-discuss] ghost jobs in GE

2017-01-06 Thread Reuti
Hi,

> On 06.01.2017 at 14:54, Ondrej Valousek wrote:
> 
> Hi List,
> 
> I have a problem with ghost jobs - these are (as per qstat) usually in 
> state 'r' or 'dr' - so running, but in fact nothing is running on the 
> specified node.
> I can even reboot the node, but they do not vanish - still there.
> 
> They can be deleted, but it's still worrying as they are blocking slots 
> unnecessarily
> Using cgroups (can this be related?), GE 8.1.8

Then the qmaster might think that they are still alive and be waiting for a 
reply from the exechost about the state of the jobs. If the exechost has no 
knowledge of the jobs (possibly due to an emptied spool directory), it will 
never reply.

You can remove such jobs with:

$ qdel -f <job_id>
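
Before force-deleting, it can help to list which jobs the qmaster still believes are running on the affected node and compare that with what is really on the machine. Below is a minimal sketch, not a tested tool: jobs_on_host is a hypothetical helper name, and it assumes the default plain-text qstat layout in which the state is the 5th column and queue@host is the 8th. Long queue instance names may be cut off in that layout, and the host has to be spelled exactly as qstat prints it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): read plain qstat output
# on stdin and print the IDs of jobs shown in state 'r' or 'dr' on one host.
# Assumes column 5 is the state and column 8 is queue@host.
jobs_on_host() {
    host="$1"
    awk -v host="$host" '
        # Only rows that start with a numeric job ID; this skips the header
        # and the separator line.
        $1 ~ /^[0-9]+$/ && ($5 == "r" || $5 == "dr") {
            split($8, q, "@")            # q[1] = queue name, q[2] = host
            if (q[2] == host) print $1
        }'
}

# Usage on a live cluster (the host name is only an example):
#   qstat -u '*' | jobs_on_host node01.example.com
```

Any ID printed here that has no matching process on the node is a candidate for the qdel -f above.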

-- Reuti


