Re: [gridengine users] How to clear possibly-corrupted queue state?

Reuti Wed, 03 Sep 2014 03:56:42 -0700

Hi,

On 02.09.2014 at 22:30, Michael Stauffer wrote:


> I'm trying to help users of another cluster whose admin is on vacation - a 
> bit of Murphy's Law at work here, it seems.
> 
> Their queue keeps failing, and after restarting qmaster it fails again after 
> about a minute. The suspicion is some bad job files, judging from these log 
> entries:
> 
> => Also, the last few lines in the qmaster logfile "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
> =>
> => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2729" has zero size
> => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2726" has zero size
> => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2727" has zero size
> => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2728" has zero size
> => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0003/2326" has zero size
> => 09/02/2014 14:15:02|  main|cbica-cluster|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
> => 09/02/2014 14:15:02|  main|cbica-cluster|E|error in init_packbuffer: wrong cull version

So the qmaster and the commands are working, and it's just the exechost that 
keeps failing? You could stop the execd on that host and remove the complete 
spool directory for the node. When the execd starts again, it will recreate 
the directory structure for that node.
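
If you would rather not delete anything outright, a more cautious variant is to rename the node's spool directory so that the restarted execd still finds nothing and recreates it. A minimal Python sketch of that idea follows. The function name and both parameters are made up for illustration, and it assumes the execd on that host is already stopped and that you pass in the real spool base directory of your installation:

```python
from pathlib import Path
import time


def set_aside_node_spool(spool_base, node):
    """Rename <spool_base>/<node> so that a restarted execd recreates it.

    Hypothetical helper: spool_base and node must match your own setup.
    Returns the new path, or None if the node has no spool directory.
    """
    node_dir = Path(spool_base) / node
    if not node_dir.is_dir():
        return None
    # Timestamped name so that repeated runs do not collide.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = node_dir.with_name(f"{node}.bad-{stamp}")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")
    node_dir.rename(backup)
    return backup
```

Once the node is running fine again, the renamed directory can be inspected or removed by hand.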

If it's the spool structure of the qmaster that is affected instead: are you 
using classic spooling?
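
To see how many job files are affected before touching anything, you could list the zero-size files below the qmaster spool directory. This is only a sketch: the function name is made up, it assumes classic spooling with the job files under "jobs/" as shown in your log excerpt, and it only reports files without deleting them:

```python
from pathlib import Path


def find_empty_job_files(qmaster_spool):
    """Return sorted paths of zero-byte regular files below <qmaster_spool>/jobs.

    Hypothetical helper: qmaster_spool is the qmaster spool directory,
    e.g. the directory that contains the "messages" file.
    """
    jobs_dir = Path(qmaster_spool) / "jobs"
    if not jobs_dir.is_dir():
        return []
    return sorted(
        p for p in jobs_dir.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

Calling find_empty_job_files() with the path of your qmaster spool directory should return the same files the log complains about. What to do with them is a separate decision and depends on the spooling method.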

-- Reuti


> How can we clear any state files and get a fresh start? Thanks. In the 
> meantime I'll look more online for answers.
> 
> -M
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users


