Thanks, Reuti. The group has restored operations by recovering the previous spool directories from a snapshot.
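
For anyone who runs into the same messages: below is a minimal, read-only sketch for listing the zero-size job files that the qmaster log quoted below complains about. It assumes classic spooling (job files stored as plain files on disk) and the qmaster spool location "$SGE_ROOT/$SGE_CELL/spool/qmaster" taken from the log path. The function name find_zero_size_job_files and the fallback cell name "default" are my own assumptions, not part of Grid Engine. It only reports files and does not delete anything.

```python
import os
from pathlib import Path


def find_zero_size_job_files(spool_dir):
    """Return sorted paths (relative to spool_dir) of empty files under jobs/.

    Hypothetical helper: assumes classic spooling, where each job is a
    regular file somewhere below <spool_dir>/jobs.
    """
    jobs_dir = Path(spool_dir) / "jobs"
    empty = []
    for path in jobs_dir.rglob("*"):
        # Only regular files count; directories are skipped.
        if path.is_file() and path.stat().st_size == 0:
            empty.append(path.relative_to(spool_dir).as_posix())
    return sorted(empty)


if __name__ == "__main__":
    sge_root = os.environ.get("SGE_ROOT")
    # "default" as the cell name is an assumption; set SGE_CELL if yours differs.
    sge_cell = os.environ.get("SGE_CELL", "default")
    if sge_root:
        spool = Path(sge_root) / sge_cell / "spool" / "qmaster"
        for name in find_zero_size_job_files(spool):
            print(name)
```

Run it on the qmaster host with SGE_ROOT (and SGE_CELL, if needed) set. What to do with the listed files is a separate decision; in our case the snapshot restore made it unnecessary.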
-M

On Wed, Sep 3, 2014 at 6:52 AM, Reuti <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> On 02.09.2014 at 22:30, Michael Stauffer wrote:
>
> > I'm trying to help users of another cluster whose admin is on vacation -
> > a bit of Murphy's Law at work here, it seems.
> >
> > Their queue keeps failing, and after restarting qmaster it fails again
> > after about a minute. The suspicion is some bad job files, judging from
> > these log entries:
> >
> > => Also, the last few lines in the qmaster logfile
> > => "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
> > =>
> > => 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2729" has zero size
> > => 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2726" has zero size
> > => 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2727" has zero size
> > => 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2728" has zero size
> > => 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0003/2326" has zero size
> > => 09/02/2014 14:15:02| main|cbica-cluster|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
> > => 09/02/2014 14:15:02| main|cbica-cluster|E|error in init_packbuffer: wrong cull version
>
> The qmaster and commands are working, it's just the exechost which keep
> failing? You could stop the execd thereon, and remove the complete spool
> directory for the node. The starting execd will recreate the directory
> structure for the particular node.
>
> If it's the structure of the qmaster instead: do you use classic spooling
> then?
>
> -- Reuti
>
> > How can we clear any state files and get a fresh start? Thanks. In the
> > meantime I'll look more online for answers.
> >
> > -M
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users