Hi, I'm trying to help users of another cluster whose admin is on vacation - a bit of Murphy's Law at work here, it seems.
Their queue keeps failing, and after restarting qmaster it fails again after about a minute. The suspicion is some bad job files, judging from these log entries:

> Also, the last few lines in the qmaster logfile
> "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
>
> 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2729" has zero size
> 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2726" has zero size
> 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2727" has zero size
> 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0005/2728" has zero size
> 09/02/2014 14:15:02| main|cbica-cluster|C|job file "jobs/00/0003/2326" has zero size
> 09/02/2014 14:15:02| main|cbica-cluster|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
> 09/02/2014 14:15:02| main|cbica-cluster|E|error in init_packbuffer: wrong cull version

How can we clear any state files and get a fresh start?

Thanks. In the meantime I'll look more online for answers.

-M
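
P.S. In case it helps anyone reproduce this, here is a minimal Python sketch for listing the zero-size job files so they can be reviewed before anything is touched. It assumes classic (flat-file) spooling and that the "jobs/..." paths in the log are relative to $SGE_ROOT/$SGE_CELL/spool/qmaster; the function name and the "default" cell fallback are my own placeholders, not anything from Grid Engine itself. It only reads and prints, it does not delete or modify anything.

```python
import os
from pathlib import Path


def find_zero_size_job_files(jobs_dir):
    """Return a sorted list of zero-byte regular files under jobs_dir.

    Returns an empty list if jobs_dir does not exist.
    """
    root = Path(jobs_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumption: the spool layout matches the path quoted in the log,
    # i.e. $SGE_ROOT/$SGE_CELL/spool/qmaster/jobs. Adjust if yours differs.
    sge_root = os.environ.get("SGE_ROOT", "")
    sge_cell = os.environ.get("SGE_CELL", "default")  # placeholder fallback
    jobs_dir = Path(sge_root) / sge_cell / "spool" / "qmaster" / "jobs"

    for path in find_zero_size_job_files(jobs_dir):
        print(path)
```

Run with the same SGE_ROOT and SGE_CELL environment as the qmaster; the output should match the "has zero size" entries in the messages file if the assumptions above hold.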