Thanks Reuti. The group has restored operations, using a snapshot to
restore previous spools.

-M


On Wed, Sep 3, 2014 at 6:52 AM, Reuti <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> On 02.09.2014 at 22:30, Michael Stauffer wrote:
>
> > I'm trying to help users of another cluster whose admin is on vacation -
> > a bit of Murphy's Law at work here, it seems.
> >
> > Their queue keeps failing, and after restarting qmaster it fails again
> > after about a minute. The suspicion is some bad job files, judging from
> > these log entries:
> >
> > => Also, the last few lines in the qmaster logfile
> > => "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
> > =>
> > => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2729" has zero size
> > => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2726" has zero size
> > => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2727" has zero size
> > => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2728" has zero size
> > => 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0003/2326" has zero size
> > => 09/02/2014 14:15:02|  main|cbica-cluster|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
> > => 09/02/2014 14:15:02|  main|cbica-cluster|E|error in init_packbuffer: wrong cull version
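
For anyone hitting the same "has zero size" messages, here is a minimal
sketch that lists the empty job files before anything is moved or restored.
It assumes classic spooling and that the "jobs/..." paths in the log are
relative to the qmaster spool directory named above; the helper name
find_zero_size_files and the "default" fallback for SGE_CELL are only
illustrative, not part of Grid Engine. It reads only and deletes nothing.

```python
import os


def find_zero_size_files(jobs_dir):
    """Return the sorted paths of all zero-byte regular files under jobs_dir."""
    empty = []
    for root, _dirs, files in os.walk(jobs_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Assumption: SGE_ROOT is set in the environment; the cell name falls
    # back to "default" only for illustration.
    sge_root = os.environ.get("SGE_ROOT")
    sge_cell = os.environ.get("SGE_CELL", "default")
    if sge_root is None:
        raise SystemExit("SGE_ROOT is not set")
    jobs_dir = os.path.join(sge_root, sge_cell, "spool", "qmaster", "jobs")
    for path in find_zero_size_files(jobs_dir):
        print(path)
```

Running it with qmaster stopped gives a list to compare against a snapshot
or backup of the spool directory.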
>
> Are the qmaster and the commands working, and is it only the exechosts
> that keep failing? If so, you could stop the execd on the affected node and
> remove the complete spool directory for that node. When the execd starts
> again, it will recreate the directory structure for that node.
>
> If it's the spool structure of the qmaster instead: are you using classic
> spooling?
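
One way to answer the spooling question without the regular admin is to read
the cell's bootstrap file, which records a spooling_method entry (typically
classic or berkeleydb). The sketch below assumes the usual location
$SGE_ROOT/$SGE_CELL/common/bootstrap; the function name read_spooling_method
and the "default" cell fallback are made up for illustration.

```python
import os


def read_spooling_method(bootstrap_path):
    """Return the spooling_method value from a bootstrap file, or None."""
    with open(bootstrap_path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if parts[0] == "spooling_method" and len(parts) == 2:
                return parts[1].strip()
    return None


if __name__ == "__main__":
    # Assumption: SGE_ROOT is set; the cell name falls back to "default"
    # only for illustration.
    sge_root = os.environ.get("SGE_ROOT")
    sge_cell = os.environ.get("SGE_CELL", "default")
    if sge_root is None:
        raise SystemExit("SGE_ROOT is not set")
    path = os.path.join(sge_root, sge_cell, "common", "bootstrap")
    print(read_spooling_method(path))
```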
>
> -- Reuti
>
>
> > How can we clear any state files and get a fresh start? Thanks. In the
> > meantime I'll look more online for answers.
> >
> > -M
> > _______________________________________________
> > users mailing list
> > users@gridengine.org
> > https://gridengine.org/mailman/listinfo/users
>
>
