Hi,

I'm trying to help users of another cluster whose admin is on vacation - a
bit of Murphy's Law at work here, it seems.

Their qmaster keeps failing: after a restart it fails again within about
a minute. We suspect some bad (zero-size) job files, judging from these
log entries:

=> Also, the last few lines in the qmaster logfile
=> "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
=>
=> 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2729" has zero size
=> 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2726" has zero size
=> 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2727" has zero size
=> 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0005/2728" has zero size
=> 09/02/2014 14:15:02|  main|cbica-cluster|C|job file "jobs/00/0003/2326" has zero size
=> 09/02/2014 14:15:02|  main|cbica-cluster|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
=> 09/02/2014 14:15:02|  main|cbica-cluster|E|error in init_packbuffer: wrong cull version
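
To see how many job files are affected, a read-only check along these
lines might help. This is only a sketch: it assumes classic (flat-file)
spooling with the job files under the qmaster spool directory, and the
helper name find_zero_size_files and the /opt/sge fallback are made up
for illustration. It only lists files; it does not delete anything.

```python
import os


def find_zero_size_files(root):
    """Return sorted paths of regular files under root whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only consider real regular files, not symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Assumption: job files live under $SGE_ROOT/$SGE_CELL/spool/qmaster/jobs.
    sge_root = os.environ.get("SGE_ROOT", "/opt/sge")  # hypothetical fallback
    sge_cell = os.environ.get("SGE_CELL", "default")
    jobs_dir = os.path.join(sge_root, sge_cell, "spool", "qmaster", "jobs")
    for path in find_zero_size_files(jobs_dir):
        print(path)
```

If the directory does not exist (for example with a different spooling
method or spool location), the script simply prints nothing.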

How can we clear the bad state files and get a fresh start? In the
meantime I'll keep looking online for answers. Thanks.

-M
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
