Jake Carroll <[email protected]> writes:

> Hi.
>
> Interesting. 
>
> From /opt/gridengine/default/spool/compute-0-4/messages, we are seeing
> some unusual stuff (or, maybe it is entirely run of the mill?):

I'm not sure whether it's the same issue as
https://arc.liv.ac.uk/trac/SGE/ticket/1418, which I haven't tried to
debug.  Which operating system it is might be relevant.

> 01/16/2013 18:05:50|  main|compute-0-4|W|reaping job "1350379" ptf
> complains: Job does not exist
> 01/16/2013 18:07:56|  main|compute-0-4|E|removing unreferenced job
> 1350379.4111 without job report from ptf
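
To pick such entries out of a node's messages file, a minimal sketch in
Python might look like the following.  It assumes the five pipe-separated
fields seen in the lines above (timestamp, thread, host, level, text) and
that each entry sits on one line in the file itself (the wrapping above
looks like the mailer's doing).  The names parse_line and ptf_problems are
made up for illustration and aren't part of Grid Engine, and reading the
".4111" suffix as a task number is my assumption.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    timestamp: str
    thread: str
    host: str
    level: str   # e.g. "W" or "E", as in the quoted lines
    text: str


# Matches 'job "1350379"' as well as 'job 1350379.4111'.
# The part after the dot is assumed to be a task number.
JOB_RE = re.compile(r'job "?(\d+)(?:\.(\d+))?"?')


def parse_line(line: str) -> Optional[Entry]:
    """Split one messages line into its five fields, or return None."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return Entry(*(p.strip() for p in parts))


def ptf_problems(
    lines: Iterable[str],
) -> Iterator[Tuple[Entry, str, Optional[str]]]:
    """Yield (entry, job_id, suffix) for W/E entries that mention ptf."""
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.level not in ("W", "E"):
            continue
        if "ptf" not in entry.text:
            continue
        match = JOB_RE.search(entry.text)
        if match:
            yield entry, match.group(1), match.group(2)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python ptf_scan.py /path/to/messages
    with open(sys.argv[1], errors="replace") as handle:
        for entry, job_id, suffix in ptf_problems(handle):
            print(entry.timestamp, entry.level, job_id, suffix or "-", entry.text)
```

Run against the spool messages file on each affected node, this only lists
which jobs are involved and when; it doesn't say why ptf lost track of them.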

> At this point, we're scratching our heads and considering a reboot of the
> head node on Friday, as we really aren't understanding what is going wrong
> here.

I'd restart the execd on the node, if anything, and possibly the
qmaster.  I can't see how rebooting the head node would be useful.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
