On 14.10.2014 at 19:41, patrick wrote:

> It is on a shared file space with RW to root.
It's better to have the spool directory local on each node:

https://arc.liv.ac.uk/SGE/howto/nfsreduce.html

-- Reuti

> I get this message in the node's messages file after the node was rebooted:
>
> 10/14/2014 11:28:16| main|n72|E|removing unreferenced job 214320.247 without job report from ptf
> 10/14/2014 11:28:53| main|n72|W|reaping job "214320" ptf complains: Job does not exist
>
> On Tue, Oct 14, 2014 at 1:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>
> On 14.10.2014 at 19:13, patrick wrote:
>
> > Just in the qmaster messages. It will give an error such as:
> >
> > 10/14/2014 11:05:07| timer|hn1|W|failed to deliver job 214320.423 to queue "all.q@n72"
>
> And nothing on the node? A full or write-protected spooling directory? Is it by accident on a shared file space, or local on each machine?
>
> -- Reuti
>
> > On Tue, Oct 14, 2014 at 12:56 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> >
> > On 14.10.2014 at 18:30, patrick wrote:
> >
> > > No, it will stay in 't' status and not run. Sometimes after the reboot they will change from 't' to 'r', and sometimes they will stay in 't' until deleted and resubmitted.
> >
> > Aha, that's strange. Anything in the message files of the qmaster or the exechost referring to the <job_id>s in question?
> >
> > -- Reuti
> >
> > > Thanks!
> > >
> > > On Tue, Oct 14, 2014 at 12:07 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > >
> > > Hiho,
> > >
> > > On 14.10.2014 at 17:10, patrick wrote:
> > >
> > > > Over the past couple of months we have run into issues with jobs on random nodes staying in a 't' status. The only way to resolve it is to restart the node, which frustrates users who run array and MPI jobs. I am not seeing anything in the logs that indicates an issue. It is using the Berkeley database, and I was wondering if that could be causing the issue? As in, does some maintenance need to be done on it to keep it running smoothly?
> > > But the jobs ran fine essentially - so it's more a cosmetic issue?
> > >
> > > -- Reuti

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
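[Editor's note] For readers following up on the local-spool suggestion above: in Grid Engine the execution daemon's spool location is controlled by the `execd_spool_dir` parameter of the cluster configuration, which can be overridden per host with `qconf -mconf <hostname>`. A minimal sketch of such a host-local override; the node name `n72` comes from the thread, while the path `/var/spool/sge` is an assumption (any node-local filesystem works):

```
# Hypothetical host-local configuration for node n72, edited via:
#   qconf -mconf n72
# Pointing execd_spool_dir at a node-local disk keeps execd spooling
# off the shared NFS file space discussed above.
execd_spool_dir    /var/spool/sge
```

The directory must exist on the node and be writable by the SGE admin user before sge_execd is restarted there.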
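[Editor's note] As a quick way to spot the symptom discussed in this thread, jobs stuck in the 't' (transferring) state can be listed by filtering `qstat` output on the state column. A minimal sketch, assuming the classic `qstat` column layout in which the state is the fifth field; `SAMPLE` is hypothetical stand-in data for real `qstat -u '*'` output:

```shell
# List the IDs of jobs reported in 't' (transferring) state.
# SAMPLE is stand-in data; in real use you would pipe in `qstat -u '*'`.
SAMPLE='job-ID  prior    name   user     state submit/start at     queue      slots
 214320 0.55500 myjob   patrick t     10/14/2014 11:05:07 all.q@n72  1
 214321 0.55500 myjob   patrick r     10/14/2014 11:05:07 all.q@n73  1'

# Skip the header line and print job IDs whose state field is exactly "t".
echo "$SAMPLE" | awk 'NR > 1 && $5 == "t" {print $1}'
```

With the sample data above this prints `214320`; such jobs could then be force-deleted and resubmitted (as described in the thread) instead of rebooting the node.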