On 14.10.2014 at 19:13, patrick wrote:

> Just in the qmaster messages. It will give an error such as:
> 
> 10/14/2014 11:05:07| timer|hn1|W|failed to deliver job 214320.423 to queue 
> "all.q@n72"
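
To see whether these warnings cluster on particular nodes, the qmaster messages file can be scanned for them. The following is only a sketch: the pattern is modelled on the single line quoted above, and the function name is made up for illustration. Other qmaster versions may word or format the message differently.

```python
import re
from collections import Counter

# Pattern modelled on the one warning quoted above (assumption: all
# delivery failures are logged in this form, as <job>.<task> plus a
# quoted queue instance).
DELIVERY_FAILURE = re.compile(
    r'failed to deliver job (?P<job>\d+)\.(?P<task>\d+) to queue\s+"(?P<queue>[^"]+)"'
)


def failures_per_host(log_text):
    """Count 'failed to deliver job' warnings per execution host."""
    counts = Counter()
    for match in DELIVERY_FAILURE.finditer(log_text):
        queue_instance = match.group("queue")
        # "all.q@n72" -> "n72"; keep the whole name if there is no "@".
        host = queue_instance.split("@", 1)[-1]
        counts[host] += 1
    return counts
```

Read the contents of the qmaster messages file into a string and pass it to `failures_per_host`; hosts with unusually high counts are the first ones to look at.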

And nothing on the node? Is the spooling directory full or write-protected? Is it 
by accident on a shared file space, or local on each machine?
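
The first two points can be checked quickly on an affected node. This is a minimal sketch: the function name and the 100 MiB threshold are arbitrary choices for illustration, and the path of the execd spool directory has to be filled in for the local installation. Run it as the user the execd runs as, since root bypasses the permission check.

```python
import os
import shutil


def check_spool_dir(path, min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found with a spool directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]

    problems = []
    # Creating files in a directory needs write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")

    # Free space on the file system that holds the directory.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"{path} has only {free} bytes free")
    return problems
```

It does not tell whether the directory is on shared or local storage; that has to be checked against the mount table on the node.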

-- Reuti


> On Tue, Oct 14, 2014 at 12:56 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 14.10.2014 at 18:30, patrick wrote:
> 
> > No, it will stay in 't' status and not run. Sometimes after the reboot they 
> > will change from 't' to 'r' and sometimes they will stay in 't' until 
> > deleted and resubmitted.
> 
> Aha, that's strange. Anything in the message files of the qmaster or the 
> exechost referring to the <job_id>s in question?
> 
> -- Reuti
> 
> 
> > Thanks!
> >
> > On Tue, Oct 14, 2014 at 12:07 PM, Reuti <re...@staff.uni-marburg.de> wrote:
> > Hiho,
> >
> > On 14.10.2014 at 17:10, patrick wrote:
> >
> > > Over the past couple of months we have run into issues with jobs on random 
> > > nodes staying in 't' status. The only way to resolve it is to restart 
> > > the node, which frustrates users who run array and MPI jobs. I am 
> > > not seeing anything in the logs to indicate an issue. It is using the 
> > > Berkeley database, and I was wondering if that could be causing the issue, 
> > > i.e. whether some maintenance needs to be done on it to keep it running smoothly?
> >
> > But the jobs ran fine essentially - so it's more a cosmetic issue?
> >
> > -- Reuti
> >
> 
> 


