On Fri, Jun 15, 2012 at 11:11 AM, Rayson Ho <[email protected]> wrote:
> Can you set "execd_params" to KEEP_ACTIVE for this host?? (See the
> manpage at this URL:
> http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_conf.html )
>
> Request the job to run in this queue/host again, and see why the
> shepherd can't open the job_pid.
>
> (And remember to unset the execd_params or else you will fill up your
> local spool dir eventually with job information.)
>

I can't do this on my production grid, and I don't currently know how to
replicate the problem. I will set things up on a test setup and try to
reproduce the issue with KEEP_ACTIVE turned on.

Is it possible to set KEEP_ACTIVE per host? I only see it in the
output of qconf -sconf.

> Rayson
>
>
> On Fri, Jun 15, 2012 at 12:58 PM, Michael Coffman
> <[email protected]> wrote:
> > On Fri, Jun 15, 2012 at 10:11 AM, Rayson Ho <[email protected]> wrote:
> >>
> >> On Fri, Jun 15, 2012 at 12:01 PM, Michael Coffman
> >> <[email protected]> wrote:
> >> > From the qmaster messages file:
> >> > 06/14/2012 21:29:39|worker|gemaster|W|job 3885.1 failed on host
> >> > cs428.ftc.avagotech.net general before job because: 06/14/2012 21:29:37
> >> > [20339:8436]: can't open file job_pid: Permission denied
> >> >
> >> > I checked a job_pid file on a currently running job on the system that
> >> > had the above errors, permission down the entire tree seems fine and
> >> > here is the job_id file:
> >> >
> >> > -rw-r--r-- 1 grid grid 6 Jun 14 17:40 job_pid
> >>
> >> Is your execd spool dir on NFS or local??
> >>
> >
> > Local.
> >
> >>
> >> Also, does it happen to all nodes or just a node or queue?
> >>
> >
> > Happened on 2 different nodes. Not all jobs caused this.
> >
> >>
> >> Rayson
> >>
> >> >
> >> > Any clues? Is the path perhaps hard coded into sge_shepherd for this
> >> > file?
> >> >
> >> > Thanks.
> >> > --
> >> > -MichaelC
> >> >
> >> > _______________________________________________
> >> > users mailing list
> >> > [email protected]
> >> > https://gridengine.org/mailman/listinfo/users
> >> >
> >
> > --
> > -MichaelC
>

--
-MichaelC
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
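
For the "permission down the entire tree" check mentioned in the thread, here is a minimal sketch of one way to list the mode, owner, and group of every path component leading to a file such as job_pid. It is only an illustration, not part of Grid Engine: the function name `describe_path_chain` and its helpers are hypothetical, and it only reports what is on disk. It does not test access as the user the shepherd actually runs under.

```python
import grp
import os
import pwd
import stat


def _user_name(uid):
    """Resolve a uid to a name, falling back to the number if unknown."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def _group_name(gid):
    """Resolve a gid to a name, falling back to the number if unknown."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def describe_path_chain(path):
    """Return (path, mode, owner, group) for each ancestor of path, root first.

    Hypothetical helper: one row per directory from the filesystem root
    down to the file itself, so a missing search (x) bit on any parent
    directory is easy to spot.
    """
    current = os.path.abspath(path)
    chain = []
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent

    rows = []
    for component in reversed(chain):
        st = os.lstat(component)  # lstat: report symlinks themselves
        rows.append(
            (
                component,
                stat.filemode(st.st_mode),
                _user_name(st.st_uid),
                _group_name(st.st_gid),
            )
        )
    return rows
```

To use it, call `describe_path_chain()` with the full path of a job_pid file under the execd spool directory of the affected host and print each returned row. The spool location is site-specific and is not given in this thread.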
