Re: [gridengine users] sge_execd dies

2018-11-09 Thread Joseph Farran
Yeap - that's exactly what it was. Not a single sge_execd crash since the change. Thank you! I owe you a box of beer! Joseph On 11/8/2018 9:17 PM, Daniel Povey wrote: OK, well there's your problem.  You need to i

Re: [gridengine users] sge_execd dies

2018-11-08 Thread Joseph Farran
Oh for goodness sake. Are you saying that the gid_range in sge is a range of gids I DO NOT use on the cluster? Holy mackerel - that is very misleading. Thank you! Giving it a try. Best, Joseph On 11/8/2018 9:17 PM, Daniel Povey

Re: [gridengine users] sge_execd dies

2018-11-08 Thread Daniel Povey
OK, well there's your problem. You need to increase the start of gid_range to a value larger than your largest possible 'real' userid: for instance, 1. The name is a little confusing. It needs to be a range that's disjoint from the range of possible userids. On Fri, Nov 9, 2018 at 12:12 AM

Re: [gridengine users] sge_execd dies

2018-11-08 Thread Joseph Farran
Hi Dan. Thank you for the suggestion.   Here is what I have: # qconf -sconf | grep gid_range gid_range    200-70 The highest gid is 3135. Best, Joseph On 11/8/2018 8:58 PM, Daniel Povey wrote:

Re: [gridengine users] sge_execd dies

2018-11-08 Thread Daniel Povey
Do qconf -sconf | grep gid_range and check whether any of your users have group IDs in that range. That can lead to things being killed. Dan On Thu, Nov 8, 2018 at 10:33 PM Joseph Farran wrote: > Greetings. > > I am running SGE 8.1.9 on a cluster with some 10k cores, CentOS 6.9. > > I am seei
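
The check Dan describes can be sketched in a few lines of Python. This is only an illustration, not part of SGE: the function names are made up here, the parser assumes gid_range is one or more comma-separated "low-high" pairs, and local_groups() only sees groups that the local name service enumerates (some LDAP/NIS setups do not list them all).

```python
import grp


def parse_gid_range(text):
    """Parse a value such as '20000-20100' into a list of (low, high) tuples.

    Assumption: the value is a comma-separated list of 'low-high' pairs.
    """
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        low_text, sep, high_text = part.partition("-")
        if not sep:
            raise ValueError(f"expected 'low-high', got {part!r}")
        low, high = int(low_text), int(high_text)
        if low > high:
            raise ValueError(f"range start {low} is above range end {high}")
        ranges.append((low, high))
    return ranges


def groups_in_ranges(groups, ranges):
    """Return the (name, gid) pairs whose gid lies inside any of the ranges."""
    return [
        (name, gid)
        for name, gid in groups
        if any(low <= gid <= high for low, high in ranges)
    ]


def local_groups():
    """Collect (name, gid) for every group the local name service lists."""
    return [(g.gr_name, g.gr_gid) for g in grp.getgrall()]
```

To use it, paste the value printed by qconf -sconf | grep gid_range into parse_gid_range() and pass the result to groups_in_ranges(local_groups(), ...). An empty list means no listed group collides with the range; any entry returned is a real group whose ID falls inside it, which is the condition Dan warns about.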

[gridengine users] sge_execd dies

2018-11-08 Thread Joseph Farran
Greetings. I am running SGE 8.1.9 on a cluster with some 10k cores, CentOS 6.9. I am seeing job failures on nodes where the node's sge_execd unexpectedly dies. I ran strace on the node's sge_execd and it's not of much help. It always en

Re: [gridengine users] sge_execd dies silently with 0 exit status

2013-09-16 Thread Dave Love
Reuti writes: > Please have a look at your /tmp. The starting execd will write the cause of > not being able to start in a file therein. For what it's worth, that depends on the version. sge-8.0.0e+ writes to syslog, as you'd expect a daemon to. (The previous behaviour was also insecure.) De

Re: [gridengine users] sge_execd dies silently with 0 exit status

2013-09-12 Thread Edward Ned Harvey
On Sep 12, 2013, at 12:12 PM, Reuti wrote: > Hi, > > Please have a look at your /tmp. The starting execd will write the cause of > not being able to start in a file therein. > Nailed it. Thank you. can't create directory "/var/spool/sge" Pretty self-explanatory now.
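
A quick way to test for this kind of failure before starting the daemon is to check whether the spool directory exists or could be created. The sketch below is a rough illustration with made-up helper names, not anything shipped with SGE. It must be run as the same user that execd runs as, and it only looks at ordinary permission bits, so read-only mounts or SELinux denials may still slip past it.

```python
import os


def first_existing_ancestor(path):
    """Walk up from path until a component that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create(path):
    """Return True if path is (or could become) a directory we can write to.

    The nearest existing ancestor must be a directory that the current
    user may both write to and traverse.
    """
    ancestor = first_existing_ancestor(path)
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)
```

For the message in this thread, can_create("/var/spool/sge") returning False would point at the same problem the file in /tmp reported.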

Re: [gridengine users] sge_execd dies silently with 0 exit status

2013-09-12 Thread Reuti
Hi, Am 12.09.2013 um 17:50 schrieb Edward Ned Harvey: > I'm having a heck of a time figuring out why. > > On rhel6, /etc/init.d/sgeexecd.myclustername script is run at startup, or via > sudo after startup. > sudo /etc/init.d/sgeexecd.myclustername start > > It just says "OK" and no other outpu

[gridengine users] sge_execd dies silently with 0 exit status

2013-09-12 Thread Edward Ned Harvey
I'm having a heck of a time figuring out why. On rhel6, /etc/init.d/sgeexecd.myclustername script is run at startup, or via sudo after startup. sudo /etc/init.d/sgeexecd.myclustername start It just says "OK" and no other output, yet the daemon isn't running. I added the "-x" option to '#!/bin/s