Yep - that's exactly
what it was. Not a single sge_execd crash since the change.
Thank you! I owe you
a box of beer!
Joseph
On 11/8/2018 9:17 PM, Daniel Povey
wrote:
OK, well there's your problem. You need to
i
Oh, for goodness' sake.
Are you saying that the gid_range in SGE is a range of GIDs I
do NOT use on the cluster?
Holy mackerel - that is very misleading.
Thank you! Giving it a try.
Best,
Joseph
On 11/8/2018 9:17 PM, Daniel Povey wrote:
OK, well there's your problem. You need to increase the start of gid_range
to a value larger than your largest possible 'real' userid: for instance,
1.
The name is a little confusing. It needs to be a range that's disjoint
from the range of possible userids.
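
The disjointness requirement described above can be sketched as a small check. This is only an illustration: is_disjoint is a hypothetical helper, the candidate ranges are made-up values, and the only number taken from the thread is the highest real gid (3135). The check is applied to group IDs, following the earlier advice to compare gid_range against users' group IDs.

```python
def is_disjoint(start, end, real_gids):
    """Return True when no real GID lies in the inclusive interval [start, end]."""
    return not any(start <= gid <= end for gid in real_gids)


# Highest real gid reported in the thread.
highest_real_gid = 3135

# Hypothetical candidate ranges, for illustration only.
print(is_disjoint(200, 4000, [highest_real_gid]))     # range covers 3135
print(is_disjoint(20000, 20100, [highest_real_gid]))  # range starts above 3135
```

The first call prints False and the second prints True. A range that starts above the highest real gid passes the check, which is the change suggested in the message.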
On Fri, Nov 9, 2018 at 12:12 AM
Hi Dan.
Thank you for the suggestion. Here is what I have:
# qconf -sconf | grep gid_range
gid_range 200-70
The highest gid is 3135.
Best,
Joseph
On 11/8/2018 8:58 PM, Daniel Povey
wrote:
Do
qconf -sconf | grep gid_range
and check whether any of your users have group IDs in that range. That
can lead to things being killed.
Dan
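
The check suggested above could be scripted roughly as follows. This is a sketch under assumptions, not a tested admin tool: extract_gid_range, groups_in_range and main are hypothetical names, the gid_range value is assumed to be one or more comma-separated "start-end" pairs, and grp.getgrall() may not list every group when groups come from NIS or LDAP.

```python
import grp
import subprocess


def extract_gid_range(sconf_output):
    """Return the value on the gid_range line of `qconf -sconf` output, or None."""
    for line in sconf_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "gid_range":
            return fields[1]
    return None


def groups_in_range(value, groups):
    """Return the (name, gid) pairs from `groups` whose gid lies inside `value`.

    `value` is assumed to look like "start-end" or "start-end,start-end".
    """
    bounds = [tuple(int(n) for n in part.split("-")) for part in value.split(",")]
    return [(name, gid) for name, gid in groups
            if any(low <= gid <= high for low, high in bounds)]


def main():
    """Compare the configured gid_range with the groups visible on this host."""
    output = subprocess.run(["qconf", "-sconf"], capture_output=True,
                            text=True, check=True).stdout
    value = extract_gid_range(output)
    if value is None:
        print("no gid_range line found")
        return
    local_groups = [(g.gr_name, g.gr_gid) for g in grp.getgrall()]
    for name, gid in groups_in_range(value, local_groups):
        print(f"group {name} (gid {gid}) lies inside gid_range {value}")
```

Calling main() on a host where qconf is available prints one line per conflicting group and prints nothing when the range is clear of the groups that grp.getgrall() can see.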
On Thu, Nov 8, 2018 at 10:33 PM Joseph Farran wrote:
> Greetings.
>
> I am running SGE 8.1.9 on a cluster with some 10k cores, CentOS 6.9.
>
> I am seei
Greetings.
I am running SGE 8.1.9 on a cluster with some 10k
cores, CentOS 6.9.
I am seeing job failures on
nodes where the node's sge_execd
unexpectedly dies.
I ran strace on the node's sge_execd and it's not of much help.
It always en
Reuti writes:
> Please have a look at your /tmp. The starting execd will write the cause of
> not being able to start in a file therein.
For what it's worth, that depends on the version. sge-8.0.0e+ writes to
syslog, as you'd expect a daemon to. (The previous behaviour was also
insecure.) De
On Sep 12, 2013, at 12:12 PM, Reuti wrote:
> Hi,
>
> Please have a look at your /tmp. The starting execd will write the cause of
> not being able to start in a file therein.
>
Nailed it. Thank you.
can't create directory "/var/spool/sge"
Pretty self-explanatory now.
Hi,
Am 12.09.2013 um 17:50 schrieb Edward Ned Harvey:
> I'm having a heck of a time figuring out why.
>
> On rhel6, /etc/init.d/sgeexecd.myclustername script is run at startup, or via
> sudo after startup.
> sudo /etc/init.d/sgeexecd.myclustername start
>
> It just says "OK" and no other outpu
I'm having a heck of a time figuring out why.
On rhel6, /etc/init.d/sgeexecd.myclustername script is run at startup, or via
sudo after startup.
sudo /etc/init.d/sgeexecd.myclustername start
It just says "OK" and no other output, yet the daemon isn't running.
I added the "-x" option to '#!/bin/s