Hello,
>>>> The real and effective user is not root, and never was. Never
caused us any problems. The NFS share is exported with root_squash.
If that's the spool area, it means you get world-writable files in it.
As in $SGE_ROOT/$SGE_CELL/spool?
No you don't.
[kdf51254@ws112 spool]$ ll
total 20
drwxr-xr-x. 5 sgeadmin sgeadmin 4096 May 17 15:35 cs04r-sc-com99-01
drwxr-xr-x. 5 sgeadmin sgeadmin 4096 May 17 15:35 cs04r-sc-com99-02
drwxr-xr-x. 5 sgeadmin sgeadmin 4096 May 17 15:35 cs04r-sc-com99-03
drwxr-xr-x. 5 sgeadmin sgeadmin 4096 May 17 15:35 cs04r-sc-com99-04
drwxr-xr-x. 20 sgeadmin sgeadmin 4096 May 17 16:22 qmaster
(not in the sub directories either).
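For anyone who wants to run the same check on their own spool area without eyeballing `ll` output, a recursive scan along these lines should do. It is only a sketch: the function name `find_world_writable` is made up for illustration, and it checks the entries below the given directory, not the directory itself.

```python
import os
import stat


def find_world_writable(root):
    """Return paths below root whose mode has the 'other write' bit set."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            # Symlinks report rwxrwxrwx regardless of the target, so skip them.
            if stat.S_ISLNK(mode):
                continue
            if mode & stat.S_IWOTH:
                hits.append(path)
    return sorted(hits)
```

Called as `find_world_writable(os.path.join(os.environ["SGE_ROOT"], os.environ["SGE_CELL"], "spool"))`, an empty list would correspond to the "No you don't" above.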
This is quite interesting. Are all jobs running under the account of the user
who submitted them, or do you use one common user account for all jobs?
Jobs are running as the user that submitted them, yes. No common
account. Been set up like that since we installed it.
Execd must be able to setuid for that to work, so I assume it is started
as root unless there's some fancy way of using capabilities.
Execd is started by root (init script is called as root / with root
rights); execd runs as sgeadmin. Well, in our case anyway, as sgeadmin
is the SGE admin_user. It's not as if SGE execd is the only process on a
system started as something non-root by a root-run init script (I mean,
no-one runs apache as root, right? :) )
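To see which IDs a running daemon actually holds, the `Uid:` line in `/proc/<pid>/status` on Linux lists the real, effective, saved and filesystem UIDs in that order. The sketch below just reads that line; the helper names `parse_uid_line` and `read_uids` are made up for illustration, and it makes no claim about what execd will show on any given installation.

```python
def parse_uid_line(status_text):
    """Extract (real, effective, saved, filesystem) UIDs from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            fields = line.split()[1:]
            if len(fields) != 4:
                raise ValueError("unexpected Uid line: %r" % line)
            return tuple(int(f) for f in fields)
    raise ValueError("no Uid: line found")


def read_uids(pid):
    """Read the four UIDs of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as fh:
        return parse_uid_line(fh.read())
```

Comparing the effective and saved UIDs of the execd process is one way to tell whether it could still switch back to root when it needs to setuid for a job.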
That's not something that I've set up, really, that's just how it ends up
when SGE during installation finds a non-root admin user. The init
scripts look up the admin_user during startup, and start the process as
whatever is in the bootstrap as admin_user.
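As a rough illustration of that lookup (not the init script's actual code), the bootstrap file is a plain list of whitespace-separated key/value lines with `#` comments, so the admin_user can be pulled out like this. The function names are made up for the example, and the location `$SGE_ROOT/$SGE_CELL/common/bootstrap` is where the file sits on a default install; adjust if yours differs.

```python
def parse_bootstrap(text):
    """Parse bootstrap-style 'key value' lines into a dict, skipping comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def admin_user(text):
    """Return the admin_user entry, or None if the file does not set one."""
    return parse_bootstrap(text).get("admin_user")
```

Feeding it the contents of the bootstrap file should return the account the daemons get started as, `sgeadmin` in our case.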
Haven't had time to progress with this setup much; is there any
documentation on how the 'builtin' qrsh etc. work?
Not past the source, I'm afraid.
At the moment, my
test installation works, and I can submit jobs (and they run), but
interactive sessions don't work - I get a commlib error:
[kdf51254@ws112 ~]$ qrsh
error: commlib error: got read error (closing
"cs04r-sc-com99-04.diamond.ac.uk/shepherd_ijs/2")
Didn't have that problem on my old 6.2 installation :)
Check the logs (messages files) for any clues. I don't think it has
those symptoms, but there seems to be a race in the threading of the
builtin startup that shows up on recent Ubuntu, for instance, but not
on RHEL5 or 6 in our experience. You can still use ssh per
remote_startup(5).
--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users