Tina Friedrich <tina.friedr...@diamond.ac.uk> writes:

> Hello list,
>
> have finally decided to look into upgrading our SGE6.2 installation -
> mainly to see if it helps with my job scheduling problem.

From which version?  It should be essentially trivial if you're upgrading
from 6.2u5 (after doing an appropriate backup, of course): softstop the
execds, stop qmaster, update the distribution, restart the daemons.  (I'm
unclear about upgrades from earlier versions and can't remember what's in
the released READMEs, which say something like that now.)

> Our setup is SGE_ROOT on shared NFS file system, SGE running as a
> non-root user.

[I assume that doesn't mean started as non-root, in which case you could
only run jobs as the admin user.]

> I'd quite like to keep it that way (it worked well for
> us). Managed to build & install, got the qmaster running, managed to
> start execds. However, at least inst_sge.sh -upd-execd simply refuses
> to work if you're not root, if I remember correctly (not helping!).

Yes, but that's orthogonal to how the daemons run, and I don't think it's
changed at all recently.  They run as sgeadmin here.  In a shared
installation you shouldn't need to do more than maybe update the init
script on execution hosts.

> Script(s) sometimes say 'You are not installing as user >root< - Can't
> set the file owner/group and permissions'. It would help if they'd
> tell me (without digging through them) what files they're trying to
> chown/chmod and what they're trying to chown/chmod it to - so I can
> fix that, if there is a problem. Goes for a lot of these sort of
> errors (to do with running as non-root) - if it fails to do something,
> it would really help to know what it failed to do.

Well, it won't start the daemons as I assume you want, for a start.
Otherwise, contributions to make messages more helpful are gratefully
received.  You'd have to remove the check to see what failed.

> The other thing is that I keep having to run it with -nobincheck, as
> far as I can tell simply because I didn't build qmon. Annoying -
> should it not just check for actually required binaries?

I've an idea I changed that, but can't remember.  Patch welcome if not.
Why not just rpmbuild it on RHEL, though, and throw away qmon if you don't
want it?

> Importing my old installation / upgrading from my old installation
> didn't quite work. Mostly did, it seems, which is something. No error
> that I'd seen during the import/upgrade, but none of my queues are
> there. Host groups are; exec hosts are; complexes look okay; global
> config looks right. PEs aren't there; trying to create the PEs from
> the config files I originally created them from I get 'error: required
> attribute "qsort_args" is missing'. Assume that's the root problem
> (i.e. did not manage to import PEs, thus can't import queues). Anyone
> else had issues with that? Should the save_config script have caught
> that?

What exactly did you do to import them?  As far as I remember it should
default that to "none" if it's missing, and if the upgrade script doesn't
fix it, it's a bug.

> And now for the important question :). My execds currently are a mix
> of RHEL5 and RHEL6; SoGE got compiled on RHEL6, doesn't work on RHEL5
> execds. Also, all nodes and the master/shadow hosts get software
> upgrades quite regularly - I would like to avoid having to recompile
> SoGE whenever I run yum update (the old installation is nicely
> agnostic to all of this, it Just Works(TM) - well, at least it worked
> with RHEL5 and RHEL6.) Plus I've installed hwloc in a non-standard
> location (and currently have to tell the execd process where it
> is). Is there an option for aimk to build statically linked binaries?
> (I'm sort of guessing that that's what the difference is here.).

You shouldn't need to do that.  I have a completely shared (stateless)
installation with RH-ish 5 and 6 nodes using RH5 RPMs; the node image gets
sporadic updates, and I don't have a problem.

Why do you put hwloc in a non-standard place?  Anyhow, just use ldconfig
or LD_LIBRARY_PATH to find it.  The execd init script checks that shepherd
will load (e.g. it can find hwloc).

I hope you have CSP security on.
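
On the qsort_args error above: if the saved PE files really do lack that
attribute, one workaround until the import is sorted out might be to patch
the files before feeding them back in.  A minimal sketch in Python, assuming
the usual one-attribute-per-line "name value" layout of a PE file; the
function name ensure_qsort_args is made up for illustration, and writing
the default as NONE (the way other unset PE attributes are usually shown)
is my assumption:

```python
def ensure_qsort_args(pe_text: str, default: str = "NONE") -> str:
    """Return PE config text with a qsort_args line appended if absent.

    Assumes one "attribute value" pair per line; the NONE default is an
    assumption based on how unset PE attributes are normally written.
    """
    lines = pe_text.splitlines()
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "qsort_args":
            return pe_text  # attribute already present, leave text untouched
    lines.append(f"qsort_args         {default}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical saved PE definition that predates qsort_args.
    old_pe = "pe_name            mpi\nslots              64\n"
    print(ensure_qsort_args(old_pe), end="")
```

The patched text could then be written to a file and added with
qconf -Ap <file>.  That only papers over the problem, though; I'd still
like to know how the import was done.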
-- 
Community Grid Engine: http://arc.liv.ac.uk/SGE/