Normally you would leave the old install in place while the old jobs finish.

Since you used a different communications port and a different path, the new daemons can run alongside the old ones without either set knowing about the other.

Direct users to submit jobs to the new environment (usually you just set the SGE_ env vars for them anyway, right?) and then you can turn off the old daemons once all the jobs go through.
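
As a rough illustration of pointing users at the new environment, here is a minimal Python sketch that builds the environment for one install. SGE_ROOT, SGE_CELL, SGE_QMASTER_PORT and SGE_EXECD_PORT are the standard Grid Engine variables; the function name, the example path, the port numbers and the architecture directory are assumptions made up for this example, not values from this thread.

```python
import os


def grid_engine_env(sge_root, cell, qmaster_port, execd_port, arch, base=None):
    """Return a copy of the environment that points at one Grid Engine install.

    All argument values are supplied by the caller; nothing here is probed
    from the system.
    """
    env = dict(os.environ if base is None else base)
    env["SGE_ROOT"] = sge_root
    env["SGE_CELL"] = cell
    env["SGE_QMASTER_PORT"] = str(qmaster_port)
    env["SGE_EXECD_PORT"] = str(execd_port)
    # Put this install's client binaries (qsub, qstat, ...) first on PATH.
    bin_dir = os.path.join(sge_root, "bin", arch)
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return env


# Hypothetical values for the new install (assumptions, adjust to your site):
# new_env = grid_engine_env("/opt/oge", "default", 6446, 6447, "lx-amd64")
```

The returned dictionary can be passed as `env=` to `subprocess.run`, or its values exported from a login script so that users pick up the new install without changing their own job scripts.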


On 9/11/13 3:15 AM, Tina Friedrich wrote:
Hm. I don't know; I don't think I tried that. I just let the old
installation drain. We don't really have many very long running jobs, nor
long queues, so there weren't any pending jobs to speak of.

Maybe someone else can comment?

Tina

On 10/09/13 18:00, Txema Heredia Genestar wrote:
Thanks Tina,

Is there any way to have the new installation recognize the running and
pending jobs that were scheduled before updating? I tried copying the
whole spool directory with no luck.

I have tried installing Open Grid Scheduler using the 2011.11 binaries
over the current SGE installation, but as it did not detect the current
jobs I reverted to SGE.

Txema

On 10/09/13 11:40, Tina Friedrich wrote:
Hi Txema,

I recently upgraded our Grid Engine from SGE6.2u4 (I think it was) to
OGE8.1.3. No rocks though, so I don't know any details on that.

What I basically did was:

1) compile and install OGE into a different path
2) configure OGE to use different communication ports than our old
installation
3) dump and import our configuration (using the provided scripts)

I then simply ran both SGE6.2 and OGE8.1.3 initially (so it would be
easy to fall back), and disabled all queues on the old installation.
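
One way to tell when the old installation has drained is to count the jobs it still reports. The sketch below is only an illustration: it assumes `qstat -u '*'` is run with the old install's environment and that job lines start with a numeric job ID, and the function names are invented for this example.

```python
import subprocess


def count_jobs(qstat_output):
    """Count job lines in plain qstat output.

    A line is treated as a job when its first field is a numeric job ID,
    which skips the column header and the dashed separator line.
    """
    count = 0
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count


def install_is_drained(env):
    """Return True when qstat reports no jobs for any user.

    `env` must be an environment dict for the OLD install (SGE_ROOT,
    ports, PATH), so that the old qstat talks to the old qmaster.
    """
    result = subprocess.run(
        ["qstat", "-u", "*"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return count_jobs(result.stdout) == 0
```

Once `install_is_drained` returns True for the old environment, the old daemons can be shut down without losing work.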

SGE6.2 and OGE8.1.3 execds happily coexist on the same nodes;
however, I found that the qmaster processes don't, unfortunately.

But yes, there are no real problems having both in the system / active
- as long as their communication ports are different.
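
A quick sanity check for that condition could look like the following sketch. The dictionary layout and function name are assumptions for illustration; 6444 and 6445 are the ports registered for sge_qmaster and sge_execd, while 6446 and 6447 are arbitrary example values for a second install.

```python
def port_conflicts(old_ports, new_ports):
    """Return the set of TCP ports that both installations would use."""
    return set(old_ports.values()) & set(new_ports.values())


# Example values only; substitute the ports from each install's settings.
old_install = {"qmaster": 6444, "execd": 6445}
new_install = {"qmaster": 6446, "execd": 6447}

clashes = port_conflicts(old_install, new_install)
if clashes:
    print("Ports used by both installs:", sorted(clashes))
else:
    print("No port overlap between the two installs.")
```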

I can't remember if I ever tried having SGE6.2 execds talk to OGE8.1.3
- our installation is on an NFS share (i.e. not local to the nodes),
so I do not actually need to install anything on the nodes to start
the execd; and as I said, I simply started both execds, at least for
an initial period.

Tina

On 10/09/13 09:58, Txema Heredia Genestar wrote:
Hi all,

I have a cluster in production running rocks-cluster 6.0 with SGE6.2u5.
SGE6.2u5 has a bug that kills the qmaster when a number of jobs using
both -pe and -hold_jid are submitted. OGE (theoretically) has this bug
fixed.

What is the safest/cleanest way to upgrade from SGE to OGE? Should I
install the rocks-6.1 OGE roll? Will this keep/respect the current SGE
configuration or will it wipe it clean? Is the OGE daemon compatible
with SGE clients or should I update the system as a whole? Can I have
both SGE and OGE at once in the system? Should I compile OGE from
source and override all SGE in the cluster?

Thanks in advance,

Txema
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users