Seems to me that I'm about to issue something similar to:
update cloud.vm_instance set ha = 0 where ha = 1...

Now seriously, I'm wondering: per the manual, if you define an HA host tag at
the global config level and then have NO hosts with that tag, MGMT will not
be able to start VMs on other hosts, since there are no hosts that are
dedicated as HA destinations?

Does this make sense? I guess the VMs will just be marked as Stopped in
the GUI/database, but MGMT will be unable to start them...
Stupid proposal, but... ?

On 16 February 2015 at 16:22, Logan Barfield <lbarfi...@tqhosting.com>
wrote:

> Some sort of fencing independent of the management server is
> definitely needed.  HA in general (particularly on KVM) is all kinds
> of unpredictable/buggy right now.
>
> I like the idea of having a switch that an admin can flip to stop HA.
> In fact I think a better job control system in general (e.g., being
> able to stop/restart/manually start tasks) would be awesome, if it's
> feasible.
>
> Thank You,
>
> Logan Barfield
> Tranquil Hosting
>
>
> On Mon, Feb 16, 2015 at 10:05 AM, Wido den Hollander <w...@widodh.nl>
> wrote:
> >
> >
> > On 16-02-15 13:16, Andrei Mikhailovsky wrote:
> >> I had similar issues at least two or three times. The host agent would
> disconnect from the management server. The agent would not connect back to
> the management server without manual intervention; however, it would
> happily continue running the vms. The management server would initiate the
> HA and fire up vms which were already running on the disconnected host. I
> ended up with a handful of vms and virtual routers being run on two
> hypervisors, thus corrupting the disk and having all sorts of issues ((( .
> >>
> >> I think there has to be a better way of dealing with this case. At
> least on an image level. Perhaps a host should keep some sort of lock file
> or a file for every image where it would record a time stamp. Something
> like:
> >>
> >> f5ffa8b0-d852-41c8-a386-6efb8241e2e7 and
> >> f5ffa8b0-d852-41c8-a386-6efb8241e2e7-timestamp
> >>
> >> Thus, the f5ffa8b0-d852-41c8-a386-6efb8241e2e7 is the name of the disk
> image and f5ffa8b0-d852-41c8-a386-6efb8241e2e7-timestamp is the image's
> time stamp.
> >>
> >> The hypervisor should record the time stamp in this file while the vm
> is running. Let's say every 5-10 seconds. If the timestamp is old, we can
> assume that the volume is no longer used by the hypervisor.
> >>
> >> When a vm is started, the timestamp file should be checked and if the
> timestamp is recent, the vm should not start, otherwise, the vm should
> start and the timestamp file should be regularly updated.
> >>
> >> I am sure there are better ways of doing this, but at least this method
> should not allow two vms running on different hosts to use the same volume
> and corrupt the data.
> >>
> >> In ceph, as far as I remember, a new feature is being developed to
> provide a locking mechanism of an rbd image. Not sure if this will do the
> job?
> >>
> >
> > Something like what you propose is still on my wishlist for
> > Ceph/RBD.
> >
> > For NFS we currently have this in place, but for Ceph/RBD we don't. It's
> > a matter of code in the Agent and the investigators inside the
> > Management Server which decide if HA should kick in.
> >
> > Wido
> >
> >> Andrei
> >>
> >> ----- Original Message -----
> >>
> >>> From: "Wido den Hollander" <w...@widodh.nl>
> >>> To: dev@cloudstack.apache.org
> >>> Sent: Monday, 16 February, 2015 11:32:13 AM
> >>> Subject: Re: Disable HA temporary ?
> >>
> >>> On 16-02-15 11:00, Andrija Panic wrote:
> >>>> Hi team,
> >>>>
> >>>> I just had funny behaviour a few days ago - one of my hosts was under
> >>>> heavy
> >>>> load (some disk/network load) and it went disconnected from MGMT
> >>>> server.
> >>>>
> >>>> Then MGMT server started doing the HA thing, but without being able to
> >>>> make sure
> >>>> that the VMs on the disconnected hosts are really shutdown (and
> >>>> they were
> >>>> NOT).
> >>>>
> >>>> So MGMT started again some VMs on other hosts, thus resulting in
> >>>> having 2
> >>>> copies of the same VM, using shared storage - so corruption happened
> >>>> on the
> >>>> disk.
> >>>>
> >>>> Is there a way to temporarily disable the HA feature at the global level, or
> >>>> anything
> >>>> similar ?
> >>
> >>> Not that I'm aware of, but this is something I also ran into a
> >>> couple
> >>> of times.
> >>
> >>> It would indeed be nice if there could be a way to stop the HA
> >>> process
> >>> completely as an Admin.
> >>
> >>> Wido
> >>
> >>>> Thanks
> >>>>
> >>
>



-- 

Andrija Panić
