I agree...and understand :)

But would this mean that VMs will not be provisioned anywhere when HA
kicks in? I guess so...
That way I avoid having another copy of the same VM started while it is
already running on the disconnected host - I need this as a temporary
solution during Ceph backfilling, so I'm not sure whether this heavy hack
is a good idea, or whether it will cause me even more trouble...

cheers


On 16 February 2015 at 16:58, Logan Barfield <lbarfi...@tqhosting.com>
wrote:

> Hi Andrija,
>
> The way I understand it (and have seen in practice) is that by default
> the MGMT server will use any available server for HA.  Setting the HA
> tag on a host just dedicates that host to HA, meaning that during
> normal provisioning no VMs will use that host; it will only be used
> for HA purposes.  In other words, the "HA" tag is not required for HA
> to work.
>
> Thank You,
>
> Logan Barfield
> Tranquil Hosting
>
>
> On Mon, Feb 16, 2015 at 10:43 AM, Andrija Panic <andrija.pa...@gmail.com>
> wrote:
> > Seems to me that I'm about to issue something similar to:   update
> > cloud.vm_instance set ha = 0 where ha =1...
> >
> > Now seriously, wondering, per the manual - if you define the HA host tag
> > at the global config level, and then have NO hosts with that tag - MGMT
> > will not be able to start VMs on other hosts, since there are no hosts
> > that are dedicated as HA destinations?
> >
> > Does this make sense? I guess the VMs will just be marked as Stopped in
> > the GUI/database, but we would be unable to start them...
> > Stupid proposal, but... ?
> >
> > On 16 February 2015 at 16:22, Logan Barfield <lbarfi...@tqhosting.com>
> > wrote:
> >
> >> Some sort of fencing independent of the management server is
> >> definitely needed.  HA in general (particularly on KVM) is all kinds
> >> of unpredictable/buggy right now.
> >>
> >> I like the idea of having a switch that an admin can flip to stop HA.
> >> In fact I think a better job control system in general (e.g., being
> >> able to stop/restart/manually start tasks) would be awesome, if it's
> >> feasible.
> >>
> >> Thank You,
> >>
> >> Logan Barfield
> >> Tranquil Hosting
> >>
> >>
> >> On Mon, Feb 16, 2015 at 10:05 AM, Wido den Hollander <w...@widodh.nl>
> >> wrote:
> >> >
> >> >
> >> > On 16-02-15 13:16, Andrei Mikhailovsky wrote:
> >> >> I had similar issues at least two or three times. The host agent would
> >> >> disconnect from the management server. The agent would not connect back
> >> >> to the management server without manual intervention; however, it would
> >> >> happily continue running the VMs. The management server would initiate
> >> >> HA and fire up VMs which were already running on the disconnected host.
> >> >> I ended up with a handful of VMs and virtual routers being run on two
> >> >> hypervisors, thus corrupting the disks and having all sorts of
> >> >> issues ((( .
> >> >>
> >> >> I think there has to be a better way of dealing with this case, at
> >> >> least on an image level. Perhaps a host should keep some sort of lock
> >> >> file, or a file for every image where it would record a time stamp.
> >> >> Something like:
> >> >>
> >> >> f5ffa8b0-d852-41c8-a386-6efb8241e2e7 and
> >> >> f5ffa8b0-d852-41c8-a386-6efb8241e2e7-timestamp
> >> >>
> >> >> Thus, f5ffa8b0-d852-41c8-a386-6efb8241e2e7 is the name of the disk
> >> >> image and f5ffa8b0-d852-41c8-a386-6efb8241e2e7-timestamp is the
> >> >> image's time stamp file.
> >> >>
> >> >> The hypervisor should record the time stamp in this file while the VM
> >> >> is running, let's say every 5-10 seconds. If the timestamp is old, we
> >> >> can assume that the volume is no longer used by the hypervisor.
> >> >>
> >> >> When a VM is started, the timestamp file should be checked: if the
> >> >> timestamp is recent, the VM should not start; otherwise, the VM should
> >> >> start and the timestamp file should be regularly updated.
> >> >>
> >> >> I am sure there are better ways of doing this, but at least this
> >> >> method should not allow two VMs running on different hosts to use the
> >> >> same volume and corrupt the data.
> >> >>
> >> >> In Ceph, as far as I remember, a new feature is being developed to
> >> >> provide a locking mechanism for an RBD image. Not sure if this will
> >> >> do the job?
> >> >>
> >> >
> >> > Something like this is still on my wishlist for Ceph/RBD - something
> >> > like what you propose.
> >> >
> >> > For NFS we currently have this in place, but for Ceph/RBD we don't.
> >> > It's a matter of code in the Agent and the investigators inside the
> >> > Management Server which decide if HA should kick in.
> >> >
> >> > Wido
> >> >
> >> >> Andrei
> >> >>
> >> >> ----- Original Message -----
> >> >>
> >> >>> From: "Wido den Hollander" <w...@widodh.nl>
> >> >>> To: dev@cloudstack.apache.org
> >> >>> Sent: Monday, 16 February, 2015 11:32:13 AM
> >> >>> Subject: Re: Disable HA temporary ?
> >> >>
> >> >>> On 16-02-15 11:00, Andrija Panic wrote:
> >> >>>> Hi team,
> >> >>>>
> >> >>>> I just had some funny behaviour a few days ago - one of my hosts
> >> >>>> was under heavy load (some disk/network load) and it got
> >> >>>> disconnected from the MGMT server.
> >> >>>>
> >> >>>> Then the MGMT server started doing the HA thing, but without being
> >> >>>> able to make sure that the VMs on the disconnected host were really
> >> >>>> shut down (and they were NOT).
> >> >>>>
> >> >>>> So MGMT started some VMs again on other hosts, thus resulting in
> >> >>>> having 2 copies of the same VM using shared storage - so corruption
> >> >>>> happened on the disk.
> >> >>>>
> >> >>>> Is there a way to temporarily disable the HA feature at the global
> >> >>>> level, or anything similar ?
> >> >>
> >> >>> Not that I'm aware of, but this is something I have also run into a
> >> >>> couple of times.
> >> >>
> >> >>> It would indeed be nice if there could be a way to stop the HA
> >> >>> process
> >> >>> completely as an Admin.
> >> >>
> >> >>> Wido
> >> >>
> >> >>>> Thanks
> >> >>>>
> >> >>
> >>
> >
> >
> >
> > --
> >
> > Andrija Panić
>
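
For reference, here is a minimal sketch (plain Python, written purely for
illustration - the file layout, interval and function names are my own
assumptions, this is not CloudStack code) of the per-image timestamp idea
Andrei describes in the quoted thread above: the host keeps refreshing a
small companion file for each disk image while the VM runs, and HA only
starts the VM elsewhere once that timestamp has gone stale.

#!/usr/bin/env python
# Sketch of the per-image timestamp/heartbeat idea discussed above.
# Paths, interval and names are hypothetical, not CloudStack code.
import time

HEARTBEAT_INTERVAL = 10   # host refreshes the timestamp every 10 seconds
STALE_AFTER = 30          # treat the volume as unused after 30 seconds

def timestamp_path(image_path):
    """Companion file holding the last heartbeat for a disk image."""
    return image_path + "-timestamp"

def heartbeat(image_path):
    """Write the current time into the companion file."""
    with open(timestamp_path(image_path), "w") as f:
        f.write(str(int(time.time())))

def heartbeat_loop(image_path):
    """Loop the host would run while the VM is active (simplified;
    it would stop as soon as the VM stops)."""
    while True:
        heartbeat(image_path)
        time.sleep(HEARTBEAT_INTERVAL)

def volume_looks_in_use(image_path):
    """Check the companion file before HA starts the VM elsewhere."""
    try:
        with open(timestamp_path(image_path)) as f:
            last = int(f.read().strip() or 0)
    except (IOError, ValueError):
        return False   # missing/unreadable timestamp: assume not in use
    return (time.time() - last) < STALE_AFTER

def safe_to_start(image_path):
    """HA should only start the VM if the volume looks unused."""
    return not volume_looks_in_use(image_path)

For NFS the companion file could simply live next to the image; for RBD
something equivalent would have to sit in the pool (or use the image
locking Andrei mentions), which is presumably where Wido's wishlist item
comes in.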



-- 

Andrija Panić
