Re: [Qemu-devel] [Patch v12 resend 05/10] docs: block replication's description

Dr. David Alan Gilbert Mon, 04 Jan 2016 07:52:53 -0800

* Stefan Hajnoczi (stefa...@redhat.com) wrote:
> On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> > +== Failure Handling ==
> > +There are 6 internal errors when block replication is running:
> > +1. I/O error on primary disk
> > +2. Forwarding primary write requests failed
> > +3. Backup failed
> > +4. I/O error on secondary disk
> > +5. I/O error on active disk
> > +6. Making active disk or hidden disk empty failed
> > +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> > +4 and 6, we just report block replication's error to FT/HA manager (which
> > +decides when to do a new checkpoint, when to do failover).
> > +There is no internal error when doing failover.
> 
> Not sure this is true.
> 
> Below it says the following for failover: "We will flush the Disk buffer
> into Secondary Disk and stop block replication".  Flushing the disk
> buffer can result in I/O errors.  This means that failover operations
> are not guaranteed to succeed.
> 
> In practice I think this is similar to a successful failover followed by
> immediately getting I/O errors on the new Primary Disk.  It means that
> right after failover there is another failure and the system may not be
> able to continue.


Yes, I think that's true.
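
To make the point concrete, here is a minimal sketch of the error routing
described in the quoted doc text, plus a failover step that can itself fail
while flushing. It is not QEMU code: the names ReplError, report_target and
failover are hypothetical, and the flush is modelled as a plain callable
that raises OSError on an I/O error.

```python
from enum import Enum


class ReplError(Enum):
    """The six internal errors listed in the quoted doc text."""
    PRIMARY_IO = 1      # I/O error on primary disk
    FORWARD_WRITE = 2   # forwarding primary write requests failed
    BACKUP = 3          # backup failed
    SECONDARY_IO = 4    # I/O error on secondary disk
    ACTIVE_IO = 5       # I/O error on active disk
    EMPTY_DISK = 6      # making active or hidden disk empty failed


# Cases 1 and 5 are reported to the disk layer; 2, 3, 4 and 6 go to the
# FT/HA manager, which decides on checkpoint or failover.
TO_DISK_LAYER = {ReplError.PRIMARY_IO, ReplError.ACTIVE_IO}


def report_target(err):
    """Return where a given replication error is reported."""
    return "disk layer" if err in TO_DISK_LAYER else "FT/HA manager"


def failover(flush_disk_buffer):
    """Flush the disk buffer into the secondary disk, then stop replication.

    flush_disk_buffer is a hypothetical callable standing in for the real
    flush; if it raises OSError, failover reports failure instead of
    claiming it cannot fail.
    """
    try:
        flush_disk_buffer()
    except OSError as exc:
        return False, f"flush failed: {exc}"
    return True, "replication stopped"
```

The second function is the case Stefan describes: the flush is ordinary
I/O, so the caller has to be ready for failover to return an error.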

> So this really only matters in the case where there is a new Secondary
> ready after failover.  In that case the user might expect failover to
> continue to the new Secondary (Host 3):
> 
>    [X]        [X]
>   Host 1 <-> Host 2 <-> Host 3

Since COLO is just doing 1+1 redundancy, I don't think it's expected to
cope with a double host failure. It's going to take some time (seconds?) to
sync Host 3 back in when you add it after a failover, and the aim would
be not to have disturbed the application for that long, so it should
already be running on Host 2 during that resync.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
