On Mon, Jan 04, 2016 at 02:03:16PM +0800, Wen Congyang wrote:
> On 12/23/2015 05:26 PM, Stefan Hajnoczi wrote:
> > On Wed, Dec 02, 2015 at 01:31:46PM +0800, Wen Congyang wrote:
> >> +== Failure Handling ==
> >> +There are 6 internal errors when block replication is running:
> >> +1. I/O error on primary disk
> >> +2. Forwarding primary write requests failed
> >> +3. Backup failed
> >> +4. I/O error on secondary disk
> >> +5. I/O error on active disk
> >> +6. Making active disk or hidden disk empty failed
> >> +In case 1 and 5, we just report the error to the disk layer. In case 2, 3,
> >> +4 and 6, we just report block replication's error to FT/HA manager (which
> >> +decides when to do a new checkpoint, when to do failover).
> >> +There is no internal error when doing failover.
> >
> > Not sure this is true.
> >
> > Below it says the following for failover: "We will flush the Disk buffer
> > into Secondary Disk and stop block replication". Flushing the disk
> > buffer can result in I/O errors. This means that failover operations
> > are not guaranteed to succeed.
>
> We don't use mirror job now. We may use it in the next version.
> Is there any way to know the I/O error when the mirror job is running?
> Get the job's status?
Block jobs have an error status which is exposed via QMP. The block job
emits a QMP event notifying the client. If the client issues
query-block-jobs it will also see the iostatus field.

I'm not aware of an internal API to monitor QMP events. It would be
possible to add it but first I wonder why you want to use mirror?

> > In practice I think this is similar to a successful failover followed by
> > immediately getting I/O errors on the new Primary Disk. It means that
> > right after failover there is another failure and the system may not be
> > able to continue.
>
> Block replication is not designed for such case. For example, we don't do
> failover on primary disk's failure. In such case, we just report the error
> to the disk layer(It is the case 1 in the above Failure Handling).
>
> Sorry for the late reply. Your mail is sent at 2015-12-23, but I receive
> it at 2016-01-04....

What is supposed to happen when flushing the Disk Buffer into the
Secondary Disk fails?

Stefan
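
For reference, a rough sketch of how a QMP client could notice a block job
I/O error through the two paths mentioned above: the asynchronous event and
the status field in the query-block-jobs reply. It assumes the event is
BLOCK_JOB_ERROR and that the field is spelled "io-status" in the reply. The
helper names and the sample messages are hypothetical, hand-written for
illustration, and are not captured from a real QEMU instance. No socket
handling is shown; the functions only inspect already-decoded JSON messages.

```python
import json


def failed_block_jobs(reply):
    """Return (device, io-status) pairs for jobs whose io-status is not "ok".

    `reply` is a decoded query-block-jobs response. The "io-status" key is
    an assumption about the reply layout; jobs without it are treated as ok.
    """
    return [
        (job.get("device"), job["io-status"])
        for job in reply.get("return", [])
        if job.get("io-status", "ok") != "ok"
    ]


def block_job_error(message):
    """Return the event payload if `message` is a BLOCK_JOB_ERROR event,
    otherwise None."""
    if message.get("event") != "BLOCK_JOB_ERROR":
        return None
    return message.get("data", {})


# Hypothetical sample messages, written by hand for this sketch.
sample_reply = json.loads("""
{"return": [
    {"type": "mirror", "device": "drive0", "io-status": "ok"},
    {"type": "mirror", "device": "drive1", "io-status": "failed"}
]}
""")

sample_event = json.loads("""
{"event": "BLOCK_JOB_ERROR",
 "data": {"device": "drive1", "operation": "write", "action": "report"}}
""")

if __name__ == "__main__":
    print(failed_block_jobs(sample_reply))
    print(block_job_error(sample_event))
```

A management client would typically react to the event as it arrives and use
query-block-jobs to confirm the job state afterwards. Neither path gives
code inside QEMU a way to observe the error, which is the missing internal
API mentioned above.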