On Thu, 06/13 14:07, Wenchao Xia wrote:
> On 2013-6-13 14:03, Wenchao Xia wrote:
> >On 2013-6-7 15:18, Stefan Hajnoczi wrote:
> >>On Thu, Jun 06, 2013 at 04:56:49PM +0800, Fam Zheng wrote:
> >>>On Thu, 06/06 10:05, Stefan Hajnoczi wrote:
> >>>>On Thu, Jun 06, 2013 at 11:56:18AM +0800, Fam Zheng wrote:
> >>>>>On Thu, 05/30 14:34, Stefan Hajnoczi wrote:
> >>>>>>+
> >>>>>>+static int coroutine_fn backup_before_write_notify(
> >>>>>>+        NotifierWithReturn *notifier,
> >>>>>>+        void *opaque)
> >>>>>>+{
> >>>>>>+    BdrvTrackedRequest *req = opaque;
> >>>>>>+
> >>>>>>+    return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
> >>>>>>+}
> >>>>>
> >>>>>I'm wondering if we can see the logic here with a backing hd
> >>>>>relationship? req->bs is a backing file of job->target, but guest is
> >>>>>going to write to it, so we need to COW down the data to job->target
> >>>>>before overwriting (i.e. cluster is not allocated in child).
> >>>>>
> >>>>>I think if we do this in block layer, there's not much necessity for a
> >>>>>before-write notifier here (although it may be useful for other
> >>>>>cases):
> >>>>>
> >>>>>    in bdrv_write:
> >>>>>        for child in req->bs->open_children
> >>>>>            if not child->is_allocated(req->sectors)
> >>>>>                do COW to child
> >>>>>
> >>>>>The advantage of this is that we won't need to start block-backup job in
> >>>>>sync mode "none" to do point-in-time snapshot (image fleecing), and we
> >>>>>get writable snapshot (possibility to open backing file writable and
> >>>>>write to it safely) as a by-product.
> >>>>>
> >>>>>But we will need to keep track of parent<->child of block states, and we
> >>>>>still need to take care of overlapping writing between block job and
> >>>>>guest request.
> >>>>
> >>>>There's one catch here: bs->target may not support backing files, it can
> >>>>be a raw file, for example.  We'll only use backing files for
> >>>>point-in-time snapshots but other use cases might not.  raw doesn't
> >>>>really implement is_allocated(), so the whole concept would have to
> >>>>change a little:
> >>>
> >>>Another use case may be parent modification. Suppose we have
> >>>
> >>>                  ,--- child1.qcow2
> >>>    parent.qcow2 <
> >>>                  `--- child2.qcow2
> >>>
> >>>We can use parent.qcow2 as block device in QEMU without breaking
> >>>child1.qcow2 or child2.qcow2 by telling QEMU who its children are:
> >>>
> >>>    $QEMU -drive file=parent.qcow2,children=child1.qcow2:child2.qcow2
> >>>
> >>>Then we open the three images and set up parent_bs->open_children, the
> >>>children are protected from being corrupted.
> >>>
> >>>>
> >>>>bs->open_children becomes independent of backing files - any
> >>>>BlockDriverState can be added to this list.  ->is_allocated() basically
> >>>>becomes the bitmap that we keep in the block job.
> >>>
> >>>Yes. But it is possible to keep a bitmap for raw (and those that don't
> >>>implement is_allocated()) in block layer too, or in overlay: could
> >>>add-cow by Dongxu Wang help here?
> >>
> >>Yes absolutely.
> >>
> >>Stefan
> >>
> >   One advantage of external backup, or backing up chain, is that it
> >holds 'Delta' data only and is small enough. If it is changed toward a
> >'full' data writable snapshot, it becomes bigger. With backup chain
> >qemu-img can restore/clone a writable and usable one, so I don't
> >think adding that in qemu emulator helps much, and it will make things
> >more complicated.... user won't care who is doing the job, qemu or
> >qemu-img.
> >
>   I mean that "get writable snapshot (possibility to open backing file
> writable and write to it safely) as a by-product." in this series, is
> not very valuable.
>
I'm not selling writable snapshots. My point was just that the semantics
of block-backup (getting a point-in-time snapshot) inherently work like a
backing chain, except that writing to the parent (the guest drive) will
not break its children (our thin PIT snapshot). Seen this way, COW is not
specific to a block job like block-backup; it can be generic in the
backing chain logic.

That said, the value of a writable snapshot is that we can actually
_modify_ a backing image in place, rather than forking the chain to write
to a new child. This is not supported by qemu or qemu-img now: once you
create a child with the image as its backing file, you mustn't modify it.

-- 
Fam
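
To make the "generic COW in the backing chain" idea a bit more concrete,
here is a minimal toy sketch in C. It is not QEMU code and uses none of the
real block layer API: ToyImage, toy_init(), toy_read() and toy_write() are
hypothetical names invented for illustration, clusters are single bytes, and
overlapping requests and error handling are ignored. It only shows the rule
from the pseudocode: before a parent cluster is overwritten, the old data is
copied into every open child that has not allocated that cluster.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NB_CLUSTERS  8   /* toy image size, in one-byte "clusters" */
#define MAX_CHILDREN 4   /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a block state; not the QEMU BlockDriverState. */
typedef struct ToyImage ToyImage;
struct ToyImage {
    char data[NB_CLUSTERS];
    bool allocated[NB_CLUSTERS];           /* plays the role of is_allocated() */
    ToyImage *backing;                     /* parent image, or NULL */
    ToyImage *open_children[MAX_CHILDREN]; /* images that use us as backing */
    int nb_children;
};

/* Zero the image and, if it has a backing image, register as its child. */
static void toy_init(ToyImage *img, ToyImage *backing)
{
    memset(img, 0, sizeof(*img));
    img->backing = backing;
    if (backing) {
        assert(backing->nb_children < MAX_CHILDREN);
        backing->open_children[backing->nb_children++] = img;
    }
}

/* Read a cluster, falling through to the backing image when unallocated. */
static char toy_read(const ToyImage *img, int cluster)
{
    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    while (img->backing && !img->allocated[cluster]) {
        img = img->backing;
    }
    return img->data[cluster];
}

/* Write a cluster, first preserving the old contents for the children. */
static void toy_write(ToyImage *img, int cluster, char value)
{
    assert(cluster >= 0 && cluster < NB_CLUSTERS);
    char old = toy_read(img, cluster);

    for (int i = 0; i < img->nb_children; i++) {
        ToyImage *child = img->open_children[i];
        if (!child->allocated[cluster]) {
            /* COW down: the child keeps seeing the pre-write data. */
            child->data[cluster] = old;
            child->allocated[cluster] = true;
        }
    }
    img->data[cluster] = value;
    img->allocated[cluster] = true;
}
```

With a parent and two children set up through toy_init(), a toy_write() to
the parent leaves both children reading the old data, and a write to one
child does not affect its sibling. In this model the per-image allocated[]
array is the same thing as the bitmap kept by the block job, which is why a
raw target would need such a bitmap kept somewhere outside the image format.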