On 07.06.2012 08:19, Taisuke Yamada wrote:
> I attended Paolo Bonzini's qemu session ("Live Disk Operations: Juggling
> Data and Trying to go Unnoticed") at LinuxCon Japan, and he advised me
> to post the bits I have regarding my question on qemu's support for shrinking
> a CoW image.
>
> Here's my problem description.
>
> I recently designed an experimental system which holds VM master images
> on an HDD and CoW snapshots on an SSD. VMs run on CoW snapshots only.
> This split-image configuration is done to keep VM I/Os on an SSD
This is an interesting use case that I wasn't aware of yet. So you're not really interested in a snapshot here; what you're trying to do is use the SSD as some sort of cache, right?

> As SSD capacity is rather limited, I need to do a writeback commit from SSD to
> HDD from time to time, and that is done during weekends/midnight. The problem is
> that although a commit is made, that alone won't shrink the CoW image: all unused
> blocks are still kept in the snapshot and use up space.
>
> The attached patch is a workaround I added to cope with the problem,
> but the basic problem I faced was that neither the QCOW2 nor the QED format
> supports the "bdrv_make_empty" API yet.
>
> Implementing the API (say, by hole punching) seemed like a lot of effort, so
> I ended up creating a new CoW image and then replacing the current CoW
> snapshot with the new (empty) one. But I find the code ugly.

It's kind of a hack indeed, but if it works...? :-)

I agree that the real solution would be hole punching. We already support this for raw images on XFS and we want to extend it (I think there are even patches floating around for it). Once you have this, implementing bdrv_make_empty() for qcow2 shouldn't be too hard; it might actually just take calling qcow2_co_discard() and adding another discard call in qcow2_free_cluster() that passes the request to the image file.

> In his talk, Paolo suggested the possibility of using the new "live op" API for this
> task, but I'm not aware of the actual API. Is there any documentation or
> source code I can look at to re-implement the above feature?

The problem that a live block operation could solve sounds unrelated to me: while you perform your 'commit' monitor command, the VM doesn't run. Some kind of live commit would surely be helpful there (that's what Jeff Cody is working on, iirc).
Maybe you could actually use a live commit mode where the commit stays active all the time in the background, so that you write to your SSD and signal completion to the guest while the background job starts copying the request to your backing file on the slow disk.

Or actually, it sounds quite similar to the "block mirror" approaches that were discussed recently, where guest requests are duplicated to the current (SSD) image and a secondary (HDD) image.

One other thing to consider, which complicates everything, is that committing to the backing file once bdrv_make_empty is implemented obviously also means that reads go to the slow disk afterwards. So I guess you really want to commit only part of the image on the SSD...

Kevin