On Mon, Jun 6, 2011 at 10:50 AM, Kevin Wolf <kw...@redhat.com> wrote:
> On 02.06.2011 00:11, Stefan Hajnoczi wrote:
>> On Wed, Jun 1, 2011 at 10:13 AM, Alexander Graf <ag...@suse.de> wrote:
>>>
>>> On 01.06.2011, at 11:11, Kevin Wolf wrote:
>>>
>>>> On 01.06.2011 10:49, Alexander Graf wrote:
>>>>>
>>>>> On 01.06.2011, at 06:29, Stefan Hajnoczi wrote:
>>>>>
>>>>>> On Sun, May 29, 2011 at 2:19 PM, Fam Zheng <famc...@gmail.com> wrote:
>>>>>>> As a project of Google Summer of Code 2011, I'm now working on
>>>>>>> improving VMDK image support. There are many subformats of VMDK
>>>>>>> virtual disk, some of which have separate descriptor file and others
>>>>>>> don't, some allocate space at once and some others grow dynamically,
>>>>>>> some have optional data compression. The current support of VMDK
>>>>>>> format is very limited, i.e. qemu now supports single file images, but
>>>>>>> couldn't recognize the widely used multi-file types. We have planned
>>>>>>> to add such support to VMDK block driver and enable more image types,
>>>>>>> and the working timeline is set in weeks (#1 to #7) as:
>>>>>>>
>>>>>>> [#1] Monolithic flat layout support
>>>>>>> [#2] Implement compression and Stream-Optimized Compressed Sparse
>>>>>>> Extents support.
>>>>>>> [#3] Improve ESX Server Sparse Extents support.
>>>>>>> [#4] Debug and test. Collect virtual disks with various versions and
>>>>>>> options, test qemu-img with them. By now some patches may be ready to
>>>>>>> deliver.
>>>>>>> [#5, 6] Add multi-file support (2GB extent formats)
>>>>>>> [#7] Clean up and midterm evaluation.
>>>>>>
>>>>>> Thanks to Fam's work, we'll hopefully support the latest real-world
>>>>>> VMDK files in qemu-img convert within the next few months.
>>>>>>
>>>>>> If anyone has had particular VMDK "problem files" which qemu-img
>>>>>> cannot handle, please reply, they would make interesting test cases.
>>>>>
>>>>> There is one very useful use-case of VMDK files that we currently don't
>>>>> support: remapping.
>>>>>
>>>>> A vmdk file can specify that it really is backed by a raw block device,
>>>>> but only for certain chunks, while other chunks of it can be mapped
>>>>> read-only or zero. That is very useful when passing in a host disk to the
>>>>> guest and you want to be sure that you don't break other partitions than
>>>>> the one you're playing with.
>>>>>
>>>>> It can also shadow map those chunks. For example on the case above, the
>>>>> MBR is COW (IIRC) for the image, so you can install a bootloader in there.
>>>>
>>>> Hm, wondering if that's something to consider for qcow2v3, too... Do you
>>>> think it's still useful when doing this on a cluster granularity? It
>>>> would only work for well-aligned partitions then, but I think that
>>>> shouldn't be a problem for current OSes.
>>>
>>> Well, we could always just hack around for bits where it overlaps. When
>>> passing in a differently aligned partition for example, we could just
>>> declare the odd sector as COW sector and copy the contents over :). Though
>>> that might not be what the user really wants. Hrm.
>>>
>>>> Basically, additionally to the three cluster types "read from this
>>>> image", "COW from backing file" and "zero cluster" we could introduce a
>>>> fourth one "read/write to backing file".
>>>
>>> Yup, sounds very much straight forward! Then all we need is some tool to
>>> create such a qcow file :)
>>
>> If we want to implement mini-device mapper why not do it as a separate
>> BlockDriver? This could be useful for non-qcow2 cases like *safely*
>> passing through a physical disk with a guarantee that you won't
>> destroy the MBR. Also if we do it outside of an image format we don't
>> need to worry about clusters and can do sector-granularity mapping.
>>
>> In fact, if we want mini-device mapper, that could be used to
>> implement the VMDK multi-file support too.
>> So if Fam writes a generic
>> BlockDriver extent mapper we can use it from VMDK but also from
>> command-line options that tie together qcow2, qed, raw, etc images.
>
> Does it really work for Alex' case, where you have some parts of an
> image file that you want to be COW and other parts that write directly
> to the backing file?
>
> Or to put it in a more general way: Does it work when you reference an
> image more than once? Wouldn't you have to open the same image twice?
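As an aside, the remapping Alex describes lives in the VMDK descriptor
file's extent table. A rough sketch of what such a descriptor fragment
might look like (extent sizes and filenames are invented here, and the
exact keywords should be double-checked against the VMDK spec):

```
# Extent description: <access> <size in sectors> <type> [filename] [offset]
RW 63     FLAT "mbr-cow.img" 0    # COW'd MBR area, writable copy
RW 2048   ZERO                    # reads return zeros, writes are dropped
RW 409600 FLAT "/dev/sda2" 0      # mapped straight through to the host partition
```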
Here is an example of booting from a physical disk:

[mbr][/dev/zero][/dev/sda]

mbr is a COW image based on /dev/sda. /dev/zero is used to protect the
region where the first partition would be: the guest only sees zeroes
and writes are ignored, because the guest should never access this
region. /dev/sda is the extent containing the second partition
(actually we could just open /dev/sda2).

Here we have the case that you mentioned, with /dev/sda open as the
read-only backing file for mbr and as read-write for the second
partition. The question is whether raw images are safe for multiple
opens when at least one of them is read-write. I think the answer for
raw is yes. It is not safe to open non-raw image files multiple times.

I'm also wondering if the -blockdev backing_file=<backing> option that
has been discussed could be used in non-raw cases. Instead of opening
backing files by name, specify the backing file block device on the
command-line so that the same BlockDriverState is shared, avoiding
inconsistencies.

The multiple opener issue is orthogonal to device mapper support.

Stefan
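P.S. To make the sector-granularity mapping idea concrete, here is a toy
sketch in plain Python (not QEMU code; all names are invented for
illustration) of how a generic extent mapper BlockDriver could dispatch
guest I/O to its extents:

```python
class Extent:
    """One contiguous guest sector range mapped to a backend."""
    def __init__(self, start, length, backend, mode):
        self.start = start      # first guest sector of this extent
        self.length = length    # number of sectors covered
        self.backend = backend  # dict sector -> data, stands in for a file/device
        self.mode = mode        # "rw", "ro", or "zero"

class ExtentMapper:
    """Routes per-sector reads/writes to the extent covering that sector."""
    def __init__(self, extents):
        # extents are assumed sorted and non-overlapping
        self.extents = extents

    def _find(self, sector):
        for e in self.extents:
            if e.start <= sector < e.start + e.length:
                return e
        raise ValueError("sector %d is not mapped" % sector)

    def read(self, sector):
        e = self._find(sector)
        if e.mode == "zero":
            return b"\0" * 512          # zero extent: always reads as zeros
        return e.backend.get(sector - e.start, b"\0" * 512)

    def write(self, sector, data):
        e = self._find(sector)
        if e.mode == "zero":
            return                      # writes to a zero extent are dropped
        if e.mode == "ro":
            raise IOError("write to read-only extent")
        e.backend[sector - e.start] = data
```

For the disk layout above this would be instantiated as, roughly,
ExtentMapper([Extent(0, 63, mbr_cow, "rw"), Extent(63, N, None, "zero"),
Extent(63 + N, M, sda2, "rw")]) -- the point being that the dispatch is by
sector range, with no cluster granularity involved.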