stefa...@linux.vnet.ibm.com wrote on Mon, 6 Sep 2010 11:04:38 +0100:
> QEMU Enhanced Disk format is a disk image format that forgoes features
> found in qcow2 in favor of better levels of performance and data
> integrity. Due to its simpler on-disk layout, it is possible to safely
perform metadata updates
On 09/14/2010 05:46 AM, Stefan Hajnoczi wrote:
On Fri, Sep 10, 2010 at 10:22 PM, Jamie Lokier wrote:
Stefan Hajnoczi wrote:
Since there is no ordering imposed between the data write and metadata
update, the following scenarios may occur on crash:
1. Neither data write nor metadata update reach the disk. This is fine, qed metadata has not been corrupted.
On Tue, Sep 14, 2010 at 11:46 AM, Stefan Hajnoczi wrote:
> Time to peek at md and dm to see how they safeguard metadata.
Seems to me that dm-snap does not take measures to guard against
snapshot metadata (exceptions) partial updates/corruption. I was
hoping to find useful approaches there rather
On Fri, Sep 10, 2010 at 10:22 PM, Jamie Lokier wrote:
> Stefan Hajnoczi wrote:
>> Since there is no ordering imposed between the data write and metadata
>> update, the following scenarios may occur on crash:
>> 1. Neither data write nor metadata update reach the disk. This is
>> fine, qed metadata has not been corrupted.
On 13.09.2010 15:07, Anthony Liguori wrote:
> On 09/13/2010 06:03 AM, Kevin Wolf wrote:
>>
>> The real reason why it's not the same story is that a qcow3 would be
>> backwards compatible. Old images would just work as qcow3 by changing
> the version number in the header. Even if they are on a block device.
On 09/13/2010 06:48 AM, Kevin Wolf wrote:
On 13.09.2010 13:34, Avi Kivity wrote:
On 09/13/2010 01:28 PM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it.
On 09/13/2010 06:28 AM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it. That means for a 1TB image, you're potentially writing
out 128MB of data just to allocate a new cluster.
On 09/13/2010 06:03 AM, Kevin Wolf wrote:
The real reason why it's not the same story is that a qcow3 would be
backwards compatible. Old images would just work as qcow3 by changing
the version number in the header. Even if they are on a block device.
Even if they are encrypted. Even if they are
On 13.09.2010 13:34, Avi Kivity wrote:
> On 09/13/2010 01:28 PM, Kevin Wolf wrote:
>>
>>> Anytime you grow the freelist with qcow2, you have to write a brand new
>>> freelist table and update the metadata synchronously to point to a new
>>> version of it. That means for a 1TB image, you're potentially writing out 128MB of data just to allocate a new cluster.
On 09/13/2010 01:28 PM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it. That means for a 1TB image, you're potentially writing
out 128MB of data just to allocate a new cluster.
On 12.09.2010 19:09, Anthony Liguori wrote:
> For a 1PB disk image with qcow2, the reference count table is 128GB.
> For a 1TB image, the reference count table is 128MB. For a 128GB
> image, the reference table is 16MB which is why we get away with it today.
This is physical size. If you ha
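For reference, the arithmetic behind these figures is a straight sketch (assuming 64 KB clusters; the 8 bytes of metadata per cluster is implied by the numbers rather than stated, and note qcow2's default on-disk refcount entries are 2 bytes):

    metadata_bytes = (image_size / cluster_size) * entry_size
    1 TB / 64 KB = 16M clusters, * 8 B/cluster = 128 MB
    1 PB / 64 KB = 16G clusters, * 8 B/cluster = 128 GB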
On 12.09.2010 17:56, Avi Kivity wrote:
To me, the biggest burden in qcow2 is thinking through how you deal
with shared resources. Because you can block for a long period of
time during write operations, it's not enough to just carry a mutex
during all metadata operations.
On 10.09.2010 21:33, Anthony Liguori wrote:
> On 09/10/2010 12:42 PM, Kevin Wolf wrote:
>>> It bounces all buffers still and I still think it's synchronous
>>> (although Kevin would know better).
>>>
>> Yes, it does bounce the buffers, though I'm looking into this anyway
>> because you raised concerns about unbounded allocations.
On 09/12/2010 10:18 PM, Anthony Liguori wrote:
But since you have to boot before you can run any serious test, if
it takes 5 seconds to do an fsck(), it's highly likely that it's
not even noticeable.
What if it takes 300 seconds?
That means for a 1TB disk you're taking 500ms per L2 entry.
On 09/12/2010 12:51 PM, Avi Kivity wrote:
On 09/12/2010 07:09 PM, Anthony Liguori wrote:
On 09/12/2010 10:56 AM, Avi Kivity wrote:
No, the worst case is 0.003% allocated disk, with the allocated
clusters distributed uniformly. That means all your L2s are
allocated, but almost none of your clusters are.
On 09/12/2010 07:09 PM, Anthony Liguori wrote:
On 09/12/2010 10:56 AM, Avi Kivity wrote:
No, the worst case is 0.003% allocated disk, with the allocated
clusters distributed uniformly. That means all your L2s are
allocated, but almost none of your clusters are.
But in this case, you're so sparse that your metadata is pretty much co-located.
On 09/12/2010 10:56 AM, Avi Kivity wrote:
No, the worst case is 0.003% allocated disk, with the allocated
clusters distributed uniformly. That means all your L2s are
allocated, but almost none of your clusters are.
But in this case, you're so sparse that your metadata is pretty much
co-located.
On 09/12/2010 05:13 PM, Anthony Liguori wrote:
On 09/12/2010 08:24 AM, Avi Kivity wrote:
Not atexit, just when we close the image.
Just a detail, but we need an atexit() handler to make sure block
devices get closed because we have too many exit()s in the code today.
Right.
So when you
On 09/12/2010 08:24 AM, Avi Kivity wrote:
Not atexit, just when we close the image.
Just a detail, but we need an atexit() handler to make sure block
devices get closed because we have too many exit()s in the code today.
Right.
So when you click the 'X' on the qemu window, we get to wait
On 09/10/2010 08:07 PM, Anthony Liguori wrote:
On 09/10/2010 10:49 AM, Avi Kivity wrote:
If I do a qemu-img create -f qcow2 foo.img 10GB, and then do a
naive copy of the image file and end up with a 2GB image when
there's nothing in it, that's badness.
Only if you crash in the middle. If not, you free the preallocation during shutdown.
Stefan Hajnoczi wrote:
> Since there is no ordering imposed between the data write and metadata
> update, the following scenarios may occur on crash:
> 1. Neither data write nor metadata update reach the disk. This is
> fine, qed metadata has not been corrupted.
> 2. Data reaches disk but metadata update does not.
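As a minimal sketch of what imposing that ordering would cost at the syscall level (illustrative only, not QED's or qcow2's actual code; QED deliberately leaves the two writes unordered and repairs on open instead):

    #include <stdint.h>
    #include <unistd.h>

    /* Write the data cluster, then a barrier, then the L2 entry, so a
     * crash can never leave metadata pointing at unwritten data. */
    static int allocating_write_ordered(int fd, const void *data, size_t len,
                                        off_t data_off,
                                        uint64_t l2_entry, off_t l2_off)
    {
        if (pwrite(fd, data, len, data_off) != (ssize_t)len)
            return -1;
        if (fdatasync(fd) != 0)          /* order data before metadata */
            return -1;
        if (pwrite(fd, &l2_entry, sizeof(l2_entry), l2_off) != sizeof(l2_entry))
            return -1;
        return 0;
    }

The cost is one flush per cluster allocation, which is exactly the overhead the unordered design avoids.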
On 09/10/2010 12:42 PM, Kevin Wolf wrote:
It bounces all buffers still and I still think it's synchronous
(although Kevin would know better).
Yes, it does bounce the buffers, though I'm looking into this anyway
because you raised concerns about unbounded allocations. (And it has
been on my
On Fri, Sep 10, 2010 at 2:10 PM, Anthony Liguori wrote:
>>
>> Well, snapshots have an ID today (which is different from their name).
>> Nobody stops you from putting a UUID there. Fully backwards compatible,
>> no feature flag needed. I think Miguel was planning to actually do this.
>>
>
> The pro
On 10.09.2010 19:10, Anthony Liguori wrote:
> On 09/10/2010 11:05 AM, Kevin Wolf wrote:
>> On 10.09.2010 17:53, Anthony Liguori wrote:
>>
>>> On 09/10/2010 10:18 AM, Kevin Wolf wrote:
>>>
On 10.09.2010 17:02, Anthony Liguori wrote:
> What makes us future p
On 10.09.2010 19:07, Anthony Liguori wrote:
Sure, we'll support qcow2, but will we give it the same attention?
>>>
>>> We have a lot of block formats in QEMU today but only one block
>>> format that actually performs well and has good data integrity.
>>>
>>> We're not giving qcow2 the atten
On 09/10/2010 11:05 AM, Kevin Wolf wrote:
On 10.09.2010 17:53, Anthony Liguori wrote:
On 09/10/2010 10:18 AM, Kevin Wolf wrote:
On 10.09.2010 17:02, Anthony Liguori wrote:
What makes us future-proof is having good feature support. qcow2
doesn't have this. We have a g
On 09/10/2010 10:49 AM, Avi Kivity wrote:
If I do a qemu-img create -f qcow2 foo.img 10GB, and then do a
naive copy of the image file and end up with a 2GB image when there's
nothing in it, that's badness.
Only if you crash in the middle. If not, you free the preallocation
during shutdown
On 10.09.2010 17:53, Anthony Liguori wrote:
> On 09/10/2010 10:18 AM, Kevin Wolf wrote:
>> On 10.09.2010 17:02, Anthony Liguori wrote:
>>
>>> What makes us future-proof is having good feature support. qcow2
>>> doesn't have this. We have a good way of making purely informational
>>> changes and also making changes that break the format.
On 09/10/2010 10:18 AM, Kevin Wolf wrote:
On 10.09.2010 17:02, Anthony Liguori wrote:
What makes us future-proof is having good feature support. qcow2
doesn't have this. We have a good way of making purely informational
changes and also making changes that break the format. Those feat
On 09/10/2010 05:56 PM, Anthony Liguori wrote:
On 09/10/2010 08:47 AM, Avi Kivity wrote:
The current qcow2 implementation, yes. The qcow2 format, no.
The qcow2 format has more writes because it maintains more metadata.
More writes == worse performance.
You claim that you can effectively
On 10.09.2010 17:02, Anthony Liguori wrote:
> What makes us future-proof is having good feature support. qcow2
> doesn't have this. We have a good way of making purely informational
> changes and also making changes that break the format. Those features
> are independent so they can be ba
On 09/10/2010 08:48 AM, Christoph Hellwig wrote:
On Fri, Sep 10, 2010 at 08:22:14AM -0500, Anthony Liguori wrote:
fsck will always be fast on qed because the metadata is small. For a
1PB image, there's 128MB worth of L2s if it's fully allocated (keeping
in mind, that once you're fully allocated, you'll never fsck again).
On 09/10/2010 08:47 AM, Avi Kivity wrote:
The current qcow2 implementation, yes. The qcow2 format, no.
The qcow2 format has more writes because it maintains more metadata.
More writes == worse performance.
You claim that you can effectively batch those writes such that the
worse performa
On 09/10/2010 05:12 PM, Christoph Hellwig wrote:
On Fri, Sep 10, 2010 at 05:05:16PM +0300, Avi Kivity wrote:
Note that ATA allows simply ignoring TRIM requests that we can't handle,
and if we don't set the bit that guarantees TRIMed regions to be zeroed
we don't even have to zero out the regions.
On Fri, Sep 10, 2010 at 05:05:16PM +0300, Avi Kivity wrote:
> >Note that ATA allows simply ignoring TRIM requests that we can't handle,
> >and if we don't set the bit that guarantees TRIMed regions to be zeroed
> >we don't even have to zero out the regions.
>
> It would be nice to support it. TRI
On 09/10/2010 04:16 PM, Anthony Liguori wrote:
btw, despite being not properly designed, qcow2 is able to support
TRIM. qed isn't able to, except by leaking clusters on shutdown.
TRIM support is required unless you're okay with the image growing
until it is no longer sparse (the lack of TRIM
On 09/10/2010 04:47 PM, Christoph Hellwig wrote:
On Fri, Sep 10, 2010 at 12:33:09PM +0100, Stefan Hajnoczi wrote:
btw, despite being not properly designed, qcow2 is able to support TRIM.
qed isn't able to, except by leaking clusters on shutdown. TRIM support is
required unless you're okay with the image growing until it is no longer sparse.
On 09/10/2010 04:22 PM, Anthony Liguori wrote:
Looks like it depends on fsck, which is not a good idea for large
images.
fsck will always be fast on qed because the metadata is small. For a
1PB image, there's 128MB worth of L2s if it's fully allocated
It's 32,000 seeks.
(keeping in mind, that once you're fully allocated, you'll never fsck again).
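One way to arrive at that figure (parameters assumed, not stated in the thread): reading 128 MB of scattered L2 tables in 4 KB chunks is

    128 MiB / 4 KiB = 32,768 random reads

and at roughly 10 ms per seek that is about 330 seconds, which is also roughly where the "300 seconds" fsck mentioned elsewhere in the thread comes from.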
On 09/10/2010 04:39 PM, Anthony Liguori wrote:
On 09/10/2010 07:47 AM, Avi Kivity wrote:
Then, with a clean base that takes on board the lessons of existing
formats it is much easier to innovate. Look at the image streaming,
defragmentation, and trim ideas that are playing out right now. I
th
On Fri, Sep 10, 2010 at 08:22:14AM -0500, Anthony Liguori wrote:
> fsck will always be fast on qed because the metadata is small. For a
> 1PB image, there's 128MB worth of L2s if it's fully allocated (keeping
> in mind, that once you're fully allocated, you'll never fsck again). If
> you've go
On Fri, Sep 10, 2010 at 08:39:21AM -0500, Anthony Liguori wrote:
> You're hand waving to a dangerous degree here :-)
>
> TRIM in qcow2 would require the following sequence:
>
> 1) remove cluster from L2 table
> 2) sync()
> 3) reduce cluster reference count
> 4) sync()
>
> TRIM needs to be fast s
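As a sketch, that sequence maps onto something like the following (hypothetical helper with illustrative offsets, not the real qcow2 code):

    #include <stdint.h>
    #include <unistd.h>

    static int qcow2_trim_sketch(int fd, off_t l2_entry_off, off_t refcount_off)
    {
        uint64_t no_cluster = 0;   /* unallocated L2 entry */
        uint16_t refcount = 0;     /* assume the count drops to zero */

        /* 1) remove cluster from L2 table */
        if (pwrite(fd, &no_cluster, sizeof(no_cluster), l2_entry_off) < 0)
            return -1;
        /* 2) sync so the dereference is durable before the count drops */
        if (fdatasync(fd) != 0)
            return -1;
        /* 3) reduce cluster reference count */
        if (pwrite(fd, &refcount, sizeof(refcount), refcount_off) < 0)
            return -1;
        /* 4) sync */
        return fdatasync(fd);
    }

Two flushes per discarded cluster is what makes a fast TRIM hard in this scheme.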
On 10.09.2010 14:35, Stefan Hajnoczi wrote:
> On Fri, Sep 10, 2010 at 1:12 PM, Kevin Wolf wrote:
>> On 10.09.2010 13:43, Stefan Hajnoczi wrote:
>>> By creating two code paths within qcow2.
>>
>> You're creating two code paths for users.
>
> No, I'm creating a single path: QED.
On 09/10/2010 04:14 PM, Anthony Liguori wrote:
On 09/10/2010 06:14 AM, Avi Kivity wrote:
The point of an image format is not to recreate btrfs in software.
It's to provide a mechanism to allow users to move images around
reasonably, but once an image is present on a reasonable filesystem,
we should more or less get the heck out of the way
On Fri, Sep 10, 2010 at 12:33:09PM +0100, Stefan Hajnoczi wrote:
> > btw, despite being not properly designed, qcow2 is able to support TRIM.
> > qed isn't able to, except by leaking clusters on shutdown. TRIM support is
> > required unless you're okay with the image growing until it is no longer
On 09/10/2010 07:47 AM, Avi Kivity wrote:
Then, with a clean base that takes on board the lessons of existing
formats it is much easier to innovate. Look at the image streaming,
defragmentation, and trim ideas that are playing out right now. I
think the reason we haven't seen them before is bec
On 09/10/2010 07:06 AM, Avi Kivity wrote:
On 09/10/2010 02:43 PM, Stefan Hajnoczi wrote:
and/or enterprise storage.
That doesn't eliminate undiscovered errors (they can still come from
the
transport).
Eliminating silent data corruption is currently not a goal for any
disk image format I know of.
On 09/10/2010 06:43 AM, Avi Kivity wrote:
On 09/10/2010 02:33 PM, Stefan Hajnoczi wrote:
btw, despite being not properly designed, qcow2 is able to support
TRIM.
qed isn't able to, except by leaking clusters on shutdown. TRIM
support is
required unless you're okay with the image growing until it is no longer sparse.
On 09/10/2010 04:10 PM, Stefan Hajnoczi wrote:
On Fri, Sep 10, 2010 at 1:47 PM, Avi Kivity wrote:
On 09/10/2010 03:35 PM, Stefan Hajnoczi wrote:
That still leaves those qcow2 images that use features not supported by
qed. Just a few features missing in qed are internal snapshots, qcow2 on
block devices, compression, encryption.
On 09/10/2010 06:25 AM, Avi Kivity wrote:
On 09/10/2010 02:14 PM, Avi Kivity wrote:
qcow2 is not a properly designed image format. It was a weekend
hacking session from Fabrice that he dropped in the code base and
never really finished doing what he originally intended. The
improvements
On 09/10/2010 06:14 AM, Avi Kivity wrote:
The point of an image format is not to recreate btrfs in software.
It's to provide a mechanism to allow users to move images around
reasonably, but once an image is present on a reasonable filesystem,
we should more or less get the heck out of the way
On Fri, Sep 10, 2010 at 1:47 PM, Avi Kivity wrote:
> On 09/10/2010 03:35 PM, Stefan Hajnoczi wrote:
>>
>>> That still leaves those qcow2 images that use features not supported by
>>> qed. Just a few features missing in qed are internal snapshots, qcow2 on
>>> block devices, compression, encryption.
On 09/10/2010 03:35 PM, Stefan Hajnoczi wrote:
That still leaves those qcow2 images that use features not supported by
qed. Just a few features missing in qed are internal snapshots, qcow2 on
block devices, compression, encryption. So qed can't be a complete
replacement for qcow2 (and that was
On Fri, Sep 10, 2010 at 1:12 PM, Kevin Wolf wrote:
> On 10.09.2010 13:43, Stefan Hajnoczi wrote:
>> By creating two code paths within qcow2.
>
> You're creating two code paths for users.
No, I'm creating a single path: QED.
There are already two code paths: raw and qcow2.
On 10.09.2010 13:43, Stefan Hajnoczi wrote:
> By creating two code paths within qcow2.
You're creating two code paths for users.
>>>
>>> No, I'm creating a single path: QED.
>>>
>>> There are already two code paths: raw and qcow2. qcow2 has had such a bad
>>> history that for a lot
On 09/10/2010 02:43 PM, Stefan Hajnoczi wrote:
and/or enterprise storage.
That doesn't eliminate undiscovered errors (they can still come from the
transport).
Eliminating silent data corruption is currently not a goal for any
disk image format I know of. For filesystems, I know that ZFS and
On 09/10/2010 02:33 PM, Stefan Hajnoczi wrote:
btw, despite being not properly designed, qcow2 is able to support TRIM.
qed isn't able to, except by leaking clusters on shutdown. TRIM support is
required unless you're okay with the image growing until it is no longer
sparse (the lack of TRI
On Fri, Sep 10, 2010 at 12:14 PM, Avi Kivity wrote:
> On 09/09/2010 03:49 PM, Anthony Liguori wrote:
>>
>> On 09/09/2010 01:45 AM, Avi Kivity wrote:
>>>
>>> Loading very large L2 tables on demand will result in very long
>>> latencies. Increasing cluster size will result in very long first write
On 09/10/2010 02:29 PM, Stefan Hajnoczi wrote:
They only guarantee that the filesystem is consistent. A write() that
extends a file may be reordered with the L2 write() that references the new
cluster. Requiring fsck on unclean shutdown is very backwards for a 2010
format.
I'm interested i
On Fri, Sep 10, 2010 at 12:22 PM, Avi Kivity wrote:
> On 09/09/2010 08:43 PM, Anthony Liguori wrote:
>>>
>>> Hm, we do have a use case for qcow2-over-lvm. I can't say it's something
>>> I like, but a point to consider.
>>
>>
>> We specifically are not supporting that use-case in QED today. There's a good reason for it.
On Fri, Sep 10, 2010 at 12:25 PM, Avi Kivity wrote:
> On 09/10/2010 02:14 PM, Avi Kivity wrote:
>>
>>>
>>> qcow2 is not a properly designed image format. It was a weekend hacking
>>> session from Fabrice that he dropped in the code base and never really
>>> finished doing what he originally intended.
On 09/10/2010 02:14 PM, Avi Kivity wrote:
qcow2 is not a properly designed image format. It was a weekend
hacking session from Fabrice that he dropped in the code base and
never really finished doing what he originally intended. The
improvements that have been made to it are almost at th
On 09/09/2010 08:43 PM, Anthony Liguori wrote:
Hm, we do have a use case for qcow2-over-lvm. I can't say it's
something I like, but a point to consider.
We specifically are not supporting that use-case in QED today.
There's a good reason for it. For cluster allocation, we achieve good
pe
On 09/10/2010 12:01 AM, Christoph Hellwig wrote:
On Thu, Sep 09, 2010 at 09:24:26AM +0300, Avi Kivity wrote:
The other thing we can do is defragment the logical image, then
defragment the underlying file (if the filesystem supports it, issue the
appropriate ioctl, otherwise defragment to a new file which you write linearly).
On 09/09/2010 03:49 PM, Anthony Liguori wrote:
On 09/09/2010 01:45 AM, Avi Kivity wrote:
Loading very large L2 tables on demand will result in very long
latencies. Increasing cluster size will result in very long first
write latencies. Adding an extra level results in an extra random
write every 4TB.
On Thu, Sep 09, 2010 at 09:24:26AM +0300, Avi Kivity wrote:
> The other thing we can do is defragment the logical image, then
> defragment the underlying file (if the filesystem supports it, issue the
> appropriate ioctl, otherwise defragment to a new file which you write
> linearly).
What's wh
On Thu, Sep 09, 2010 at 12:43:28PM -0500, Anthony Liguori wrote:
> Define "very large disks".
>
> My target for VM images is 100GB-1TB. Practically speaking, that at
> least covers us for the next 5 years.
We have 2TB SATA disks shipping already, and people tend to produce
more and more "data".
On 09/09/2010 01:59 AM, Avi Kivity wrote:
On 09/08/2010 06:07 PM, Stefan Hajnoczi wrote:
uint32_t table_size; /* table size, in clusters */
Presumably L1 table size? Or any table size?
Hm. It would be nicer not to require contiguous sectors anywhere. How
about a variable- or fixed-height tree?
On 09/09/2010 01:45 AM, Avi Kivity wrote:
Loading very large L2 tables on demand will result in very long
latencies. Increasing cluster size will result in very long first
write latencies. Adding an extra level results in an extra random
write every 4TB.
It would be trivially easy to add an
On 09/08/2010 06:07 PM, Stefan Hajnoczi wrote:
uint32_t table_size; /* table size, in clusters */
Presumably L1 table size? Or any table size?
Hm. It would be nicer not to require contiguous sectors anywhere. How
about a variable- or fixed-height tree?
Both extents and fancie
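For readers following along, a sketch of the header under discussion, reconstructed from the draft spec at http://wiki.qemu.org/Features/QED referenced elsewhere in this thread (the layout may have changed since; treat it as illustrative):

    #include <stdint.h>

    typedef struct {
        uint32_t magic;                   /* 'QED\0' */
        uint32_t cluster_size;            /* in bytes */
        uint32_t table_size;              /* L1/L2 table size, in clusters */
        uint32_t header_size;             /* in clusters */
        uint64_t features;                /* incompatible feature bits */
        uint64_t compat_features;         /* compatible feature bits */
        uint64_t autoclear_features;      /* auto-clearing feature bits */
        uint64_t l1_table_offset;         /* in bytes */
        uint64_t image_size;              /* logical image size, in bytes */
        uint32_t backing_filename_offset; /* in bytes from start of file */
        uint32_t backing_filename_size;   /* in bytes */
    } QEDHeader;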
On 09/08/2010 02:15 PM, Stefan Hajnoczi wrote:
3. Metadata update reaches disk but data does not. The interesting
case! The L2 table now points to a cluster which is beyond the last
cluster in the image file. Remember that file size is rounded down by
cluster size, so partial data writes are
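Scenario 3 is what an open-time consistency check has to catch. A minimal sketch of the test (assumed helper, not QED's actual code):

    #include <stdbool.h>
    #include <stdint.h>

    /* An L2 entry pointing at or past end-of-file means the data write
     * never completed; such an entry can be reset to unallocated. */
    static bool l2_entry_valid(uint64_t cluster_offset,
                               uint64_t cluster_size, uint64_t file_size)
    {
        return cluster_offset == 0 ||               /* unallocated */
               cluster_offset + cluster_size <= file_size;
    }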
On 09/09/2010 09:45 AM, Avi Kivity wrote:
A new format doesn't introduce much additional complexity. We
provide image conversion tool and we can almost certainly provide an
in-place conversion tool that makes the process very fast.
It requires users to make a decision. By the time qed is
On 09/08/2010 03:48 PM, Anthony Liguori wrote:
On 09/08/2010 03:23 AM, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
On 09/09/2010 05:35 AM, Christoph Hellwig wrote:
On Wed, Sep 08, 2010 at 03:28:50PM -0500, Anthony Liguori wrote:
That's a good point. Is there a reasonable way to do this cooperatively
with the underlying filesystem?
The only thing we can do easily is to try to use as large as possible
extents in the allocation.
On 09/08/2010 03:55 PM, Anthony Liguori wrote:
(3 levels)
Dunno, just seems more regular to me. Image resize doesn't need to
relocate the L2 table in case it overflows.
The overhead from three levels is an extra table, which is negligible.
It means an extra I/O request in the degenerate case.
On Wed, Sep 08, 2010 at 03:28:50PM -0500, Anthony Liguori wrote:
> That's a good point. Is there a reasonable way to do this cooperatively
> with the underlying filesystem?
The only thing we can do easily is to try to use as large as possible
extents in the allocation. Once we're at a couple megabytes
On 09/08/2010 03:23 PM, Christoph Hellwig wrote:
On Wed, Sep 08, 2010 at 11:30:10AM -0500, Anthony Liguori wrote:
http://wiki.qemu.org/Features/QED/OnlineDefrag
Is a spec for a very simple approach to online defrag that I hope we can
implement in the near future. I think that once we have
On Wed, Sep 08, 2010 at 11:30:10AM -0500, Anthony Liguori wrote:
> http://wiki.qemu.org/Features/QED/OnlineDefrag
>
> Is a spec for a very simple approach to online defrag that I hope we can
> implement in the near future. I think that once we have the mechanisms
> to freeze clusters and to swa
On 09/08/2010 01:56 PM, Blue Swirl wrote:
That's a bit big, for example CD images are only 640M and there were
smaller disks. But I guess you mean the smallest maximum size limited
by the cluster_size etc, so the actual images may be even smaller.
Yes. The smallest image is one cluster. T
On Wed, Sep 8, 2010 at 6:35 PM, Anthony Liguori wrote:
> On 09/08/2010 01:24 PM, Blue Swirl wrote:
>>
>> Based on these:
>> #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
>> header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size,
>> the maximum image size
On 09/08/2010 01:24 PM, Blue Swirl wrote:
Based on these:
#define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size,
the maximum image size equals table_size^2 * cluster_size^3 /
sizeof(uint64_t)^2. Is the sq
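Plugging in the defaults implied elsewhere in the thread (64 KB clusters and 256 KB tables, i.e. table_size = 4; both are assumptions here) shows the formula is consistent with the 64TB figure quoted later:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t cluster_size = 64 * 1024;
        uint64_t table_size = 4;   /* clusters per L1/L2 table */
        uint64_t noffsets = table_size * cluster_size / sizeof(uint64_t);
        uint64_t max_size = noffsets * noffsets * cluster_size;

        /* prints: noffsets=32768 max=64 TB */
        printf("noffsets=%llu max=%llu TB\n",
               (unsigned long long)noffsets,
               (unsigned long long)(max_size >> 40));
        return 0;
    }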
On Wed, Sep 8, 2010 at 3:37 PM, Stefan Hajnoczi wrote:
> On Tue, Sep 7, 2010 at 8:25 PM, Blue Swirl wrote:
>> On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
>> wrote:
>>> QEMU Enhanced Disk format is a disk image format that forgoes features
>>> found in qcow2 in favor of better levels of performance and data integrity.
On 09/08/2010 10:38 AM, Christoph Hellwig wrote:
On Wed, Sep 08, 2010 at 12:15:13PM +0100, Stefan Hajnoczi wrote:
In-place writes overwrite old data in the image file. They do not
allocate new clusters or update any metadata. This is why write
performance is comparable to raw in the long run.
On Wed, Sep 08, 2010 at 12:15:13PM +0100, Stefan Hajnoczi wrote:
> In-place writes overwrite old data in the image file. They do not
> allocate new clusters or update any metadata. This is why write
> performance is comparable to raw in the long run.
Only if qed doesn't cause additional fragmentation.
On Tue, Sep 7, 2010 at 8:25 PM, Blue Swirl wrote:
> On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
> wrote:
>> QEMU Enhanced Disk format is a disk image format that forgoes features
>> found in qcow2 in favor of better levels of performance and data
>> integrity. Due to its simpler on-disk layout, it is possible to safely perform metadata updates.
On Tue, Sep 7, 2010 at 3:51 PM, Avi Kivity wrote:
> On 09/06/2010 04:06 PM, Anthony Liguori wrote:
>>
>> Another point worth mentioning is that our intention is to have a formal
>> specification of the format before merging. A start of that is located at
>> http://wiki.qemu.org/Features/QED
>>
>
On 08.09.2010 15:26, Anthony Liguori wrote:
> On 09/08/2010 08:20 AM, Kevin Wolf wrote:
>> On 08.09.2010 14:48, Anthony Liguori wrote:
> I think one of the critical flaws in qcow2 was trying to invent a
> better filesystem within qemu instead of just sticking to a very
> simple and
On 09/08/2010 08:20 AM, Kevin Wolf wrote:
On 08.09.2010 14:48, Anthony Liguori wrote:
On 09/08/2010 03:23 AM, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
On 08.09.2010 14:48, Anthony Liguori wrote:
> On 09/08/2010 03:23 AM, Avi Kivity wrote:
>> On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
>>>
>>>
>>> I clearly suck at basic math today. The image supports 64TB today.
On 09/08/2010 03:53 AM, Avi Kivity wrote:
On 09/08/2010 11:41 AM, Alexander Graf wrote:
On 08.09.2010, at 10:23, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it
can support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
On 09/08/2010 03:23 AM, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
Dropping to 128K tables would reduce it to 16TB and 64k tables would be 4TB.
Here is a summary of how qed images can be accessed safely after a
crash or power loss.
First off, we only need to consider write operations since read
operations do not change the state of the image file and cannot lead
to metadata corruption.
There are two types of writes. Allocating writes wh
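A sketch of the allocating-write path being summarized here (assumed structure, not the actual QED code): data goes to a fresh cluster at the end of the image file, then the L2 entry is updated, with no barrier in between, which is what produces the crash scenarios listed in this thread:

    #include <stdint.h>
    #include <unistd.h>

    static int allocating_write(int fd, const void *data, size_t len,
                                off_t cluster_size, off_t l2_entry_off)
    {
        /* allocate the next cluster-aligned offset at end of file */
        off_t end = lseek(fd, 0, SEEK_END);
        off_t cluster = (end + cluster_size - 1) / cluster_size * cluster_size;

        if (pwrite(fd, data, len, cluster) != (ssize_t)len)
            return -1;

        /* no fdatasync() here: after a crash the L2 entry may point past
         * EOF, or the cluster may be leaked, exactly as described above */
        uint64_t entry = (uint64_t)cluster;
        if (pwrite(fd, &entry, sizeof(entry), l2_entry_off) != sizeof(entry))
            return -1;
        return 0;
    }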
On 09/08/2010 11:41 AM, Alexander Graf wrote:
On 08.09.2010, at 10:23, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can support 5PB
of data.
I clearly suck at basic math today. The image supports 64TB today.
On 08.09.2010, at 10:23, Avi Kivity wrote:
> On 09/08/2010 01:27 AM, Anthony Liguori wrote:
>>> FWIW, L2s are 256K at the moment and with a two level table, it can support
>>> 5PB of data.
>>
>>
>> I clearly suck at basic math today. The image supports 64TB today.
>> Dropping to 128K tables would reduce it to 16TB and 64k tables would be 4TB.
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
Dropping to 128K tables would reduce it to 16TB and 64k tables would
be 4TB.
Maybe w
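The scaling works out directly (assuming 64 KB clusters and 8-byte table entries): a 256 KB table holds 32768 entries, so one L2 maps 32768 * 64 KB = 2 GB and a full two-level tree maps 32768 * 2 GB = 64 TB; halving the table size quarters the reach, giving 16 TB at 128 KB and 4 TB at 64 KB.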
On 07.09.2010 22:41, Anthony Liguori wrote:
> There's two types of snapshots that I think can cause confusion.
> There's CPU/device state snapshots and then there's a block device snapshot.
>
> qcow2 and qed both support block device snapshots. qed only supports
> external snapshots (via backing files).
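For concreteness, "external snapshots" here means backing files: a new image whose unallocated clusters fall through to a read-only base image. With qemu-img the pattern is (assuming a qed driver is available; the -b flag itself is format-independent):

    qemu-img create -f qed -b base.img snapshot.qed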
On Tue, Sep 07, 2010 at 05:29:53PM -0500, Anthony Liguori wrote:
> If it were just one bit for just raw or not raw, wouldn't that be enough?
>
> Everything that isn't raw can be probed reliably so we really only need
> to distinguish between things that are probe-able and things that are
> not probe-able.
On 09/07/2010 04:35 PM, Christoph Hellwig wrote:
On Tue, Sep 07, 2010 at 11:12:15AM -0500, Anthony Liguori wrote:
IOW, what are valid values for backing_fmt? "raw" and "qed" are obvious
but what does it mean from a formal specification perspective to have
"vmdk"? Is that VMDK v3 or v4, wha
On 09/07/2010 11:25 AM, Anthony Liguori wrote:
On 09/07/2010 11:09 AM, Avi Kivity wrote:
On 09/07/2010 06:40 PM, Anthony Liguori wrote:
Need a checksum for the header.
Is that not a bit overkill for what we're doing? What's the benefit?
Make sure we're not looking at a header write inter
On Tue, Sep 07, 2010 at 11:12:15AM -0500, Anthony Liguori wrote:
> IOW, what are valid values for backing_fmt? "raw" and "qed" are obvious
> but what does it mean from a formal specification perspective to have
> "vmdk"? Is that VMDK v3 or v4, what if there's a v5?
It might be better to just u
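For illustration, this is the direction qcow2 eventually took as well: with a current qemu-img the backing format can be pinned explicitly at creation time (option names as in modern QEMU, not necessarily as of this thread):

    qemu-img create -f qcow2 -o backing_file=base.raw,backing_fmt=raw overlay.qcow2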
On 09/07/2010 02:25 PM, Blue Swirl wrote:
On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
wrote:
QEMU Enhanced Disk format is a disk image format that forgoes features
found in qcow2 in favor of better levels of performance and data
integrity. Due to its simpler on-disk layout, it is possible to safely perform metadata updates.