stefa...@linux.vnet.ibm.com wrote on Mon, 6 Sep 2010 11:04:38 +0100:
> QEMU Enhanced Disk format is a disk image format that forgoes features
> found in qcow2 in favor of better levels of performance and data
> integrity. Due to its simpler on-disk layout, it is possible to safely
perform metadata updates
On 09/14/2010 05:46 AM, Stefan Hajnoczi wrote:
On Fri, Sep 10, 2010 at 10:22 PM, Jamie Lokier wrote:
Stefan Hajnoczi wrote:
Since there is no ordering imposed between the data write and metadata
update, the following scenarios may occur on crash:
1. Neither data write nor metadata update reach the disk. This is fine, qed metadata has not been corrupted.
On Tue, Sep 14, 2010 at 11:46 AM, Stefan Hajnoczi wrote:
> Time to peek at md and dm to see how they safeguard metadata.
Seems to me that dm-snap does not take measures to guard against
snapshot metadata (exceptions) partial updates/corruption. I was
hoping to find useful approaches there rather
On Fri, Sep 10, 2010 at 10:22 PM, Jamie Lokier wrote:
> Stefan Hajnoczi wrote:
>> Since there is no ordering imposed between the data write and metadata
>> update, the following scenarios may occur on crash:
>> 1. Neither data write nor metadata update reach the disk. This is
>> fine, qed metadata has not been corrupted.
On 13.09.2010 15:07, Anthony Liguori wrote:
> On 09/13/2010 06:03 AM, Kevin Wolf wrote:
>>
>> The real reason why it's not the same story is that a qcow3 would be
>> backwards compatible. Old images would just work as qcow3 by changing
> the version number in the header. Even if they are on a block device.
On 09/13/2010 06:48 AM, Kevin Wolf wrote:
On 13.09.2010 13:34, Avi Kivity wrote:
On 09/13/2010 01:28 PM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it.
On 09/13/2010 06:28 AM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it. That means for a 1TB image, you're potentially writing
out 128MB of data just to allocate a new cluster.
On 09/13/2010 06:03 AM, Kevin Wolf wrote:
The real reason why it's not the same story is that a qcow3 would be
backwards compatible. Old images would just work as qcow3 by changing
the version number in the header. Even if they are on a block device.
Even if they are encrypted. Even if they are
On 13.09.2010 13:34, Avi Kivity wrote:
> On 09/13/2010 01:28 PM, Kevin Wolf wrote:
>>
>>> Anytime you grow the freelist with qcow2, you have to write a brand new
>>> freelist table and update the metadata synchronously to point to a new
>>> version of it. That means for a 1TB image, you're potentially writing out 128MB of data just to allocate a new cluster.
On 09/13/2010 01:28 PM, Kevin Wolf wrote:
Anytime you grow the freelist with qcow2, you have to write a brand new
freelist table and update the metadata synchronously to point to a new
version of it. That means for a 1TB image, you're potentially writing
out 128MB of data just to allocate a new cluster.
On 12.09.2010 19:09, Anthony Liguori wrote:
> For a 1PB disk image with qcow2, the reference count table is 128GB.
> For a 1TB image, the reference count table is 128MB. For a 128GB
> image, the reference table is 16MB which is why we get away with it today.
This is physical size. If you ha
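For reference, the arithmetic behind these figures is a straight sketch (assuming 64 KB clusters; the 8 bytes of metadata per cluster is implied by the numbers rather than stated, and note qcow2's default on-disk refcount entries are 2 bytes):

    metadata_bytes = (image_size / cluster_size) * entry_size
    1 TB / 64 KB = 16M clusters, * 8 B/cluster = 128 MB
    1 PB / 64 KB = 16G clusters, * 8 B/cluster = 128 GB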
On 12.09.2010 17:56, Avi Kivity wrote:
To me, the biggest burden in qcow2 is thinking through how you deal
with shared resources. Because you can block for a long period of
time during write operations, it's not enough to just carry a mutex
during all metadata operations.
On 10.09.2010 21:33, Anthony Liguori wrote:
> On 09/10/2010 12:42 PM, Kevin Wolf wrote:
>>> It bounces all buffers still and I still think it's synchronous
>>> (although Kevin would know better).
>>>
>> Yes, it does bounce the buffers, though I'm looking into this anyway
>> because you raised concerns about unbounded allocations.
On 09/12/2010 10:18 PM, Anthony Liguori wrote:
But since you have to boot before you can run any serious test, if
it takes 5 seconds to do an fsck(), it's highly likely that it's
not even noticeable.
What if it takes 300 seconds?
That means for a 1TB disk you're taking 500ms per L2 entry.
On 09/12/2010 12:51 PM, Avi Kivity wrote:
On 09/12/2010 07:09 PM, Anthony Liguori wrote:
On 09/12/2010 10:56 AM, Avi Kivity wrote:
No, the worst case is 0.003% allocated disk, with the allocated
clusters distributed uniformly. That means all your L2s are
allocated, but almost none of your clusters are.
On 09/12/2010 07:09 PM, Anthony Liguori wrote:
On 09/12/2010 10:56 AM, Avi Kivity wrote:
No, the worst case is 0.003% allocated disk, with the allocated
clusters distributed uniformly. That means all your L2s are
allocated, but almost none of your clusters are.
But in this case, you're so sparse that your metadata is pretty much co-located.
On 09/12/2010 10:56 AM, Avi Kivity wrote:
No, the worst case is 0.003% allocated disk, with the allocated
clusters distributed uniformly. That means all your L2s are
allocated, but almost none of your clusters are.
But in this case, you're so sparse that your metadata is pretty much
co-located.
On 09/12/2010 05:13 PM, Anthony Liguori wrote:
On 09/12/2010 08:24 AM, Avi Kivity wrote:
Not atexit, just when we close the image.
Just a detail, but we need an atexit() handler to make sure block
devices get closed because we have too many exit()s in the code today.
Right.
So when you
On 09/12/2010 08:24 AM, Avi Kivity wrote:
Not atexit, just when we close the image.
Just a detail, but we need an atexit() handler to make sure block
devices get closed because we have too many exit()s in the code today.
Right.
So when you click the 'X' on the qemu window, we get to wait
On 09/10/2010 08:07 PM, Anthony Liguori wrote:
On 09/10/2010 10:49 AM, Avi Kivity wrote:
If I do a qemu-img create -f qcow2 foo.img 10GB, and then do a
naive copy of the image file and end up with a 2GB image when
there's nothing in it, that's badness.
Only if you crash in the middle. If not, you free the preallocation during shutdown.
Stefan Hajnoczi wrote:
> Since there is no ordering imposed between the data write and metadata
> update, the following scenarios may occur on crash:
> 1. Neither data write nor metadata update reach the disk. This is
> fine, qed metadata has not been corrupted.
> 2. Data reaches disk but metadata update does not.
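As a minimal sketch of what imposing that ordering would cost at the syscall level (illustrative only, not QED's or qcow2's actual code; QED deliberately leaves the two writes unordered and repairs on open instead):

    #include <stdint.h>
    #include <unistd.h>

    /* Write the data cluster, then a barrier, then the L2 entry, so a
     * crash can never leave metadata pointing at unwritten data. */
    static int allocating_write_ordered(int fd, const void *data, size_t len,
                                        off_t data_off,
                                        uint64_t l2_entry, off_t l2_off)
    {
        if (pwrite(fd, data, len, data_off) != (ssize_t)len)
            return -1;
        if (fdatasync(fd) != 0)          /* order data before metadata */
            return -1;
        if (pwrite(fd, &l2_entry, sizeof(l2_entry), l2_off) != sizeof(l2_entry))
            return -1;
        return 0;
    }

The cost is one flush per cluster allocation, which is exactly the overhead the unordered design avoids.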
On 09/10/2010 12:42 PM, Kevin Wolf wrote:
It bounces all buffers still and I still think it's synchronous
(although Kevin would know better).
Yes, it does bounce the buffers, though I'm looking into this anyway
because you raised concerns about unbounded allocations. (And it has
been on my
On Fri, Sep 10, 2010 at 2:10 PM, Anthony Liguori wrote:
>>
>> Well, snapshots have an ID today (which is different from their name).
>> Nobody stops you from putting a UUID there. Fully backwards compatible,
>> no feature flag needed. I think Miguel was planning to actually do this.
>>
>
> The pro
On 10.09.2010 19:10, Anthony Liguori wrote:
> On 09/10/2010 11:05 AM, Kevin Wolf wrote:
>> On 10.09.2010 17:53, Anthony Liguori wrote:
>>
>>> On 09/10/2010 10:18 AM, Kevin Wolf wrote:
>>>
On 10.09.2010 17:02, Anthony Liguori wrote:
> What makes us future p
On 10.09.2010 19:07, Anthony Liguori wrote:
Sure, we'll support qcow2, but will we give it the same attention?
>>>
>>> We have a lot of block formats in QEMU today but only one block
>>> format that actually performs well and has good data integrity.
>>>
>>> We're not giving qcow2 the atten
On 09/10/2010 11:05 AM, Kevin Wolf wrote:
On 10.09.2010 17:53, Anthony Liguori wrote:
On 09/10/2010 10:18 AM, Kevin Wolf wrote:
On 10.09.2010 17:02, Anthony Liguori wrote:
What makes us future-proof is having good feature support. qcow2
doesn't have this. We have a g
On 09/10/2010 10:49 AM, Avi Kivity wrote:
If I do a qemu-img create -f qcow2 foo.img 10GB, and then do a
naive copy of the image file and end up with a 2GB image when there's
nothing in it, that's badness.
Only if you crash in the middle. If not, you free the preallocation
during shutdown
On 10.09.2010 17:53, Anthony Liguori wrote:
> On 09/10/2010 10:18 AM, Kevin Wolf wrote:
>> On 10.09.2010 17:02, Anthony Liguori wrote:
>>
>>> What makes us future-proof is having good feature support. qcow2
>>> doesn't have this. We have a good way of making purely informational
>>> changes and also making changes that break the format.
On 09/10/2010 10:18 AM, Kevin Wolf wrote:
On 10.09.2010 17:02, Anthony Liguori wrote:
What makes us future-proof is having good feature support. qcow2
doesn't have this. We have a good way of making purely informational
changes and also making changes that break the format. Those feat
On 09/10/2010 05:56 PM, Anthony Liguori wrote:
On 09/10/2010 08:47 AM, Avi Kivity wrote:
The current qcow2 implementation, yes. The qcow2 format, no.
The qcow2 format has more writes because it maintains more metadata.
More writes == worse performance.
You claim that you can effectively
On 10.09.2010 17:02, Anthony Liguori wrote:
> What makes us future-proof is having good feature support. qcow2
> doesn't have this. We have a good way of making purely informational
> changes and also making changes that break the format. Those features
> are independent so they can be ba
On 09/10/2010 08:48 AM, Christoph Hellwig wrote:
On Fri, Sep 10, 2010 at 08:22:14AM -0500, Anthony Liguori wrote:
fsck will always be fast on qed because the metadata is small. For a
1PB image, there's 128MB worth of L2s if it's fully allocated (keeping
in mind, that once you're fully allocated, you'll never fsck again).
On 09/10/2010 08:47 AM, Avi Kivity wrote:
The current qcow2 implementation, yes. The qcow2 format, no.
The qcow2 format has more writes because it maintains more metadata.
More writes == worse performance.
You claim that you can effectively batch those writes such that the
worse performa
On 09/10/2010 05:12 PM, Christoph Hellwig wrote:
On Fri, Sep 10, 2010 at 05:05:16PM +0300, Avi Kivity wrote:
Note that ATA allows simply ignoring TRIM requests that we can't handle,
and if we don't set the bit that guarantees TRIMed regions to be zeroed
we don't even have to zero out the regions.
On Fri, Sep 10, 2010 at 05:05:16PM +0300, Avi Kivity wrote:
> >Note that ATA allows simply ignoring TRIM requests that we can't handle,
> >and if we don't set the bit that guarantees TRIMed regions to be zeroed
> >we don't even have to zero out the regions.
>
> It would be nice to support it. TRI
On 09/10/2010 04:16 PM, Anthony Liguori wrote:
btw, despite being not properly designed, qcow2 is able to support
TRIM. qed isn't able to, except by leaking clusters on shutdown.
TRIM support is required unless you're okay with the image growing
until it is no longer sparse (the lack of TRIM
On 09/10/2010 04:47 PM, Christoph Hellwig wrote:
On Fri, Sep 10, 2010 at 12:33:09PM +0100, Stefan Hajnoczi wrote:
btw, despite being not properly designed, qcow2 is able to support TRIM.
qed isn't able to, except by leaking clusters on shutdown. TRIM support is
required unless you're okay with the image growing until it is no longer sparse.
On 09/10/2010 04:22 PM, Anthony Liguori wrote:
Looks like it depends on fsck, which is not a good idea for large
images.
fsck will always be fast on qed because the metadata is small. For a
1PB image, there's 128MB worth of L2s if it's fully allocated
It's 32,000 seeks.
(keeping in mind, that once you're fully allocated, you'll never fsck again).
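One way to arrive at that figure (parameters assumed, not stated in the thread): reading 128 MB of scattered L2 tables in 4 KB chunks is

    128 MiB / 4 KiB = 32,768 random reads

and at roughly 10 ms per seek that is about 330 seconds, which is also roughly where the "300 seconds" fsck mentioned elsewhere in the thread comes from.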
On 09/10/2010 04:39 PM, Anthony Liguori wrote:
On 09/10/2010 07:47 AM, Avi Kivity wrote:
Then, with a clean base that takes on board the lessons of existing
formats it is much easier to innovate. Look at the image streaming,
defragmentation, and trim ideas that are playing out right now. I
th
On Fri, Sep 10, 2010 at 08:22:14AM -0500, Anthony Liguori wrote:
> fsck will always be fast on qed because the metadata is small. For a
> 1PB image, there's 128MB worth of L2s if it's fully allocated (keeping
> in mind, that once you're fully allocated, you'll never fsck again). If
> you've go
On Fri, Sep 10, 2010 at 08:39:21AM -0500, Anthony Liguori wrote:
> You're hand waving to a dangerous degree here :-)
>
> TRIM in qcow2 would require the following sequence:
>
> 1) remove cluster from L2 table
> 2) sync()
> 3) reduce cluster reference count
> 4) sync()
>
> TRIM needs to be fast s
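As a sketch, that sequence maps onto something like the following (hypothetical helper with illustrative offsets, not the real qcow2 code):

    #include <stdint.h>
    #include <unistd.h>

    static int qcow2_trim_sketch(int fd, off_t l2_entry_off, off_t refcount_off)
    {
        uint64_t no_cluster = 0;   /* unallocated L2 entry */
        uint16_t refcount = 0;     /* assume the count drops to zero */

        /* 1) remove cluster from L2 table */
        if (pwrite(fd, &no_cluster, sizeof(no_cluster), l2_entry_off) < 0)
            return -1;
        /* 2) sync so the dereference is durable before the count drops */
        if (fdatasync(fd) != 0)
            return -1;
        /* 3) reduce cluster reference count */
        if (pwrite(fd, &refcount, sizeof(refcount), refcount_off) < 0)
            return -1;
        /* 4) sync */
        return fdatasync(fd);
    }

Two flushes per discarded cluster is what makes a fast TRIM hard in this scheme.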
On 10.09.2010 14:35, Stefan Hajnoczi wrote:
> On Fri, Sep 10, 2010 at 1:12 PM, Kevin Wolf wrote:
>> On 10.09.2010 13:43, Stefan Hajnoczi wrote:
>>> By creating two code paths within qcow2.
>>
>> You're creating two code paths for users.
>
> No, I'm creating a single path: QED.
On 09/10/2010 04:14 PM, Anthony Liguori wrote:
On 09/10/2010 06:14 AM, Avi Kivity wrote:
The point of an image format is not to recreate btrfs in software.
It's to provide a mechanism to allow users to move images around
reasonably, but once an image is present on a reasonable filesystem,
we should more or less get the heck out of the way
On Fri, Sep 10, 2010 at 12:33:09PM +0100, Stefan Hajnoczi wrote:
> > btw, despite being not properly designed, qcow2 is able to support TRIM.
> > qed isn't able to, except by leaking clusters on shutdown. TRIM support is
> > required unless you're okay with the image growing until it is no longer
On 09/10/2010 07:47 AM, Avi Kivity wrote:
Then, with a clean base that takes on board the lessons of existing
formats it is much easier to innovate. Look at the image streaming,
defragmentation, and trim ideas that are playing out right now. I
think the reason we haven't seen them before is bec
On 09/10/2010 07:06 AM, Avi Kivity wrote:
On 09/10/2010 02:43 PM, Stefan Hajnoczi wrote:
and/or enterprise storage.
That doesn't eliminate undiscovered errors (they can still come from
the
transport).
Eliminating silent data corruption is currently not a goal for any
disk image format I know of.
On 09/10/2010 06:43 AM, Avi Kivity wrote:
On 09/10/2010 02:33 PM, Stefan Hajnoczi wrote:
btw, despite being not properly designed, qcow2 is able to support
TRIM.
qed isn't able to, except by leaking clusters on shutdown. TRIM
support is
required unless you're okay with the image growing until it is no longer sparse.
On 09/10/2010 04:10 PM, Stefan Hajnoczi wrote:
On Fri, Sep 10, 2010 at 1:47 PM, Avi Kivity wrote:
On 09/10/2010 03:35 PM, Stefan Hajnoczi wrote:
That still leaves those qcow2 images that use features not supported by
qed. Just a few features missing in qed are internal snapshots, qcow2 on
block devices, compression, encryption.
On 09/10/2010 06:25 AM, Avi Kivity wrote:
On 09/10/2010 02:14 PM, Avi Kivity wrote:
qcow2 is not a properly designed image format. It was a weekend
hacking session from Fabrice that he dropped in the code base and
never really finished doing what he originally intended. The
improvements
On 09/10/2010 06:14 AM, Avi Kivity wrote:
The point of an image format is not to recreate btrfs in software.
It's to provide a mechanism to allow users to move images around
reasonably, but once an image is present on a reasonable filesystem,
we should more or less get the heck out of the way
On Fri, Sep 10, 2010 at 1:47 PM, Avi Kivity wrote:
> On 09/10/2010 03:35 PM, Stefan Hajnoczi wrote:
>>
>>> That still leaves those qcow2 images that use features not supported by
>>> qed. Just a few features missing in qed are internal snapshots, qcow2 on
>>> block devices, compression, encryption.
On 09/10/2010 03:35 PM, Stefan Hajnoczi wrote:
That still leaves those qcow2 images that use features not supported by
qed. Just a few features missing in qed are internal snapshots, qcow2 on
block devices, compression, encryption. So qed can't be a complete
replacement for qcow2 (and that was
On Fri, Sep 10, 2010 at 1:12 PM, Kevin Wolf wrote:
> On 10.09.2010 13:43, Stefan Hajnoczi wrote:
>> By creating two code paths within qcow2.
>
> You're creating two code paths for users.
No, I'm creating a single path: QED.
There are already two code paths: raw and qcow2.
On 10.09.2010 13:43, Stefan Hajnoczi wrote:
> By creating two code paths within qcow2.
You're creating two code paths for users.
>>>
>>> No, I'm creating a single path: QED.
>>>
>>> There are already two code paths: raw and qcow2. qcow2 has had such a bad
>>> history that for a lot
On 09/10/2010 02:43 PM, Stefan Hajnoczi wrote:
and/or enterprise storage.
That doesn't eliminate undiscovered errors (they can still come from the
transport).
Eliminating silent data corruption is currently not a goal for any
disk image format I know of. For filesystems, I know that ZFS and
On 09/10/2010 02:33 PM, Stefan Hajnoczi wrote:
btw, despite being not properly designed, qcow2 is able to support TRIM.
qed isn't able to, except by leaking clusters on shutdown. TRIM support is
required unless you're okay with the image growing until it is no longer
sparse (the lack of TRI
On Fri, Sep 10, 2010 at 12:14 PM, Avi Kivity wrote:
> On 09/09/2010 03:49 PM, Anthony Liguori wrote:
>>
>> On 09/09/2010 01:45 AM, Avi Kivity wrote:
>>>
>>> Loading very large L2 tables on demand will result in very long
>>> latencies. Increasing cluster size will result in very long first write
On 09/10/2010 02:29 PM, Stefan Hajnoczi wrote:
They only guarantee that the filesystem is consistent. A write() that
extends a file may be reordered with the L2 write() that references the new
cluster. Requiring fsck on unclean shutdown is very backwards for a 2010
format.
I'm interested i
On Fri, Sep 10, 2010 at 12:22 PM, Avi Kivity wrote:
> On 09/09/2010 08:43 PM, Anthony Liguori wrote:
>>>
>>> Hm, we do have a use case for qcow2-over-lvm. I can't say it's something
>>> I like, but a point to consider.
>>
>>
>> We specifically are not supporting that use-case in QED today. There's a good reason for it.
On Fri, Sep 10, 2010 at 12:25 PM, Avi Kivity wrote:
> On 09/10/2010 02:14 PM, Avi Kivity wrote:
>>
>>>
>>> qcow2 is not a properly designed image format. It was a weekend hacking
>>> session from Fabrice that he dropped in the code base and never really
>>> finished doing what he originally intended.
On 09/10/2010 02:14 PM, Avi Kivity wrote:
qcow2 is not a properly designed image format. It was a weekend
hacking session from Fabrice that he dropped in the code base and
never really finished doing what he originally intended. The
improvements that have been made to it are almost at th
On 09/09/2010 08:43 PM, Anthony Liguori wrote:
Hm, we do have a use case for qcow2-over-lvm. I can't say it's
something I like, but a point to consider.
We specifically are not supporting that use-case in QED today.
There's a good reason for it. For cluster allocation, we achieve good
pe
On 09/10/2010 12:01 AM, Christoph Hellwig wrote:
On Thu, Sep 09, 2010 at 09:24:26AM +0300, Avi Kivity wrote:
The other thing we can do is defragment the logical image, then
defragment the underlying file (if the filesystem supports it, issue the
appropriate ioctl, otherwise defragment to a new file which you write linearly).
On 09/09/2010 03:49 PM, Anthony Liguori wrote:
On 09/09/2010 01:45 AM, Avi Kivity wrote:
Loading very large L2 tables on demand will result in very long
latencies. Increasing cluster size will result in very long first
write latencies. Adding an extra level results in an extra random
write every 4TB.
On Thu, Sep 09, 2010 at 09:24:26AM +0300, Avi Kivity wrote:
> The other thing we can do is defragment the logical image, then
> defragment the underlying file (if the filesystem supports it, issue the
> appropriate ioctl, otherwise defragment to a new file which you write
> linearly).
What's wh
On Thu, Sep 09, 2010 at 12:43:28PM -0500, Anthony Liguori wrote:
> Define "very large disks".
>
> My target for VM images is 100GB-1TB. Practically speaking, that at
> least covers us for the next 5 years.
We have 2TB SATA disks shipping already, and people tend to produce
more and more "data".
On 09/09/2010 01:59 AM, Avi Kivity wrote:
On 09/08/2010 06:07 PM, Stefan Hajnoczi wrote:
uint32_t table_size; /* table size, in clusters */
Presumably L1 table size? Or any table size?
Hm. It would be nicer not to require contiguous sectors anywhere. How
about a variable- or fixed-height tree?
On 09/09/2010 01:45 AM, Avi Kivity wrote:
Loading very large L2 tables on demand will result in very long
latencies. Increasing cluster size will result in very long first
write latencies. Adding an extra level results in an extra random
write every 4TB.
It would be trivially easy to add an
On 09/08/2010 06:07 PM, Stefan Hajnoczi wrote:
uint32_t table_size; /* table size, in clusters */
Presumably L1 table size? Or any table size?
Hm. It would be nicer not to require contiguous sectors anywhere. How
about a variable- or fixed-height tree?
Both extents and fancie
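For readers following along, a sketch of the header under discussion, reconstructed from the draft spec at http://wiki.qemu.org/Features/QED referenced elsewhere in this thread (the layout may have changed since; treat it as illustrative):

    #include <stdint.h>

    typedef struct {
        uint32_t magic;                   /* 'QED\0' */
        uint32_t cluster_size;            /* in bytes */
        uint32_t table_size;              /* L1/L2 table size, in clusters */
        uint32_t header_size;             /* in clusters */
        uint64_t features;                /* incompatible feature bits */
        uint64_t compat_features;         /* compatible feature bits */
        uint64_t autoclear_features;      /* auto-clearing feature bits */
        uint64_t l1_table_offset;         /* in bytes */
        uint64_t image_size;              /* logical image size, in bytes */
        uint32_t backing_filename_offset; /* in bytes from start of file */
        uint32_t backing_filename_size;   /* in bytes */
    } QEDHeader;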
On 09/08/2010 02:15 PM, Stefan Hajnoczi wrote:
3. Metadata update reaches disk but data does not. The interesting
case! The L2 table now points to a cluster which is beyond the last
cluster in the image file. Remember that file size is rounded down by
cluster size, so partial data writes are
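Scenario 3 is what an open-time consistency check has to catch. A minimal sketch of the test (assumed helper, not QED's actual code):

    #include <stdbool.h>
    #include <stdint.h>

    /* An L2 entry pointing at or past end-of-file means the data write
     * never completed; such an entry can be reset to unallocated. */
    static bool l2_entry_valid(uint64_t cluster_offset,
                               uint64_t cluster_size, uint64_t file_size)
    {
        return cluster_offset == 0 ||               /* unallocated */
               cluster_offset + cluster_size <= file_size;
    }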
On 09/09/2010 09:45 AM, Avi Kivity wrote:
A new format doesn't introduce much additional complexity. We
provide image conversion tool and we can almost certainly provide an
in-place conversion tool that makes the process very fast.
It requires users to make a decision. By the time qed is
On 09/08/2010 03:48 PM, Anthony Liguori wrote:
On 09/08/2010 03:23 AM, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
On 09/09/2010 05:35 AM, Christoph Hellwig wrote:
On Wed, Sep 08, 2010 at 03:28:50PM -0500, Anthony Liguori wrote:
That's a good point. Is there a reasonable way to do this cooperatively
with the underlying filesystem?
The only thing we can do easily is to try to use as large as possible
extents in the allocation.
On 09/08/2010 03:55 PM, Anthony Liguori wrote:
(3 levels)
Dunno, just seems more regular to me. Image resize doesn't need to
relocate the L2 table in case it overflows.
The overhead from three levels is an extra table, which is negligible.
It means an extra I/O request in the degenerate case.
On Wed, Sep 08, 2010 at 03:28:50PM -0500, Anthony Liguori wrote:
> That's a good point. Is there a reasonable way to do this cooperatively
> with the underlying filesystem?
The only thing we can do easily is to try to use as large as possible
extents in the allocation. Once we're at a couple megabytes
On 09/08/2010 03:23 PM, Christoph Hellwig wrote:
On Wed, Sep 08, 2010 at 11:30:10AM -0500, Anthony Liguori wrote:
http://wiki.qemu.org/Features/QED/OnlineDefrag
Is a spec for a very simple approach to online defrag that I hope we can
implement in the near future. I think that once we have
On Wed, Sep 08, 2010 at 11:30:10AM -0500, Anthony Liguori wrote:
> http://wiki.qemu.org/Features/QED/OnlineDefrag
>
> Is a spec for a very simple approach to online defrag that I hope we can
> implement in the near future. I think that once we have the mechanisms
> to freeze clusters and to swa
On 09/08/2010 01:56 PM, Blue Swirl wrote:
That's a bit big, for example CD images are only 640M and there were
smaller disks. But I guess you mean the smallest maximum size limited
by the cluster_size etc, so the actual images may be even smaller.
Yes. The smallest image is one cluster. T
On Wed, Sep 8, 2010 at 6:35 PM, Anthony Liguori wrote:
> On 09/08/2010 01:24 PM, Blue Swirl wrote:
>>
>> Based on these:
>> #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
>> header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size,
>> the maximum image size
On 09/08/2010 01:24 PM, Blue Swirl wrote:
Based on these:
#define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size,
the maximum image size equals table_size^2 * cluster_size^3 /
sizeof(uint64_t)^2. Is the sq
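Plugging in the defaults implied elsewhere in the thread (64 KB clusters and 256 KB tables, i.e. table_size = 4; both are assumptions here) shows the formula is consistent with the 64TB figure quoted later:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t cluster_size = 64 * 1024;
        uint64_t table_size = 4;   /* clusters per L1/L2 table */
        uint64_t noffsets = table_size * cluster_size / sizeof(uint64_t);
        uint64_t max_size = noffsets * noffsets * cluster_size;

        /* prints: noffsets=32768 max=64 TB */
        printf("noffsets=%llu max=%llu TB\n",
               (unsigned long long)noffsets,
               (unsigned long long)(max_size >> 40));
        return 0;
    }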
On Wed, Sep 8, 2010 at 3:37 PM, Stefan Hajnoczi wrote:
> On Tue, Sep 7, 2010 at 8:25 PM, Blue Swirl wrote:
>> On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
>> wrote:
>>> QEMU Enhanced Disk format is a disk image format that forgoes features
>>> found in qcow2 in favor of better levels of performance and data integrity.
On 09/08/2010 10:38 AM, Christoph Hellwig wrote:
On Wed, Sep 08, 2010 at 12:15:13PM +0100, Stefan Hajnoczi wrote:
In-place writes overwrite old data in the image file. They do not
allocate new clusters or update any metadata. This is why write
performance is comparable to raw in the long run.
On Wed, Sep 08, 2010 at 12:15:13PM +0100, Stefan Hajnoczi wrote:
> In-place writes overwrite old data in the image file. They do not
> allocate new clusters or update any metadata. This is why write
> performance is comparable to raw in the long run.
Only if qed doesn't cause additional fragmentation.
On Tue, Sep 7, 2010 at 8:25 PM, Blue Swirl wrote:
> On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
> wrote:
>> QEMU Enhanced Disk format is a disk image format that forgoes features
>> found in qcow2 in favor of better levels of performance and data
>> integrity. Due to its simpler on-disk layout, it is possible to safely perform metadata updates.
On Tue, Sep 7, 2010 at 3:51 PM, Avi Kivity wrote:
> On 09/06/2010 04:06 PM, Anthony Liguori wrote:
>>
>> Another point worth mentioning is that our intention is to have a formal
>> specification of the format before merging. A start of that is located at
>> http://wiki.qemu.org/Features/QED
>>
>
On 08.09.2010 15:26, Anthony Liguori wrote:
> On 09/08/2010 08:20 AM, Kevin Wolf wrote:
>> On 08.09.2010 14:48, Anthony Liguori wrote:
> I think one of the critical flaws in qcow2 was trying to invent a
> better filesystem within qemu instead of just sticking to a very
> simple and
On 09/08/2010 08:20 AM, Kevin Wolf wrote:
On 08.09.2010 14:48, Anthony Liguori wrote:
On 09/08/2010 03:23 AM, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
On 08.09.2010 14:48, Anthony Liguori wrote:
> On 09/08/2010 03:23 AM, Avi Kivity wrote:
>> On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
>>>
>>>
>>> I clearly suck at basic math today. The image supports 64TB today.
On 09/08/2010 03:53 AM, Avi Kivity wrote:
On 09/08/2010 11:41 AM, Alexander Graf wrote:
On 08.09.2010, at 10:23, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it
can support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
On 09/08/2010 03:23 AM, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
Dropping to 128K tables would reduce it to 16TB and 64k tables would be 4TB.
Here is a summary of how qed images can be accessed safely after a
crash or power loss.
First off, we only need to consider write operations since read
operations do not change the state of the image file and cannot lead
to metadata corruption.
There are two types of writes. Allocating writes wh
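A sketch of the allocating-write path being summarized here (assumed structure, not the actual QED code): data goes to a fresh cluster at the end of the image file, then the L2 entry is updated, with no barrier in between, which is what produces the crash scenarios listed in this thread:

    #include <stdint.h>
    #include <unistd.h>

    static int allocating_write(int fd, const void *data, size_t len,
                                off_t cluster_size, off_t l2_entry_off)
    {
        /* allocate the next cluster-aligned offset at end of file */
        off_t end = lseek(fd, 0, SEEK_END);
        off_t cluster = (end + cluster_size - 1) / cluster_size * cluster_size;

        if (pwrite(fd, data, len, cluster) != (ssize_t)len)
            return -1;

        /* no fdatasync() here: after a crash the L2 entry may point past
         * EOF, or the cluster may be leaked, exactly as described above */
        uint64_t entry = (uint64_t)cluster;
        if (pwrite(fd, &entry, sizeof(entry), l2_entry_off) != sizeof(entry))
            return -1;
        return 0;
    }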
On 09/08/2010 11:41 AM, Alexander Graf wrote:
On 08.09.2010, at 10:23, Avi Kivity wrote:
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can support 5PB
of data.
I clearly suck at basic math today. The image supports 64TB today.
On 08.09.2010, at 10:23, Avi Kivity wrote:
> On 09/08/2010 01:27 AM, Anthony Liguori wrote:
>>> FWIW, L2s are 256K at the moment and with a two level table, it can support
>>> 5PB of data.
>>
>>
>> I clearly suck at basic math today. The image supports 64TB today.
>> Dropping to 128K tables would reduce it to 16TB and 64k tables would be 4TB.
On 09/08/2010 01:27 AM, Anthony Liguori wrote:
FWIW, L2s are 256K at the moment and with a two level table, it can
support 5PB of data.
I clearly suck at basic math today. The image supports 64TB today.
Dropping to 128K tables would reduce it to 16TB and 64k tables would
be 4TB.
Maybe w
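The scaling works out directly (assuming 64 KB clusters and 8-byte table entries): a 256 KB table holds 32768 entries, so one L2 maps 32768 * 64 KB = 2 GB and a full two-level tree maps 32768 * 2 GB = 64 TB; halving the table size quarters the reach, giving 16 TB at 128 KB and 4 TB at 64 KB.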
On 07.09.2010 22:41, Anthony Liguori wrote:
> There's two types of snapshots that I think can cause confusion.
> There's CPU/device state snapshots and then there's a block device snapshot.
>
> qcow2 and qed both support block device snapshots. qed only supports
> external snapshots (via backing files).
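For concreteness, "external snapshots" here means backing files: a new image whose unallocated clusters fall through to a read-only base image. With qemu-img the pattern is (assuming a qed driver is available; the -b flag itself is format-independent):

    qemu-img create -f qed -b base.img snapshot.qed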
On Tue, Sep 07, 2010 at 05:29:53PM -0500, Anthony Liguori wrote:
> If it were just one bit for just raw or not raw, wouldn't that be enough?
>
> Everything that isn't raw can be probed reliably so we really only need
> to distinguish between things that are probe-able and things that are
> not probe-able.
On 09/07/2010 04:35 PM, Christoph Hellwig wrote:
On Tue, Sep 07, 2010 at 11:12:15AM -0500, Anthony Liguori wrote:
IOW, what are valid values for backing_fmt? "raw" and "qed" are obvious
but what does it mean from a formal specification perspective to have
"vmdk"? Is that VMDK v3 or v4, wha
On 09/07/2010 11:25 AM, Anthony Liguori wrote:
On 09/07/2010 11:09 AM, Avi Kivity wrote:
On 09/07/2010 06:40 PM, Anthony Liguori wrote:
Need a checksum for the header.
Is that not a bit overkill for what we're doing? What's the benefit?
Make sure we're not looking at a header write inter
On Tue, Sep 07, 2010 at 11:12:15AM -0500, Anthony Liguori wrote:
> IOW, what are valid values for backing_fmt? "raw" and "qed" are obvious
> but what does it mean from a formal specification perspective to have
> "vmdk"? Is that VMDK v3 or v4, what if there's a v5?
It might be better to just u
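For illustration, this is the direction qcow2 eventually took as well: with a current qemu-img the backing format can be pinned explicitly at creation time (option names as in modern QEMU, not necessarily as of this thread):

    qemu-img create -f qcow2 -o backing_file=base.raw,backing_fmt=raw overlay.qcow2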
On 09/07/2010 02:25 PM, Blue Swirl wrote:
On Mon, Sep 6, 2010 at 10:04 AM, Stefan Hajnoczi
wrote:
QEMU Enhanced Disk format is a disk image format that forgoes features
found in qcow2 in favor of better levels of performance and data
integrity. Due to its simpler on-disk layout, it is possible to safely perform metadata updates.