Thanks. Dbench does not allocate new disk space all the time, because it is an FS-level benchmark that creates files and deletes them. How much it allocates therefore also depends on the guest FS: a btrfs guest, for example, allocates about 1.8x the space an ext4 guest does, due to btrfs's COW nature. I estimate the test causes the FS to allocate new space during roughly one third of the test duration. But that does not soften the impact much, because an FS often writes in strides rather than consecutively, which causes write amplification whenever allocation happens.
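For a future run, this is roughly how I would verify whether a guest workload really grows the image's on-disk allocation (the image name below is just a placeholder):

  qemu-img info test.vdi   # note the "disk size" field before booting the guest
  # boot the guest, run dbench inside it, then shut the guest down
  qemu-img info test.vdi   # compare "disk size" against the first reading

If "disk size" barely changes between the two readings, the run mostly rewrote already-allocated blocks and says little about the cost of the extra flush in the allocation path.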
So I tested it with qemu-img convert from a 400M raw file (here ~/qemu-sync-test/bin/qemu-img is the build with this patch applied, and the plain qemu-img is the unpatched one):

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi
real    0m0.402s
user    0m0.206s
sys     0m0.202s

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi
real    0m8.678s
user    0m0.169s
sys     0m0.500s

zheq-PC sdb # time qemu-img convert -f raw -t writeback -O vdi /run/shm/rand 1.vdi
real    0m4.320s
user    0m0.148s
sys     0m0.471s

zheq-PC sdb # time qemu-img convert -f raw -t unsafe -O vdi /run/shm/rand 1.vdi
real    0m0.489s
user    0m0.173s
sys     0m0.325s

zheq-PC sdb # time qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi
real    0m0.515s
user    0m0.168s
sys     0m0.357s

zheq-PC sdb # time ~/qemu-sync-test/bin/qemu-img convert -f raw -O vdi /run/shm/rand 1.vdi
real    0m0.431s
user    0m0.192s
sys     0m0.248s

Although 400M is not a giant file, it does show the trend. As you can see, when there is heavy demand for allocation and no extra buffering from a virtualized host, throughput drops by about 50%. But it still has no effect on "unsafe" mode, as predicted. Also, I believe that wanting to use a half-converted image is rarely the use case, while host crashes and power loss are not so unimaginable. It looks like qemu-img convert uses "unsafe" as its default as well, so even novice "qemu-img convert" users are unlikely to notice any performance degradation.

I have not yet tried a guest OS installation on top, but I guess a new flag for a one-time faster OS installation is unlikely to be useful, since "cache=unsafe" already does the trick (see the example at the end of this mail).

On Sat, May 9, 2015 at 5:26 AM Stefan Weil <s...@weilnetz.de> wrote:
> On 08.05.2015 at 15:55, Kevin Wolf wrote:
> > On 08.05.2015 at 15:14, Max Reitz wrote:
> >> On 07.05.2015 17:16, Zhe Qiu wrote:
> >>> In reference to b0ad5a45...078a458e, metadata writes to
> >>> qcow2/cow/qcow/vpc/vmdk are all synced prior to succeeding writes.
> >>>
> >>> Only when the write is successful is bdrv_flush called.
> >>>
> >>> Signed-off-by: Zhe Qiu <phoea...@gmail.com>
> >>> ---
> >>>  block/vdi.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >> I missed Kevin's arguments before, but I think that adding this is
> >> more correct than not having it; and when thinking about speed, this
> >> is vdi, a format supported for compatibility.
> > If you use it only as a convert target, you probably care more about
> > speed than about leaks in case of a host crash.
> >
> >> So if we wanted to optimize it, we'd probably have to cache multiple
> >> allocations, do them at once and then flush afterwards (like the
> >> metadata cache we have in qcow2?)
> > That would defeat the purpose of this patch, which aims at having
> > metadata and data written out almost at the same time. On the other
> > hand, fully avoiding the problem instead of just making the window
> > smaller would require a journal, which VDI just doesn't have.
> >
> > I'm not convinced of this patch, but I'll defer to Stefan Weil as the
> > VDI maintainer.
> >
> > Kevin
>
> Thanks for asking. I share your concerns regarding reduced performance
> caused by bdrv_flush. Conversions to VDI will take longer (how much?),
> and also installation of an OS on a new VDI disk image will be slower,
> because those are the typical scenarios where the disk usage grows.
>
> @phoeagon: Did the benchmark which you used allocate additional disk
> storage? If not, or if it only allocated once and then spent some time
> on already allocated blocks, that benchmark was not valid for this case.
>
> On the other hand, I don't see a need for the flushing, because the kind
> of failures (power failure) and their consequences seem to be acceptable
> for typical VDI usage, namely either image conversion or tests with
> existing images.
>
> That's why I'd prefer not to use bdrv_flush here. Could we make
> bdrv_flush optional (either generally or for cases like this one) so
> both people who prefer speed and people who would want
> bdrv_flush to decrease the likelihood of inconsistencies can be
> satisfied?
>
> Stefan
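To make the "cache=unsafe" remark above concrete: for a one-off OS installation onto a fresh VDI image I would simply pass the unsafe cache mode on the drive. The file names and sizes below are only placeholders:

  qemu-img create -f vdi new-guest.vdi 20G
  qemu-system-x86_64 -m 2048 -cdrom installer.iso \
      -drive file=new-guest.vdi,format=vdi,cache=unsafe

After the installation finishes, the image can be attached again with the default writeback cache mode for normal use.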