On Wed, Jun 01, 2016 at 12:07:01PM +0200, Kevin Wolf wrote: > Am 01.06.2016 um 11:12 hat Denis V. Lunev geschrieben: > > qcow2_cache_flush() calls bdrv_flush() unconditionally after writing > > cache entries of a particular cache. This can lead to as many as > > 2 additional fdatasyncs inside bdrv_flush. > > > > We can simply skip all fdatasync calls inside qcow2_co_flush_to_os > > as bdrv_flush for sure will do the job. > > This looked wrong at first because flushes are needed to keep the right > order of writes to the different caches. However, I see that you keep > the flush in qcow2_cache_flush_dependency(), so in the code this is > actually fine. > > Can you make that more explicit in the commit message? > > > This seriously affects the > > performance of database operations inside the guest. > > > > Signed-off-by: Denis V. Lunev <d...@openvz.org> > > CC: Pavel Borzenkov <pborzen...@virtuozzo.com> > > CC: Kevin Wolf <kw...@redhat.com> > > CC: Max Reitz <mre...@redhat.com> > > Do you have performance numbers for master and with your patch applied? > (No performance related patch should come without numbers in its commit > message!) > > What I find interesting is that this seems to help even though > duplicated flushes should actually be really cheap because there is no > new data that could be flushed in the second request. Makes me wonder if > guests send duplicated flushes, too, and whether we should optimise > that. > > Maybe it would also be interesting to measure how things perform if we > removed the flush from qcow2_cache_flush_dependency(). This would be > incorrect code (corruption possible after host crash), but I'd like to > know how much performance we actually lose here. This is performance > that could potentially be gained by using a journal.
Here are the results of the following testcase: sequential write of 8Gb file by 64Kb blocks, on unallocated qcow2 image, with fsync() after each 64 block. Lazy refcounts are disabled, so we have a dependent cache here. Results from SSD machine are as follows (every result is a 10 iterations average): w/o patches: ~420 blocks/sec with Den's patch: ~650 blocks/sec with Den's patch + qcow2_cache_flush_dependency() switched to qcow2_cache_flush_nosync(): ~720 blocks/sec > > Kevin