On 4/13/20 2:12 PM, Denis Plotnikov wrote:
> Problem description: qcow2 internal snapshot saving time is too long
> on HDD, ~25 sec
>
> When a qcow2 image is placed on a regular HDD and the image is opened
> with O_DIRECT, the snapshot saving time is around 26 sec.
> The snapshot saving time can be 4 times shorter.
> This patch series proposes a way to achieve that.
>
> Why is the saving time ~25 sec?
>
> There are three reasons:
> 1. the qemu-file iov limit (currently 64)
> 2. direct qemu_fflush calls, inducing disk writes in a non-aligned
>    way, which further results in READ-MODIFY-WRITE operations at the
>    beginning and at the end of the written data. With synchronous
>    operations this slows the process down a lot!
> 3. ram data copying and synchronous disk writes
>
> While 1 and 2 are quite clear, the 3rd needs some explanation:
>
> An internal snapshot uses qemu-file as an interface to store the data
> with stream semantics.
> qemu-file avoids data copying when possible (mostly for ram data) and
> uses iovectors to propagate the data to the underlying block driver
> state. In the case when qcow2 is opened with O_DIRECT this is
> suboptimal.
>
> This is what happens: on writing, when the iovector request goes from
> qemu-file to bdrv (here and below, by bdrv I mean qcow2 with a POSIX
> backend opened with O_DIRECT), bdrv checks all iovectors for base and
> size alignment; if that is not the case, the data is copied to an
> internal buffer and a synchronous pwrite is called.
> If the iovectors are aligned, io_submit is called.
>
> In our case, a snapshot almost always induces pwrite, since we never
> have all the iovectors in the request aligned, because a short
> iovector is frequently added: an 8-byte ram-page delimiter after each
> ram page iovector.
>
> So the qemu-file code in this case:
> 1. doesn't avoid ram copying
> 2. works fully synchronously
>
> How to improve the snapshot time:
>
> 1. easy way: increase the iov limit to IOV_MAX (1024).
>    This will reduce the frequency of synchronous writes.
>    My test revealed that with iov limit = IOV_MAX the snapshot time
>    is *~12 sec*.
>
> 2. complex way: do the writes asynchronously.
>    Introduce a base- and size-aligned buffer, write the data only
>    when the buffer is full, write the buffer asynchronously, and
>    meanwhile fill another buffer with snapshot data.
>    My test revealed that this complex way provides a snapshot time of
>    *~6 sec*, 2 times better than just increasing the iov limit. We
>    also align the written data, since flush operations to the disk
>    are not mandatory.
>
> This patch series proposes how to improve the snapshot performance in
> the complex way, allowing the use of asynchronous writes when needed.
>
> This is an RFC series, as I am not confident that I fully understand
> all the qemu-file use cases. I tried to make the series in a safe way
> so as not to break anything related to qemu-file usage in other
> places, like migration.
>
> All comments are *VERY* appreciated!
>
> Thanks,
> Denis
>
> Denis Plotnikov (3):
>   qemu-file: introduce current buffer
>   qemu-file: add buffered mode
>   migration/savevm: use qemu-file buffered mode for non-cached bdrv
>
>  include/qemu/typedefs.h |   2 +
>  migration/qemu-file.c   | 479 +++++++++++++++++++++++++++++++++++++++++-------
>  migration/qemu-file.h   |   9 +
>  migration/savevm.c      |  38 +++-
>  4 files changed, 456 insertions(+), 72 deletions(-)
>
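To make the bounce-buffer path above concrete, here is a minimal C
sketch of the pattern; all names (ALIGNMENT, qiov_is_aligned,
write_vector) are made up for illustration and do not match the actual
QEMU block-layer code, which additionally aligns the offset and length
itself, doing the read-modify-write for unaligned head and tail
sectors (elided here):

#define _GNU_SOURCE         /* for pwritev() on glibc */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#define ALIGNMENT 4096      /* typical O_DIRECT requirement */

/* true iff every iovec base and length is ALIGNMENT-aligned */
static bool qiov_is_aligned(const struct iovec *iov, int cnt)
{
    for (int i = 0; i < cnt; i++) {
        if ((uintptr_t)iov[i].iov_base % ALIGNMENT ||
            iov[i].iov_len % ALIGNMENT) {
            return false;
        }
    }
    return true;
}

static ssize_t write_vector(int fd, const struct iovec *iov, int cnt,
                            off_t offset, size_t total)
{
    if (qiov_is_aligned(iov, cnt)) {
        /* fast path: can be submitted asynchronously (io_submit) */
        return pwritev(fd, iov, cnt, offset);
    }

    /* slow path: copy everything into an aligned bounce buffer ... */
    void *bounce;
    if (posix_memalign(&bounce, ALIGNMENT, total)) {
        return -1;
    }
    char *p = bounce;
    for (int i = 0; i < cnt; i++) {
        memcpy(p, iov[i].iov_base, iov[i].iov_len);
        p += iov[i].iov_len;
    }
    /* ... and issue one synchronous write */
    ssize_t ret = pwrite(fd, bounce, total, offset);
    free(bounce);
    return ret;
}

With the 8-byte ram-page delimiter added after every page, a request
practically never passes the alignment check, so every flush takes the
slow path: a full copy plus a synchronous pwrite.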
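And a sketch of the double-buffering scheme from the "complex way".
POSIX AIO is used here purely for illustration (the series itself
extends qemu-file and does not use aio(7)); the buffer size and all
names are assumptions:

#include <aio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 4096
#define BUF_SIZE  (1024 * 1024)     /* illustrative buffer size */

struct buffered_writer {
    int fd;                 /* opened with O_DIRECT */
    off_t offset;           /* next file offset to write */
    char *buf[2];           /* two ALIGNMENT-aligned buffers */
    size_t used;            /* bytes filled in the current buffer */
    int cur;                /* index of the buffer being filled */
    struct aiocb acb;       /* the (single) in-flight write */
    int in_flight;
};

static int writer_init(struct buffered_writer *w, int fd)
{
    memset(w, 0, sizeof(*w));
    w->fd = fd;
    for (int i = 0; i < 2; i++) {
        if (posix_memalign((void **)&w->buf[i], ALIGNMENT, BUF_SIZE)) {
            return -1;
        }
    }
    return 0;
}

static void wait_for_write(struct buffered_writer *w)
{
    const struct aiocb *list[1] = { &w->acb };

    if (w->in_flight) {
        while (aio_error(&w->acb) == EINPROGRESS) {
            aio_suspend(list, 1, NULL);
        }
        aio_return(&w->acb);        /* error handling elided */
        w->in_flight = 0;
    }
}

/* submit the full current buffer and switch to the other one */
static void flush_buffer(struct buffered_writer *w)
{
    wait_for_write(w);          /* keep at most one write in flight */

    memset(&w->acb, 0, sizeof(w->acb));
    w->acb.aio_fildes = w->fd;
    w->acb.aio_buf = w->buf[w->cur];
    w->acb.aio_nbytes = w->used;    /* BUF_SIZE here, hence aligned */
    w->acb.aio_offset = w->offset;
    aio_write(&w->acb);             /* error handling elided */
    w->in_flight = 1;

    w->offset += w->used;
    w->used = 0;
    w->cur ^= 1;                /* keep filling while the write runs */
}

static void put_bytes(struct buffered_writer *w,
                      const void *data, size_t len)
{
    while (len) {
        size_t n = BUF_SIZE - w->used;
        if (n > len) {
            n = len;
        }
        memcpy(w->buf[w->cur] + w->used, data, n);
        w->used += n;
        data = (const char *)data + n;
        len -= n;
        if (w->used == BUF_SIZE) {
            flush_buffer(w);    /* buffer full: write asynchronously */
        }
    }
}

(The final, possibly partial, buffer at end of stream would need
padding to ALIGNMENT and a last flush; that is elided.) The point of
the scheme is that the memcpy into the aligned buffer overlaps with
the previous buffer's disk write, so the copying cost from reason 3 is
hidden behind I/O instead of being added to it, and every write the
disk sees is fully aligned.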