I am still seeing the performance degradation, but I did find something interesting (and promising) with qemu 5.1.50. Enabling subcluster allocation support (extended_l2=on) in qemu 5.1.50 eliminates the performance degradation of adding an overlay. Without subcluster allocation enabled, I still see the performance degradation in qemu 5.1.50 when adding an overlay. For these experiments, I used a 64K I/O block size and a 2M qcow2 cluster size.
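For reference, a minimal sketch of how such an image chain can be created with subcluster allocation enabled (file names and the 10G size are illustrative; extended_l2 requires a qemu-img new enough to support it, e.g. the 5.1.50 development tree used here):

```shell
# Base image with 2M clusters and subcluster allocation enabled
qemu-img create -f qcow2 -o cluster_size=2M,extended_l2=on base.qcow2 10G

# Overlay on top of the base, with the same cluster options
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 \
    -o cluster_size=2M,extended_l2=on overlay.qcow2
```

With extended_l2=on, each 2M cluster is tracked in 32 subclusters (64K each), so a 64K random write into an unallocated cluster no longer forces a full 2M copy-on-write from the backing file.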
On Mon, Oct 19, 2020 at 12:51 PM Alberto Garcia <[email protected]> wrote:
> On Thu 27 Aug 2020 06:29:15 PM CEST, Yoonho Park wrote:
> > Below is the data with the cache disabled ("virsh attach-disk ...
> > --cache none"). I added the previous data for reference. Overall,
> > random read performance was not affected significantly. This makes
> > sense because a cache is probably not going to help random read
> > performance much. BTW, how big is the cache by default? Random write
> > performance for 4K blocks seems more "sane" now. Random write
> > performance for 64K blocks is interesting because base image (0
> > overlay) performance is 2X slower than 1-5 overlays. We believe this
> > is because the random writes to an overlay actually turn into
> > sequential writes (appends to the overlay). Does this make sense?
> >
> > NO CACHE
> >
> >               4K blocks                    64K blocks
> > olays  rd bw  rd iops  wr bw  wr iops  rd bw  rd iops  wr bw  wr iops
> > 0       4478     1119   4684     1171  57001      890  42050      657
> > 1       4490     1122   2503      625  56656      885  93483     1460
>
> I haven't been able to reproduce this (I tried the scenarios with 0 and
> 1 overlays), did you figure out anything new or what's the situation?
>
> Berto
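For anyone trying to reproduce the numbers above, a hedged sketch of a matching fio invocation (the exact job parameters used in the original runs are not given in this thread; device path, runtime, and iodepth below are assumptions):

```shell
# Random-write test against the attached disk, 64K blocks, direct I/O
# (substitute --bs=4k and --rw=randread for the other table columns)
fio --name=randwrite-64k \
    --filename=/dev/vdb \
    --rw=randwrite --bs=64k \
    --direct=1 --ioengine=libaio --iodepth=1 \
    --runtime=60 --time_based
```

Running the same job against the base image alone and then against the chain with one overlay should show whether the 2X write gap between 0 and 1 overlays reproduces.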
