I am still seeing the performance degradation, but I did find something
interesting (and promising) with qemu 5.1.50: enabling subcluster
allocation support (extended_l2=on) eliminates the performance
degradation of adding an overlay. Without subcluster allocation enabled,
I still see the degradation in qemu 5.1.50 when adding an overlay. For
these experiments I used 64K blocks and a 2M qcow2 cluster size.
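For reference, subcluster allocation is selected at image-creation time.
A minimal sketch of how the base image and overlay could be set up with
the options above (the file names and the 10G virtual size are
illustrative, not from this thread):

```shell
# Minimal sketch: create a base image and an overlay with subcluster
# allocation (extended_l2) enabled. Requires qemu-img from qemu 5.1.50+;
# exit quietly if it is not installed.
command -v qemu-img >/dev/null || { echo "qemu-img not found"; exit 0; }

# Base image: 2M clusters, allocated at subcluster granularity.
qemu-img create -f qcow2 -o cluster_size=2M,extended_l2=on base.qcow2 10G

# Overlay on top of the base; pass the same options so the overlay also
# allocates at subcluster granularity.
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 \
    -o cluster_size=2M,extended_l2=on overlay.qcow2
```

qemu-img info on the overlay should then report the backing file and
extended l2 among the format-specific features.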

On Mon, Oct 19, 2020 at 12:51 PM Alberto Garcia <[email protected]> wrote:

> On Thu 27 Aug 2020 06:29:15 PM CEST, Yoonho Park wrote:
> > Below is the data with the cache disabled ("virsh attach-disk ... --cache
> > none"). I added the previous data for reference. Overall, random read
> > performance was not affected significantly. This makes sense because a
> > cache is probably not going to help random read performance much. BTW,
> > how big is the cache by default? Random write performance for 4K blocks
> > seems more "sane" now. Random write performance for 64K blocks is
> > interesting because base image (0 overlays) performance is 2X slower
> > than with 1-5 overlays. We believe this is because the random writes to
> > an overlay actually turn into sequential writes (appends to the
> > overlay). Does this make sense?
> >
> >
> > NO CACHE
> >
> >              4K blocks                      64K blocks
> > olays  rd bw  rd iops  wr bw  wr iops   rd bw  rd iops  wr bw  wr iops
> > 0       4478     1119   4684     1171   57001      890  42050      657
> > 1       4490     1122   2503      625   56656      885  93483     1460
>
> I haven't been able to reproduce this (I tried the scenarios with 0 and
> 1 overlays), did you figure out anything new or what's the situation?
>
> Berto
>
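For the record, the 0-vs-1-overlay random-write numbers quoted above
work out as follows (values copied from the table; a quick arithmetic
check, not new measurements):

```python
# Random-write bandwidth from the quoted no-cache table (0 vs. 1 overlay).
wr_bw_4k = {0: 4684, 1: 2503}     # 4K blocks
wr_bw_64k = {0: 42050, 1: 93483}  # 64K blocks

# With one overlay, 4K random writes drop to roughly half the base-image
# bandwidth ...
slowdown_4k = wr_bw_4k[0] / wr_bw_4k[1]    # ~1.87x slower

# ... while 64K random writes roughly double, consistent with the theory
# that allocating writes to the overlay land sequentially (appends).
speedup_64k = wr_bw_64k[1] / wr_bw_64k[0]  # ~2.22x faster

print(f"4K: {slowdown_4k:.2f}x slower, 64K: {speedup_64k:.2f}x faster")
```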
