On Thu, 15 Feb 2024, Richard Henderson wrote:
> > Converting a 4.4 GiB Windows 10 image to qcow2. It was mentioned in v1 and
> > v2,
> > are you saying they did not reach your inbox?
> > https://lore.kernel.org/qemu-devel/20231013155856.21475-1-mmroma...@ispras.ru/
> > https://lore.kernel.org/qemu-devel/20231027143704.7060-1-mmroma...@ispras.ru/
>
> I'm saying that this is not a reproducible description of methodology.
>
> With master, so with neither of our changes:
>
> I tried converting an 80G win7 image that I happened to have lying about, I
> see buffer_zero_avx2 with only 3.03% perf overhead. Then I tried truncating
> the image to 16G to see if having the entire image in ram would help -- not
> yet, still only 3.4% perf overhead. Finally, I truncated the image to 4G and
> saw 2.9% overhead.
>
> So... help be out here. I would like to be able to see results that are at
> least vaguely similar.

Ah, I guess you might be running with a low perf_event_paranoid setting that
allows unprivileged sampling of kernel events? In our submissions the
percentage was for perf_event_paranoid=2, i.e. relative to QEMU only,
excluding kernel time spent in syscalls.

Retrieve IE11.Win7.VirtualBox.zip from
https://archive.org/details/ie11.win7.virtualbox
and use

  unzip -p IE11.Win7.VirtualBox.zip | tar xv

to extract 'IE11 - Win7-disk001.vmdk'.

(Mikhail used a different image when preparing the patch.)

On this image, I get 70% in buffer_zero_sse2 on a Sandy Bridge running

  qemu-img convert 'IE11 - Win7-disk001.vmdk' -O qcow2 /tmp/t.qcow2

The user:kernel time ratio is about 0.15:2.3, so 70% relative to user time
does roughly correspond to a single-digit percentage relative to
(user+kernel) time.

(Which does tell us that qemu-img is doing I/O inefficiently: it shouldn't
need two seconds to read a fully cached 5 Gigabyte file.)

Alexander
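
P.S. For anyone who wants to check the arithmetic above, here is a minimal
sketch. It assumes the 70% perf share maps proportionally onto user CPU time
and that no kernel time is attributed to the function; the helper name
share_of_total is made up for illustration.

```python
def share_of_total(user_share, user_s, kernel_s):
    """Convert a share of user time into a share of (user + kernel) time.

    Assumes the function accounts for none of the kernel time.
    """
    return user_share * user_s / (user_s + kernel_s)


# Figures quoted above: 70% of user time, user:kernel of about 0.15:2.3.
total_share = share_of_total(0.70, 0.15, 2.3)
print(f"{total_share:.1%} of (user+kernel) time")  # prints "4.3% of (user+kernel) time"
```

That lands in the same range as the roughly 3% figures measured with kernel
samples included.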