On 03.03.2021 at 18:40, Stefano Garzarella wrote:
> Hi Jason,
> as reported in this BZ [1], when qemu-img creates a QCOW2 image on RBD
> writing data is very slow compared to a raw file.
>
> Comparing raw vs QCOW2 image creation with RBD I found that we use a
> different object size, for the raw file I see '4 MiB objects', for QCOW2 I
> see '64 KiB objects' as reported on comment 14 [2].
> This should be the main issue of slowness, indeed forcing in the code 4 MiB
> object size also for QCOW2 increased the speed a lot.
>
> Looking better I discovered that for raw files, we call rbd_create() with
> obj_order = 0 (if 'cluster_size' options is not defined), so the default
> object size is used.
> Instead for QCOW2, we use obj_order = 16, since the default 'cluster_size'
> defined for QCOW2, is 64 KiB.
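To make the numbers above concrete, here is a small sketch of the arithmetic relating obj_order to the object size. It is not code from block/rbd.c; the helper names and RBD_DEFAULT_ORDER are illustrative assumptions. The only facts used are that the object size is 2^obj_order bytes, that obj_order = 0 selects the default, and that the 4 MiB default seen in the report corresponds to order 22.

```python
# Illustrative sketch only: helper names are hypothetical, not QEMU or librbd API.

# Assumption: the 4 MiB default object size seen in the report, i.e. 2**22 bytes.
RBD_DEFAULT_ORDER = 22


def order_from_cluster_size(cluster_size: int) -> int:
    """Return log2(cluster_size); the size must be a power of two."""
    if cluster_size <= 0 or cluster_size & (cluster_size - 1):
        raise ValueError("cluster_size must be a positive power of two")
    return cluster_size.bit_length() - 1


def object_size_from_order(obj_order: int) -> int:
    """Return the object size in bytes; order 0 means 'use the default'."""
    order = obj_order if obj_order != 0 else RBD_DEFAULT_ORDER
    return 1 << order


if __name__ == "__main__":
    qcow2_default_cluster = 64 * 1024
    order = order_from_cluster_size(qcow2_default_cluster)
    print(order, object_size_from_order(order))  # qcow2 default leaking through
    print(object_size_from_order(0))             # raw case: default object size
```

With the qcow2 default of 64 KiB this gives order 16, so each RBD object holds 1/64 of what a default 4 MiB object would.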
Hm, the QemuOpts-based image creation is messy, but why does the rbd
driver even see the cluster_size option?

The first thing qcow2_co_create_opts() does is split the passed
QemuOpts into options it will process on the qcow2 layer and options
that are passed to the protocol layer. So if you pass a cluster_size
option, qcow2 should take it for itself and not pass it to rbd.

If it is passed to rbd, I think that's a bug in the qcow2 driver.

> Using '-o cluster_size=2M' with qemu-img changed only the qcow2 cluster
> size, since in qcow2_co_create_opts() we remove the 'cluster_size' from
> QemuOpts calling qemu_opts_to_qdict_filtered().
> For some reason that I have yet to understand, after this deletion, however
> remains in QemuOpts the default value of 'cluster_size' for qcow2 (64 KiB),
> that it's used in qemu_rbd_co_create_opts()

So it seems you came to a similar conclusion. We need to find out where
the 64k comes from and just fix that so that rbd uses its default.

> At this point my doubts are:
> Does it make sense to use the same cluster_size as qcow2 as object_size in
> RBD?
> If we want to keep the 2 options separated, how can it be done? Should we
> rename the option in block/rbd.c?

My lazy answer is that you could just use QMP blockdev-create, where
you create the image layer by layer separately.

What could possibly be done for QemuOpts is using the dotted syntax
like for opening, so you could specify file.cluster_size=... for the
protocol layer (or data_file.cluster_size=... for the external data
file etc.)

Kevin
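A rough sketch of what the dotted-syntax idea could look like, in case it helps the discussion. This is not how qcow2_co_create_opts() or qemu_opts_to_qdict_filtered() are implemented, and the dotted syntax does not exist for image creation today; split_create_opts() and its parameters are hypothetical. It only illustrates the intended split: plain options known to the format layer stay there, "file."-prefixed options go to the protocol layer with the prefix stripped, and everything else is passed down unchanged.

```python
# Hypothetical illustration of splitting creation options between layers.
# Nothing here is QEMU API; names and behaviour are assumptions for discussion.


def split_create_opts(opts, format_keys, prefix="file."):
    """Split opts into (format_opts, protocol_opts).

    - keys starting with `prefix` go to the protocol layer, prefix removed
    - keys listed in `format_keys` are consumed by the format layer
    - anything else is passed through to the protocol layer
    """
    format_opts = {}
    protocol_opts = {}
    for key, value in opts.items():
        if key.startswith(prefix):
            protocol_opts[key[len(prefix):]] = value
        elif key in format_keys:
            format_opts[key] = value
        else:
            protocol_opts[key] = value
    return format_opts, protocol_opts


if __name__ == "__main__":
    fmt, proto = split_create_opts(
        {"cluster_size": "2M", "file.cluster_size": "4M"},
        format_keys={"cluster_size"},
    )
    print(fmt)    # options the format layer keeps
    print(proto)  # options the protocol layer receives
```

The point of the sketch is that an unprefixed cluster_size never reaches the protocol layer, so rbd would fall back to its own default unless file.cluster_size is given explicitly.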