On Thu, Jan 21, 2021 at 3:29 PM Peter Lieven <p...@kamp.de> wrote:
>
> On 21.01.21 at 20:42, Jason Dillaman wrote:
> > On Wed, Jan 20, 2021 at 6:01 PM Peter Lieven <p...@kamp.de> wrote:
> >>
> >>> On 19.01.2021 at 15:20, Jason Dillaman <jdill...@redhat.com> wrote:
> >>>
> >>> On Tue, Jan 19, 2021 at 4:36 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>> On 18.01.21 at 23:33, Jason Dillaman wrote:
> >>>>> On Fri, Jan 15, 2021 at 10:39 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>>> On 15.01.21 at 16:27, Jason Dillaman wrote:
> >>>>>>> On Thu, Jan 14, 2021 at 2:59 PM Peter Lieven <p...@kamp.de> wrote:
> >>>>>>>> On 14.01.21 at 20:19, Jason Dillaman wrote:
> >>>>>>>>> On Sun, Dec 27, 2020 at 11:42 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>>>>>>> Since we implement byte interfaces and librbd supports AIO at byte
> >>>>>>>>>> granularity, we can lift the 512-byte alignment.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Peter Lieven <p...@kamp.de>
> >>>>>>>>>> ---
> >>>>>>>>>>  block/rbd.c | 2 --
> >>>>>>>>>>  1 file changed, 2 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
> >>>>>>>>>> index 27b4404adf..8673e8f553 100644
> >>>>>>>>>> --- a/block/rbd.c
> >>>>>>>>>> +++ b/block/rbd.c
> >>>>>>>>>> @@ -223,8 +223,6 @@ done:
> >>>>>>>>>>  static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
> >>>>>>>>>>  {
> >>>>>>>>>>      BDRVRBDState *s = bs->opaque;
> >>>>>>>>>> -    /* XXX Does RBD support AIO on less than 512-byte alignment? */
> >>>>>>>>>> -    bs->bl.request_alignment = 512;
> >>>>>>>>> Just a suggestion, but perhaps improve discard alignment, max discard,
> >>>>>>>>> optimal alignment (if that's something QEMU handles internally) if not
> >>>>>>>>> overridden by the user.
> >>>>>>>> Qemu supports max_discard and discard_alignment. Is there a call to
> >>>>>>>> get these limits from librbd?
> >>>>>>>>
> >>>>>>>> What do you mean by optimal_alignment? The object size?
> >>>>>>> krbd does a good job of initializing defaults [1] where optimal and
> >>>>>>> discard alignment is 64KiB (can actually be 4KiB now), and the max IO size
> >>>>>>> for writes, discards, and write-zeroes is the object size * the stripe count.
> >>>>>> Okay, I will have a look at it. If qemu issues a write, discard, or
> >>>>>> write_zero greater than obj_size * stripe count, will librbd split it
> >>>>>> internally or will the request fail?
> >>>>> librbd will handle it as needed. My goal is really just to get the
> >>>>> hints down to the guest OS.
> >>>>>> Regarding the alignment, it seems that rbd_dev->opts->alloc_size is
> >>>>>> something that comes from the device configuration and not from rbd?
> >>>>>> I don't have that information inside the Qemu RBD driver.
> >>>>> librbd doesn't really have the information either. The 64KiB guess
> >>>>> that krbd uses was a compromise since that was the default OSD
> >>>>> allocation size for HDDs since Luminous. Starting with Pacific that
> >>>>> default is going down to 4KiB.
> >>>> I will try to adjust these values as far as it is possible and makes sense.
> >>>>
> >>>> Is there a way to check the minimum supported OSD release in the backend
> >>>> from librbd / librados?
> >>> It's not a minimum -- RADOS will gladly accept 1-byte writes as well.
> >>> It's really just the optimal (performance- and space-wise). Sadly,
> >>> there is no realistic way to query this data from the backend.
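
As an illustration of the hints being discussed, here is a rough, untested
sketch of how qemu_rbd_refresh_limits() could populate the BlockLimits fields
using librbd calls that exist today (rbd_stat() and rbd_get_stripe_count()).
The 64 KiB optimal transfer size is only the krbd-style guess mentioned above,
not a value librbd can report, and the exact field choices are an assumption,
not the posted patch:

static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
{
    BDRVRBDState *s = bs->opaque;
    rbd_image_info_t info;
    uint64_t stripe_count = 1;

    /* librbd accepts byte-granular AIO, so no artificial request alignment */
    bs->bl.request_alignment = 1;

    if (rbd_stat(s->image, &info, sizeof(info)) == 0 &&
        rbd_get_stripe_count(s->image, &stripe_count) == 0) {
        uint64_t max_io = info.obj_size * stripe_count;

        /* krbd-style guess; librbd has no API for the OSD allocation size */
        bs->bl.opt_transfer = 64 * 1024;
        bs->bl.max_transfer = max_io;
        bs->bl.pdiscard_alignment = info.obj_size;
        bs->bl.max_pdiscard = max_io;
        bs->bl.max_pwrite_zeroes = max_io;
    }
}

With something along these lines, the max transfer, discard, and write-zeroes
limits would track obj_size * stripe count as krbd does, while the request
alignment stays at 1 byte so QEMU never has to split or RMW guest requests.
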
> >> So you would suggest to advertise an optimal transfer length of 64k and
> >> a max transfer length of obj_size * stripe count to the guest unless we
> >> have an API in the future to query these limits from the backend?
> > I'll open a Ceph tracker ticket to expose these via the API in a future
> > release.
>
> That would be good to have!
>
> >> I would leave request alignment at 1 byte as otherwise Qemu will issue
> >> RMWs for all write requests that do not align. Everything that comes from
> >> a guest OS is very likely 4k aligned anyway.
> > My goal is definitely not to have QEMU do any extra work for splitting
> > or aligning IOs. I am really only trying to get hints passed down to the
> > guest via the virtio drivers. If there isn't the plumbing in QEMU for
> > that yet, disregard.
>
> From what I read in the code, Qemu emulates the Block Limits VPD page for
> virtio-scsi, but the limits there are not taken from the backend driver. They
> can be passed as config to the virtio-scsi device.
>
> However, Qemu uses all the block limits found in the BlockLimits struct in
> include/block/block_int.h to generate optimal requests when it comes to block
> operations like mirroring or zeroing a whole device. Some of the alignments
> (e.g. the request alignment) have a direct influence and make Qemu split
> requests from the guest or even perform RMW cycles.
>
> If my understanding is incorrect, please correct me.
>
> It would indeed be nice to have an option to propagate the settings from the
> backend driver into the guest, but from my understanding this is not there
> yet.
>
> So I would leave it as is: drop the request_alignment = 512 (as in the
> patch) and just advertise the cluster_size as obj_size. This is already in
> the rbd driver today.
>
> The cluster_size is used, e.g., in any qemu-img convert operation to align
> read/write requests and size them appropriately.
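
For reference, the obj_size-as-cluster_size advertisement mentioned above is
done in the driver's bdrv_get_info hook; a simplified sketch of that hook
(reconstructed here, so treat the details as approximate) looks like:

static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
{
    BDRVRBDState *s = bs->opaque;
    rbd_image_info_t info;
    int r;

    r = rbd_stat(s->image, &info, sizeof(info));
    if (r < 0) {
        return r;
    }

    /* report the RADOS object size as the image's cluster size */
    bdi->cluster_size = info.obj_size;
    return 0;
}

qemu-img convert obtains this value via bdrv_get_info() and uses it to size
and align its read/write requests, which is the behavior Peter refers to.
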
+1 to leave as-is until we can pass those hints down.

> Peter

Thanks!

--
Jason