On Thu, Jan 21, 2021 at 3:29 PM Peter Lieven <p...@kamp.de> wrote:
>
> On 21.01.21 at 20:42, Jason Dillaman wrote:
> > On Wed, Jan 20, 2021 at 6:01 PM Peter Lieven <p...@kamp.de> wrote:
> >>
> >>> On 19.01.2021 at 15:20, Jason Dillaman <jdill...@redhat.com> wrote:
> >>>
> >>> On Tue, Jan 19, 2021 at 4:36 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>> On 18.01.21 at 23:33, Jason Dillaman wrote:
> >>>>> On Fri, Jan 15, 2021 at 10:39 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>>> On 15.01.21 at 16:27, Jason Dillaman wrote:
> >>>>>>> On Thu, Jan 14, 2021 at 2:59 PM Peter Lieven <p...@kamp.de> wrote:
> >>>>>>>> On 14.01.21 at 20:19, Jason Dillaman wrote:
> >>>>>>>>> On Sun, Dec 27, 2020 at 11:42 AM Peter Lieven <p...@kamp.de> wrote:
> >>>>>>>>>> Since we implement byte interfaces and librbd supports AIO at byte
> >>>>>>>>>> granularity, we can lift the 512-byte alignment.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Peter Lieven <p...@kamp.de>
> >>>>>>>>>> ---
> >>>>>>>>>>  block/rbd.c | 2 --
> >>>>>>>>>>  1 file changed, 2 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/block/rbd.c b/block/rbd.c
> >>>>>>>>>> index 27b4404adf..8673e8f553 100644
> >>>>>>>>>> --- a/block/rbd.c
> >>>>>>>>>> +++ b/block/rbd.c
> >>>>>>>>>> @@ -223,8 +223,6 @@ done:
> >>>>>>>>>>  static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
> >>>>>>>>>>  {
> >>>>>>>>>>      BDRVRBDState *s = bs->opaque;
> >>>>>>>>>> -    /* XXX Does RBD support AIO on less than 512-byte alignment? */
> >>>>>>>>>> -    bs->bl.request_alignment = 512;
> >>>>>>>>> Just a suggestion, but perhaps improve discard alignment, max discard,
> >>>>>>>>> optimal alignment (if that's something QEMU handles internally) if not
> >>>>>>>>> overridden by the user.
> >>>>>>>> Qemu supports max_discard and discard_alignment. Is there a call to
> >>>>>>>> get these limits from librbd?
> >>>>>>>>
> >>>>>>>> What do you mean by optimal_alignment? The object size?
> >>>>>>> krbd does a good job of initializing defaults [1] where optimal and
> >>>>>>> discard alignment is 64KiB (can actually be 4KiB now), and the max IO size
> >>>>>>> for writes, discards, and write-zeroes is the object size * the stripe count.
> >>>>>> Okay, I will have a look at it. If qemu issues a write, discard, or
> >>>>>> write_zero greater than obj_size * stripe count, will librbd split it
> >>>>>> internally or will the request fail?
> >>>>> librbd will handle it as needed. My goal is really just to get the
> >>>>> hints down to the guest OS.
> >>>>>> Regarding the alignment, it seems that rbd_dev->opts->alloc_size is
> >>>>>> something that comes from the device configuration and not from rbd?
> >>>>>> I don't have that information inside the Qemu RBD driver.
> >>>>> librbd doesn't really have the information either. The 64KiB guess
> >>>>> that krbd uses was a compromise since that was the default OSD
> >>>>> allocation size for HDDs since Luminous. Starting with Pacific that
> >>>>> default is going down to 4KiB.
> >>>> I will try to adjust these values as far as it is possible and makes sense.
> >>>>
> >>>> Is there a way to check the minimum supported OSD release in the backend
> >>>> from librbd / librados?
> >>> It's not a minimum -- RADOS will gladly accept 1-byte writes as well.
> >>> It's really just the optimal (performance- and space-wise). Sadly,
> >>> there is no realistic way to query this data from the backend.
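
As an illustration of the hints being discussed, here is a rough, untested
sketch of how qemu_rbd_refresh_limits() could populate the BlockLimits fields
using librbd calls that exist today (rbd_stat() and rbd_get_stripe_count()).
The 64 KiB optimal transfer size is only the krbd-style guess mentioned above,
not a value librbd can report, and the exact field choices are an assumption,
not the posted patch:

static void qemu_rbd_refresh_limits(BlockDriverState *bs, Error **errp)
{
    BDRVRBDState *s = bs->opaque;
    rbd_image_info_t info;
    uint64_t stripe_count = 1;

    /* librbd accepts byte-granular AIO, so no artificial request alignment */
    bs->bl.request_alignment = 1;

    if (rbd_stat(s->image, &info, sizeof(info)) == 0 &&
        rbd_get_stripe_count(s->image, &stripe_count) == 0) {
        uint64_t max_io = info.obj_size * stripe_count;

        /* krbd-style guess; librbd has no API for the OSD allocation size */
        bs->bl.opt_transfer = 64 * 1024;
        bs->bl.max_transfer = max_io;
        bs->bl.pdiscard_alignment = info.obj_size;
        bs->bl.max_pdiscard = max_io;
        bs->bl.max_pwrite_zeroes = max_io;
    }
}

With something along these lines, the max transfer, discard, and write-zeroes
limits would track obj_size * stripe count as krbd does, while the request
alignment stays at 1 byte so QEMU never has to split or RMW guest requests.
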
> >> So you would suggest to advertise an optimal transfer length of 64k and
> >> a max transfer length of obj_size * stripe count to the guest unless we
> >> have an API in the future to query these limits from the backend?
> > I'll open a Ceph tracker ticket to expose these via the API in a future
> > release.
>
> That would be good to have!
>
> >> I would leave request alignment at 1 byte as otherwise Qemu will issue
> >> RMWs for all write requests that do not align. Everything that comes from
> >> a guest OS is very likely 4k aligned anyway.
> > My goal is definitely not to have QEMU do any extra work for splitting
> > or aligning IOs. I am really only trying to get hints passed down to the
> > guest via the virtio drivers. If there isn't the plumbing in QEMU for
> > that yet, disregard.
>
> From what I read in the code, Qemu emulates the Block Limits VPD page for
> virtio-scsi, but the limits there are not taken from the backend driver. They
> can be passed as config to the virtio-scsi device.
>
> However, Qemu uses all the block limits found in the BlockLimits struct in
> include/block/block_int.h to generate optimal requests when it comes to block
> operations like mirroring or zeroing a whole device. Some of the alignments
> (e.g. the request alignment) have a direct influence and make Qemu split
> requests from the guest or even perform RMW cycles.
>
> If my understanding is incorrect, please correct me.
>
> It would indeed be nice to have an option to propagate the settings from the
> backend driver into the guest, but from my understanding this is not there
> yet.
>
> So I would leave it as is: drop the request_alignment = 512 (as in the
> patch) and just advertise the cluster_size as obj_size. This is already in
> the rbd driver today.
>
> The cluster_size is used, e.g., in any qemu-img convert operation to align
> read/write requests and size them appropriately.
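
For reference, the obj_size-as-cluster_size advertisement mentioned above is
done in the driver's bdrv_get_info hook; a simplified sketch of that hook
(reconstructed here, so treat the details as approximate) looks like:

static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi)
{
    BDRVRBDState *s = bs->opaque;
    rbd_image_info_t info;
    int r;

    r = rbd_stat(s->image, &info, sizeof(info));
    if (r < 0) {
        return r;
    }

    /* report the RADOS object size as the image's cluster size */
    bdi->cluster_size = info.obj_size;
    return 0;
}

qemu-img convert obtains this value via bdrv_get_info() and uses it to size
and align its read/write requests, which is the behavior Peter refers to.
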
+1 to leave as-is until we can pass those hints down.

> Peter

Thanks!

--
Jason