O vecs exceeding IOV_MAX

Hanna Czenczek Tue, 06 Jun 2023 01:46:48 -0700

On 06.06.23 10:00, Michael Tokarev wrote:

05.06.2023 18:45, Hanna Czenczek wrote:

When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.


The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit.  As of
4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error.  We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
so emulate the request, so that the IOV_MAX does not matter). However,
that does not seem exactly great.


Why it is not "exactly great"?  To me, it seems to be the best solution.
I'd say we should be able to tolerate "slight" increase over IOV_MAX for
"internal purposes", so to say.  We can limit guest-supplied vector to
IOV_MAX, but internally guard against integer overflow only.


That’s a good point that may have been worth investigating.

I considered it not great because I assumed that that4c002cef0e9abe7135d7916c51abce47f7fc1ee2 was written with intent, i.e.that the IOV_MAX limit was put in because we just generally assume inthe block layer that it isn’t exceeded. It may have worked out finebefore for one specific protocol driver (file-posix) most of thetime[1], but I think it’s reasonable to assume that code in the blocklayer has generally been written under the assumption that vectors willnot exceed the IOV_MAX limit (or otherwise we wouldn’t use that constantin the block layer). So in addition to file-posix, we’d also need toinspect other code (like the blkio driver that will submit vectoredrequests to an external library) what the implications of exceeding thatlimit are in all places.

That is not to say that it might not have been the simpler solution toagree to exceed the limit internally, but it is to say that the fullimplications would need to be investigated first. In contrast, thesolution added here is more complicated in code, but is localized.

[1] It won’t be fine if all buffers are 4k in size and 4k-aligned, whichI admit is unlikely for a request whose offset isn’t aligned, but whichcould happen with a partition that isn’t aligned to 4k.

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
    shorter.


This seems to be over-complicated, both of them, no?

I would have preferred to have this discussion while the patch was stillon-list for review (this specific version was for two months, countingfrom the first version was three). Do you think it is so complicatedand thus bug-prone that we must revert this series now and try the otherroute?

I can agree that perhaps the other route could have been simpler, butnow we already have patches that are reviewed and in master, which solvethe problem. So I won’t spend more time on tackling this issue fromanother angle. If you are happy to do so, patches are always welcome.


Hanna

Re: [PULL 02/17] block: Collapse padded I/O vecs exceeding IOV_MAX

Reply via email to