On Thu, 29 Feb 2024 11:22:24 +0100
Heinrich Schuchardt <heinrich.schucha...@canonical.com> wrote:

> On 29.02.24 02:11, Peter Xu wrote:
> > On Wed, Feb 28, 2024 at 08:07:47PM +0100, Heinrich Schuchardt wrote:  
> >> On 28.02.24 19:39, Peter Maydell wrote:  
> >>> On Wed, 28 Feb 2024 at 18:28, Heinrich Schuchardt
> >>> <heinrich.schucha...@canonical.com> wrote:  
> >>>>
> >>>> On 28.02.24 16:06, Philippe Mathieu-Daudé wrote:  
> >>>>> Hi Heinrich,
> >>>>>
> >>>>> On 28/2/24 13:59, Heinrich Schuchardt wrote:  
> >>>>>> virtqueue_map_desc() is called with values of sz that may exceed
> >>>>>> TARGET_PAGE_SIZE. sz = 0x2800 has been observed.  
> > 
> > Pure (and possibly stupid) question: why would virtqueue_map_desc() map
> > to !direct mem?  Shouldn't those buffers normally be allocated from guest RAM?
> >   
> >>>>>>
> >>>>>> We only support a single bounce buffer. We have to avoid
> >>>>>> virtqueue_map_desc() calling address_space_map() multiple times.
> >>>>>> Otherwise
> >>>>>> we see an error
> >>>>>>
> >>>>>>        qemu: virtio: bogus descriptor or out of resources
> >>>>>>
> >>>>>> Increase the minimum size of the bounce buffer to 0x10000 which matches
> >>>>>> the largest value of TARGET_PAGE_SIZE for all architectures.
> >>>>>>
> >>>>>> Signed-off-by: Heinrich Schuchardt <heinrich.schucha...@canonical.com>
> >>>>>> ---
> >>>>>> v2:
> >>>>>>       remove unrelated change
> >>>>>> ---
> >>>>>>     system/physmem.c | 8 ++++++--
> >>>>>>     1 file changed, 6 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/system/physmem.c b/system/physmem.c
> >>>>>> index e3ebc19eef..3c82da1c86 100644
> >>>>>> --- a/system/physmem.c
> >>>>>> +++ b/system/physmem.c
> >>>>>> @@ -3151,8 +3151,12 @@ void *address_space_map(AddressSpace *as,
> >>>>>>                 *plen = 0;
> >>>>>>                 return NULL;
> >>>>>>             }
> >>>>>> -        /* Avoid unbounded allocations */
> >>>>>> -        l = MIN(l, TARGET_PAGE_SIZE);
> >>>>>> +        /*
> >>>>>> +         * There is only one bounce buffer. The largest occurring value of
> >>>>>> +         * parameter sz of virtqueue_map_desc() must fit into the bounce
> >>>>>> +         * buffer.
> >>>>>> +         */
> >>>>>> +        l = MIN(l, 0x10000);  
> >>>>>
> >>>>> Please define this magic value. Maybe ANY_TARGET_PAGE_SIZE or
> >>>>> TARGETS_BIGGEST_PAGE_SIZE?
> >>>>>
> >>>>> Then along:
> >>>>>      QEMU_BUILD_BUG_ON(TARGET_PAGE_SIZE > TARGETS_BIGGEST_PAGE_SIZE);  
> >>>>
> >>>> Thank you Philippe for reviewing.
> >>>>
> >>>> TARGETS_BIGGEST_PAGE_SIZE does not fit as the value is not driven by the
> >>>> page size.
> >>>> How about MIN_BOUNCE_BUFFER_SIZE?
> >>>> Is include/exec/memory.h the right include for the constant?
> >>>>
> >>>> I don't think that TARGET_PAGE_SIZE has any relevance for setting the
> >>>> bounce buffer size. I only mentioned it to say that we are not
> >>>> decreasing the value on any existing architecture.
> >>>>
> >>>> I don't know why TARGET_PAGE_SIZE ever got into this piece of code.
> >>>> e3127ae0cdcd ("exec: reorganize address_space_map") does not provide a
> >>>> reason for this choice. Maybe Paolo remembers.  
> >>>
> >>> The limitation to a page dates back to commit 6d16c2f88f2a in 2009,
> >>> which was the first implementation of this function. I don't think
> >>> there's a particular reason for that value beyond that it was
> >>> probably a convenient value that was assumed to be likely "big enough".
> >>>
> >>> I think the idea with this bounce-buffer has always been that this
> >>> isn't really a code path we expected to end up in very often --
> >>> it's supposed to be for when devices are doing DMA, which they
> >>> will typically be doing to memory (backed by host RAM), not
> >>> devices (backed by MMIO and needing a bounce buffer). So the
> >>> whole mechanism is a bit "last fallback to stop things breaking
> >>> entirely".
> >>>
> >>> The address_space_map() API says that it's allowed to return
> >>> a subset of the range you ask for, so if the virtio code doesn't
> >>> cope with the minimum being set to TARGET_PAGE_SIZE then either
> >>> we need to fix that virtio code or we need to change the API
> >>> of this function. (But I think you will also get a reduced
> >>> range if you try to use it across a boundary between normal
> >>> host-memory-backed RAM and a device MemoryRegion.)  
> >>
> >> If we allow a bounce buffer only to be used once (via the in_use flag), why
> >> do we allow only a single bounce buffer?
> >>
> >> Could address_space_map() allocate a new bounce buffer on every call and
> >> address_space_unmap() deallocate it?
> >>
> >> Isn't the design with a single bounce buffer bound to fail with
> >> multi-threaded clients, since collisions are to be expected?  
> > 
> > See:
> > 
> > https://lore.kernel.org/r/20240212080617.2559498-1-mniss...@rivosinc.com
> > 
> > For some reason that series didn't land, but it seems to be helpful in this
> > case too if e.g. there can be multiple of such devices.
> > 
> > Thanks,
> >   
> 
> Hello Peter Xu,
> 
> thanks for pointing to your series. What I like about it is that it 
> removes the limit of a single bounce buffer per AddressSpace.
> 
> Unfortunately it does not solve my problem. You limit the sum of all 
> allocations for a single AddressSpace to 
> DEFAULT_MAX_BOUNCE_BUFFER_SIZE = 4096, which is too small for my use case.

Agreed. For CXL memory (which indeed hits this with virtio-blk-pci, for example)
I'm carrying a far from subtle patch [1] on top of the series linked above
(which I would very much like to land asap!)  Note I also need:

https://lore.kernel.org/qemu-devel/20240215142817.1904-1-jonathan.came...@huawei.com/#t
[PATCH 0/3] physmem: Fix MemoryRegion for second access to cached MMIO
Address Space

otherwise the second 8 byte read comes from the wrong address space.
That used to accidentally trigger very large reads, hence the rather excessive
size set below. I haven't checked yet whether a smaller size is fine, as
obviously something as hacky as the patch below isn't going anywhere! Perhaps
we need a rather more global version of x-max-bounce-buffers from Mattias'
series?

[1]
diff --git a/system/physmem.c b/system/physmem.c
index 0a838f1b33..10e9d8da86 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2554,6 +2554,7 @@ static void memory_map_init(void)
     memory_region_init(system_memory, NULL, "system", UINT64_MAX);
     address_space_init(&address_space_memory, system_memory, "memory");

+    address_space_memory.max_bounce_buffer_size = 1024 * 1024 * 1024;
     system_io = g_malloc(sizeof(*system_io));
     memory_region_init_io(system_io, NULL, &unassigned_io_ops, NULL, "io",
                           65536);


> 
> Why do we need a limit?
> Why is it so tiny?
> 
> Best regards
> 
> Heinrich