On Mon, 19 Aug 2024 at 14:56, Mattias Nissler <mniss...@rivosinc.com> wrote: > > When DMA memory can't be directly accessed, as is the case when > running the device model in a separate process without shareable DMA > file descriptors, bounce buffering is used. > > It is not uncommon for device models to request mapping of several DMA > regions at the same time. Examples include: > * net devices, e.g. when transmitting a packet that is split across > several TX descriptors (observed with igb) > * USB host controllers, when handling a packet with multiple data TRBs > (observed with xhci) > > Previously, qemu only provided a single bounce buffer per AddressSpace > and would fail DMA map requests while the buffer was already in use. In > turn, this would cause DMA failures that ultimately manifest as hardware > errors from the guest perspective. > > This change allocates DMA bounce buffers dynamically instead of > supporting only a single buffer. Thus, multiple DMA mappings work > correctly also when RAM can't be mmap()-ed. > > The total bounce buffer allocation size is limited individually for each > AddressSpace. The default limit is 4096 bytes, matching the previous > maximum buffer size. A new x-max-bounce-buffer-size parameter is > provided to configure the limit for PCI devices. > > Signed-off-by: Mattias Nissler <mniss...@rivosinc.com> > Reviewed-by: Philippe Mathieu-Daudé <phi...@linaro.org> > Acked-by: Peter Xu <pet...@redhat.com> > ---
> @@ -3251,28 +3265,40 @@ void *address_space_map(AddressSpace *as, > mr = flatview_translate(fv, addr, &xlat, &l, is_write, attrs); > > if (!memory_access_is_direct(mr, is_write)) { > - if (qatomic_xchg(&as->bounce.in_use, true)) { > + size_t used = qatomic_read(&as->bounce_buffer_size); > + for (;;) { > + hwaddr alloc = MIN(as->max_bounce_buffer_size - used, l); > + size_t new_size = used + alloc; > + size_t actual = > + qatomic_cmpxchg(&as->bounce_buffer_size, used, new_size); > + if (actual == used) { > + l = alloc; > + break; > + } > + used = actual; > + } > + > + if (l == 0) { > *plen = 0; > return NULL; > } > - /* Avoid unbounded allocations */ > - l = MIN(l, TARGET_PAGE_SIZE); > - as->bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, l); > - as->bounce.addr = addr; > - as->bounce.len = l; > > + BounceBuffer *bounce = g_malloc0(l + sizeof(BounceBuffer)); > + bounce->magic = BOUNCE_BUFFER_MAGIC; > memory_region_ref(mr); > - as->bounce.mr = mr; > + bounce->mr = mr; > + bounce->addr = addr; > + bounce->len = l; > + > if (!is_write) { > flatview_read(fv, addr, MEMTXATTRS_UNSPECIFIED, > - as->bounce.buffer, l); > + bounce->buffer, l); > } > > *plen = l; > - return as->bounce.buffer; > + return bounce->buffer; Coverity is pretty unhappy about this trick, because it isn't able to recognise that we can figure out the address of 'bounce' from the address of 'bounce->buffer' and free it in the address_space_unmap() code, so it thinks that every use of address_space_map(), pci_dma_map(), etc, is a memory leak. We can mark all those as false positives, of course, but it got me wondering whether maybe we should have this function return a struct that has all the information address_space_unmap() needs rather than relying on it being able to figure it out from the host memory pointer... -- PMM