On 21.04.2011, at 09:03, David Gibson wrote:

> A few months ago, Eduard - Gabriel Munteanu posted a series of patches
> implementing support for emulating the AMD PCI IOMMU
> (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html).
>
> In fact, this series implemented a general DMA/IOMMU layer which can
> be used by any device model, and one translation backend for this
> implementing the AMD specific PCI IOMMU.
>
> These patches don't seem to have gone anywhere for the last few
> months, however, and so far I've been unable to contact the author
> (trying again with this mail).
>
> I have an interest in this code, because the pSeries machine will also
> need IOMMU emulation support. At present we only support virtual
> devices, through the PAPR interface, and we have support for the
> hypervisor-controlled IOMMU translation in the PAPR VIO code.
> However, we want to add PCI device support and this will also need
> IOMMU translation.
>
> The series seems to have the right basic approach, so if the author is
> indeed MIA, I was planning to pick up the patches and resubmit them
> (with support for the pSeries IOMMU added).
>
> Before I do that, I was hoping to get some consensus that this is the
> right way to go. For reference, I have an updated version of the
> first patch (which adds the core IOMMU layer) below.
>
> From: Eduard - Gabriel Munteanu <eduard.munte...@linux360.ro>
> Date: Fri, 4 Feb 2011 01:32:55 +0200
> Subject: [PATCH] Generic DMA memory access interface
>
> This introduces replacements for memory access functions like
> cpu_physical_memory_read(). The new interface can handle address
> translation and access checking through an IOMMU.
>
> David Gibson: I have made several bugfixes and cleanups to Eduard's
> original patch.
>
> * dma_memory_rw() was incorrectly using (uninitialized) plen instead
>   of len in the fallback to no-IOMMU case.
>
> * the dma_memory_map() tracking was storing the guest physical
>   address of each mapping, but not the qemu user virtual address.
>   However in unmap() it was then attempting to lookup by virtual
>   using a completely bogus cast.
>
> * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c, it
>   was a bit too much code for an inline.
>
> * IOMMU support is now available on all target platforms, not just
>   i386, but is configurable (--enable-iommu/--disable-iommu). Stubs
>   are used so that individual drivers can use the new dma interface
>   and it will turn into old-style cpu physical accesses at no cost on
>   IOMMU-less builds.
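(To make the "stubs" point concrete for anyone following along: the idea is that a device model only ever calls the dma_* helpers, never cpu_physical_memory_* directly, roughly like the sketch below. This is purely illustrative and not part of the patch -- MyDevState and its "dma" field are made up, and how a DMADevice actually gets wired to an IOMMU is left open here.)

    /* Illustrative only: a device model fetching a descriptor through the
     * new interface.  "s->dma" stands for whatever DMADevice handle the
     * device ends up carrying; on a --disable-iommu build the inline stub
     * turns this into a plain cpu_physical_memory_rw() call. */
    static void mydev_fetch_desc(MyDevState *s, dma_addr_t desc_addr,
                                 void *desc, dma_addr_t desc_len)
    {
        dma_memory_rw(&s->dma, desc_addr, desc, desc_len, 0 /* read */);
    }
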
>
> Signed-off-by: Eduard - Gabriel Munteanu <eduard.munte...@linux360.ro>
> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> ---
>  Makefile.target |    1 +
>  configure       |   12 ++++
>  hw/dma_rw.c     |  147 +++++++++++++++++++++++++++++++++++++++
>  hw/dma_rw.h     |  156 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 316 insertions(+), 0 deletions(-)
>  create mode 100644 hw/dma_rw.c
>  create mode 100644 hw/dma_rw.h
>
> diff --git a/Makefile.target b/Makefile.target
> index 95f5eda..c3d36c6 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -198,6 +198,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p.o
>  obj-y += rwhandler.o
>  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>  obj-$(CONFIG_NO_KVM) += kvm-stub.o
> +obj-$(CONFIG_IOMMU) += dma_rw.o
>  LIBS+=-lz
>
>  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
> diff --git a/configure b/configure
> index da2da04..fa6f4d5 100755
> --- a/configure
> +++ b/configure
> @@ -130,6 +130,7 @@ xen=""
>  linux_aio=""
>  attr=""
>  vhost_net=""
> +iommu="no"
>  xfs=""
>
>  gprof="no"
> @@ -723,6 +724,10 @@ for opt do
>    ;;
>    --enable-vhost-net) vhost_net="yes"
>    ;;
> +  --enable-iommu) iommu="yes"
> +  ;;
> +  --disable-iommu) iommu="no"
> +  ;;
>    --disable-opengl) opengl="no"
>    ;;
>    --enable-opengl) opengl="yes"
> @@ -934,6 +939,8 @@ echo "  --enable-docs            enable documentation build"
>  echo "  --disable-docs           disable documentation build"
>  echo "  --disable-vhost-net      disable vhost-net acceleration support"
>  echo "  --enable-vhost-net       enable vhost-net acceleration support"
> +echo "  --disable-iommu          disable IOMMU emulation support"
> +echo "  --enable-vhost-net       enable IOMMU emulation support"
eeh? The second of those new lines should say "--enable-iommu", not "--enable-vhost-net" -- copy & paste slip in the help text.

>  echo "  --enable-trace-backend=B Set trace backend"
>  echo "                           Available backends:" $("$source_path"/scripts/tracetool --list-backends)
>  echo "  --with-trace-file=NAME   Full PATH,NAME of file to store traces"
> @@ -2608,6 +2615,7 @@ echo "madvise           $madvise"
>  echo "posix_madvise     $posix_madvise"
>  echo "uuid support      $uuid"
>  echo "vhost-net support $vhost_net"
> +echo "IOMMU support     $iommu"
>  echo "Trace backend     $trace_backend"
>  echo "Trace output file $trace_file-<pid>"
>  echo "spice support     $spice"
> @@ -3412,6 +3420,10 @@ if test "$target_softmmu" = "yes" -a \( \
>    echo "CONFIG_NEED_MMU=y" >> $config_target_mak
>  fi
>
> +if test "$iommu" = "yes" ; then
> +  echo "CONFIG_IOMMU=y" >> $config_target_mak
> +fi
> +
>  if test "$gprof" = "yes" ; then
>    echo "TARGET_GPROF=yes" >> $config_target_mak
>    if test "$target_linux_user" = "yes" ; then
> diff --git a/hw/dma_rw.c b/hw/dma_rw.c
> new file mode 100644
> index 0000000..627835c
> --- /dev/null
> +++ b/hw/dma_rw.c
> @@ -0,0 +1,147 @@
> +/*
> + * Generic DMA memory access interface.
> + *
> + * Copyright (c) 2011 Eduard - Gabriel Munteanu
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "dma_rw.h"
> +#include "range.h"
> +
> +void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
> +                   void *buf, dma_addr_t len, int is_write)
> +{
> +    dma_addr_t paddr, plen;
> +    int err;
> +
> +    /*
> +     * Fast-path non-iommu.
> +     * More importantly, makes it obvious what this function does.
> +     */
> +    if (NO_IOMMU(dev)) {
> +        cpu_physical_memory_rw(addr, buf, len, is_write);
> +        return;
> +    }
> +
> +    while (len) {
> +        err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +        if (err) {
> +            return;
> +        }
> +
> +        /* The translation might be valid for larger regions. */
> +        if (plen > len) {
> +            plen = len;
> +        }
> +
> +        cpu_physical_memory_rw(paddr, buf, plen, is_write);
> +
> +        len -= plen;
> +        addr += plen;
> +        buf += plen;
> +    }
> +}
> +
> +typedef struct DMAMemoryMap DMAMemoryMap;
> +struct DMAMemoryMap {
> +    dma_addr_t           addr;
> +    size_t               len;
> +    void                 *buf;
> +    DMAInvalidateMapFunc *invalidate;
> +    void                 *invalidate_opaque;
> +
> +    QLIST_ENTRY(DMAMemoryMap) list;
> +};
> +
> +void dma_invalidate_memory_range(DMADevice *dev,
> +                                 dma_addr_t addr, dma_addr_t len)
> +{
> +    DMAMemoryMap *map;
> +
> +    QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +        if (ranges_overlap(addr, len, map->addr, map->len)) {
> +            map->invalidate(map->invalidate_opaque);
> +            QLIST_REMOVE(map, list);
> +            free(map);
> +        }
> +    }
> +}
> +
> +void *dma_memory_map(DMADevice *dev, DMAInvalidateMapFunc *cb, void *opaque,
> +                     dma_addr_t addr, dma_addr_t *len, int is_write)
> +{
> +    int err;
> +    target_phys_addr_t paddr, plen;
> +    void *buf;
> +
> +    if (NO_IOMMU(dev)) {
> +        return cpu_physical_memory_map(addr, len, is_write);
> +    }
> +
> +    plen = *len;
> +    err = dev->mmu->translate(dev, addr, &paddr, &plen, is_write);
> +    if (err) {
> +        return NULL;
> +    }
> +
> +    /*
> +     * If this is true, the virtual region is contiguous,
> +     * but the translated physical region isn't. We just
> +     * clamp *len, much like cpu_physical_memory_map() does.
> +     */
> +    if (plen < *len) {
> +        *len = plen;
> +    }
> +
> +    buf = cpu_physical_memory_map(paddr, len, is_write);
> +
> +    /* We treat maps as remote TLBs to cope with stuff like AIO. */
> +    if (cb) {
> +        DMAMemoryMap *map;
> +
> +        map = qemu_malloc(sizeof(DMAMemoryMap));
> +        map->addr = addr;
> +        map->len = *len;
> +        map->buf = buf;
> +        map->invalidate = cb;
> +        map->invalidate_opaque = opaque;
> +
> +        QLIST_INSERT_HEAD(&dev->mmu->memory_maps, map, list);
> +    }
> +
> +    return buf;
> +}
> +
> +void dma_memory_unmap(DMADevice *dev, void *buffer, dma_addr_t len,
> +                      int is_write, dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +    if (!NO_IOMMU(dev)) {
> +        DMAMemoryMap *map;
> +
> +        QLIST_FOREACH(map, &dev->mmu->memory_maps, list) {
> +            if ((map->buf == buffer) && (map->len == len)) {
> +                QLIST_REMOVE(map, list);
> +                free(map);
> +            }
> +        }
> +    }
> +}
> +
> diff --git a/hw/dma_rw.h b/hw/dma_rw.h
> new file mode 100644
> index 0000000..58a3d0f
> --- /dev/null
> +++ b/hw/dma_rw.h
> @@ -0,0 +1,156 @@
> +#ifndef DMA_RW_H
> +#define DMA_RW_H
> +
> +#include "qemu-common.h"
> +
> +typedef uint64_t dma_addr_t;
> +
> +typedef struct DMAMmu DMAMmu;
> +typedef struct DMADevice DMADevice;
> +
> +typedef int DMATranslateFunc(DMADevice *dev,
> +                             dma_addr_t addr,
> +                             dma_addr_t *paddr,
> +                             dma_addr_t *len,
> +                             int is_write);
> +
> +typedef void DMAInvalidateMapFunc(void *);
> +
> +#ifndef CONFIG_IOMMU
> +struct DMAMmu {
> +};
> +
> +struct DMADevice {
> +};
> +
> +#define NO_IOMMU(_dev) (1)
> +
> +static inline void dma_memory_rw(DMADevice *dev, dma_addr_t addr,
> +                                 void *buf, dma_addr_t len, int is_write)
> +{
> +    cpu_physical_memory_rw(addr, buf, len, is_write);
> +}
> +
> +static inline void *dma_memory_map(DMADevice *dev,
> +                                   DMAInvalidateMapFunc *cb, void *opaque,
> +                                   dma_addr_t addr, dma_addr_t *len,
> +                                   int is_write)
> +{
> +    return cpu_physical_memory_map(addr, len, is_write);
> +}
> +
> +static inline void dma_memory_unmap(DMADevice *dev,
> +                                    void *buffer, dma_addr_t len,
> +                                    int is_write, dma_addr_t access_len)
> +{
> +    cpu_physical_memory_unmap(buffer, len, is_write, access_len);
> +}
> +
> +#define DEFINE_DMA_LD(suffix, size)                                   \
> +static inline uint##size##_t                                          \
> +dma_ld##suffix(DMADevice *dev, dma_addr_t addr)                       \
> +{                                                                     \
> +    return ld##suffix##_phys(addr);                                   \
> +}
> +
> +#define DEFINE_DMA_ST(suffix, size)                                   \
> +static inline void                                                    \
> +dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val)   \
> +{                                                                     \
> +    st##suffix##_phys(addr, val);                                     \
> +}
> +
> +#else
> +struct DMAMmu {
> +    DeviceState *iommu;
> +    DMATranslateFunc *translate;
> +    QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps;
> +};
> +
> +struct DMADevice {

How exactly is this going to be used?

Also, in the end I think that most devices should just go through a
PCI-specific interface that then calls the DMA helpers:

  pci_memory_rw(PCIDevice *d, ...)

even if it's only as simple as calling dma_memory_rw(d->iommu, ...).


Alex
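
P.S.: To sketch what I mean, roughly the below. The names are invented for illustration -- in particular the "dma" member I'm giving PCIDevice here stands for however the DMADevice/IOMMU handle ends up being attached to the PCI device:

    /* Hypothetical PCI-level wrappers around the generic DMA helpers. */
    static inline void pci_memory_rw(PCIDevice *d, dma_addr_t addr,
                                     void *buf, dma_addr_t len, int is_write)
    {
        dma_memory_rw(&d->dma, addr, buf, len, is_write);
    }

    static inline void pci_memory_read(PCIDevice *d, dma_addr_t addr,
                                       void *buf, dma_addr_t len)
    {
        pci_memory_rw(d, addr, buf, len, 0);
    }

    static inline void pci_memory_write(PCIDevice *d, dma_addr_t addr,
                                        const void *buf, dma_addr_t len)
    {
        pci_memory_rw(d, addr, (void *)buf, len, 1);
    }

That way individual device models never need to care whether an IOMMU is present at all.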