On Thu, 2013-02-14 at 08:22 +0000, Bhushan Bharat-R65777 wrote: > Hi Alex Williamson, > > I am able (with some hacks :)) to directly assign the e1000 PCI device > to KVM guest using VFIO on Freescale device. One of the problem I am > facing is about the DMA mapping in IOMMU for PCI device BARs. On > Freescale devices, the mmaped PCI device BARs are not required to be > mapped in IOMMU. Typically the flow of in/out transaction (from CPU) > is: > > Incoming flow: > > -----| |----------| |---------------| > |-------------| > CORE |<----<------<-----<--| IOMMU |<---<---<| > PCI-Controller|<------<-----<----<| PCI device | > -----| |----------| |---------------| > |-------------| > > Outgoing Flow: IOMMU is bypassed for out transactions > > -----| |----------| |---------------| > |-------------| > CORE |>---->------>----| | IOMMU | ->-->| > PCI-Controller|>------>----->---->| PCI device | > -----| | |----------| ^ |---------------| > |-------------| > | | > |------------------|
Mapping the BAR into the IOMMU isn't for this second use case, CPU to device is never IOMMU translated. Instead, it's for this: |----------| |---------------| |-------------| | IOMMU |<---<---<| PCI-Controller|<------<-----<----<| PCI device | |----------| |---------------| |-------------| | | |---------------| |-------------| +-->---->--- ->| PCI-Controller|>------>----->---->| PCI device | |---------------| |-------------| It's perfectly fine to not support peer-to-peer DMA, but let's skip it where it's not supported in case it might actually work in some cases. > Also because of some hardware limitations on our IOMMU it is difficult > to map these BAR regions with RAM (DDR) regions. So on Freescale > device we want the VFIO_IOMMU_MAP_DMA ioctl to be called for RAM > regions (DDR) only and _not_ for these mmaped ram regions of PCI > device bars. I can understand that we need to add these mmaped PCI > bars as RAM type in qemu memory_region_*(). So for that I tried to > skip these regions in VFIO memory_listeners. Below changes which works > for me. I am not sure whether this is correct approach, please > suggest. > > ------------- > diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c > index c51ae67..63728d8 100644 > --- a/hw/vfio_pci.c > +++ b/hw/vfio_pci.c > @@ -1115,9 +1115,35 @@ static int vfio_dma_map(VFIOContainer *container, > hwaddr iova, > return -errno; > } > > -static bool vfio_listener_skipped_section(MemoryRegionSection *section) > +static int memory_region_is_mmap_bars(VFIOContainer *container, > + MemoryRegionSection *section) > { > - return !memory_region_is_ram(section->mr); > + VFIOGroup *group; > + VFIODevice *vdev; > + int i; > + > + QLIST_FOREACH(group, &container->group_list, next) { I think you want to start at the global &group_list > + QLIST_FOREACH(vdev, &group->device_list, next) { > + if (vdev->msix->mmap_mem.ram_addr == section->mr->ram_addr) Note the comment in memory.h: struct MemoryRegion { /* All fields are private - violators will be prosecuted */ ... ram_addr_t ram_addr; You'll need to invent some kind of memory region comparison interfaces instead of accessing these private fields. > + return 1; > + for (i = 0; i < PCI_ROM_SLOT; i++) { > + VFIOBAR *bar = &vdev->bars[i]; > + if (bar->mmap_mem.ram_addr == section->mr->ram_addr) > + return 1; > + } > + } > + } > + > + return 0; > +} It's a pretty heavy weight function, I think the memory API should help us differentiate MMIO from RAM. > +static bool vfio_listener_skipped_section(VFIOContainer *container, > + MemoryRegionSection *section) > +{ > + if (!memory_region_is_ram(section->mr)) > + return 1; Need some kind of conditional here for platforms that support peer-to-peer. Thanks, Alex > + return memory_region_is_mmap_bars(container, section); > } > > static void vfio_listener_region_add(MemoryListener *listener, > @@ -1129,7 +1155,7 @@ static void vfio_listener_region_add(MemoryListener > *listener, > void *vaddr; > int ret; > > - if (vfio_listener_skipped_section(section)) { > + if (vfio_listener_skipped_section(container, section)) { > DPRINTF("vfio: SKIPPING region_add %"HWADDR_PRIx" - %"PRIx64"\n", > section->offset_within_address_space, > section->offset_within_address_space + section->size - 1); > @@ -1173,7 +1199,7 @@ static void vfio_listener_region_del(MemoryListener > *listener, > hwaddr iova, end; > int ret; > - if (vfio_listener_skipped_section(section)) { > + if (vfio_listener_skipped_section(container, section)) { > DPRINTF("vfio: SKIPPING region_del %"HWADDR_PRIx" - %"PRIx64"\n", > section->offset_within_address_space, > section->offset_within_address_space + section->size - 1); > > ----------------- > > > Thanks > -Bharat >