On Thu, 2014-01-09 at 11:47 -0700, Alex Williamson wrote:
> On Thu, 2014-01-09 at 20:00 +0200, Michael S. Tsirkin wrote:
> > On Thu, Jan 09, 2014 at 10:24:47AM -0700, Alex Williamson wrote:
> > > On Wed, 2013-12-11 at 20:30 +0200, Michael S. Tsirkin wrote:
> > > > From: Paolo Bonzini <pbonz...@redhat.com>
> > > > 
> > > > As an alternative to commit 818f86b (exec: limit system memory
> > > > size, 2013-11-04) let's just make all address spaces 64-bit wide.
> > > > This eliminates problems with phys_page_find ignoring bits above
> > > > TARGET_PHYS_ADDR_SPACE_BITS and address_space_translate_internal
> > > > consequently messing up the computations.
> > > > 
> > > > In Luiz's reported crash, at startup gdb attempts to read from address
> > > > 0xffffffffffffffe6 to 0xffffffffffffffff inclusive. The region it gets
> > > > is the newly introduced master abort region, which is as big as the PCI
> > > > address space (see pci_bus_init). Due to a typo that's only 2^63-1,
> > > > not 2^64. But we get it anyway because phys_page_find ignores the upper
> > > > bits of the physical address. In address_space_translate_internal then
> > > > 
> > > >     diff = int128_sub(section->mr->size, int128_make64(addr));
> > > >     *plen = int128_get64(int128_min(diff, int128_make64(*plen)));
> > > > 
> > > > diff becomes negative, and int128_get64 booms.
> > > > 
> > > > The size of the PCI address space region should be fixed anyway.
> > > > 
> > > > Reported-by: Luiz Capitulino <lcapitul...@redhat.com>
> > > > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > > > ---
> > > >  exec.c | 8 ++------
> > > >  1 file changed, 2 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/exec.c b/exec.c
> > > > index 7e5ce93..f907f5f 100644
> > > > --- a/exec.c
> > > > +++ b/exec.c
> > > > @@ -94,7 +94,7 @@ struct PhysPageEntry {
> > > >  #define PHYS_MAP_NODE_NIL (((uint32_t)~0) >> 6)
> > > >  
> > > >  /* Size of the L2 (and L3, etc) page tables. */
> > > > -#define ADDR_SPACE_BITS TARGET_PHYS_ADDR_SPACE_BITS
> > > > +#define ADDR_SPACE_BITS 64
> > > >  
> > > >  #define P_L2_BITS 10
> > > >  #define P_L2_SIZE (1 << P_L2_BITS)
> > > > @@ -1861,11 +1861,7 @@ static void memory_map_init(void)
> > > >  {
> > > >      system_memory = g_malloc(sizeof(*system_memory));
> > > >  
> > > > -    assert(ADDR_SPACE_BITS <= 64);
> > > > -
> > > > -    memory_region_init(system_memory, NULL, "system",
> > > > -                       ADDR_SPACE_BITS == 64 ?
> > > > -                       UINT64_MAX : (0x1ULL << ADDR_SPACE_BITS));
> > > > +    memory_region_init(system_memory, NULL, "system", UINT64_MAX);
> > > >      address_space_init(&address_space_memory, system_memory, "memory");
> > > >  
> > > >      system_io = g_malloc(sizeof(*system_io));
> > > 
> > > This seems to have some unexpected consequences around sizing 64bit PCI
> > > BARs that I'm not sure how to handle.
> > 
> > BARs are often disabled during sizing. Maybe you
> > don't detect BAR being disabled?
> 
> See the trace below, the BARs are not disabled. QEMU pci-core is doing
> the sizing and memory region updates for the BARs, vfio is just a
> pass-through here.
Sorry, not in the trace below, but yes the sizing seems to be happening
while I/O & memory are enabled in the command register. Thanks,

Alex

> > > After this patch I get vfio
> > > traces like this:
> > > 
> > > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) febe0004
> > > (save lower 32bits of BAR)
> > > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xffffffff, len=0x4)
> > > (write mask to BAR)
> > > vfio: region_del febe0000 - febe3fff
> > > (memory region gets unmapped)
> > > vfio: vfio_pci_read_config(0000:01:10.0, @0x10, len=0x4) ffffc004
> > > (read size mask)
> > > vfio: vfio_pci_write_config(0000:01:10.0, @0x10, 0xfebe0004, len=0x4)
> > > (restore BAR)
> > > vfio: region_add febe0000 - febe3fff [0x7fcf3654d000]
> > > (memory region re-mapped)
> > > vfio: vfio_pci_read_config(0000:01:10.0, @0x14, len=0x4) 0
> > > (save upper 32bits of BAR)
> > > vfio: vfio_pci_write_config(0000:01:10.0, @0x14, 0xffffffff, len=0x4)
> > > (write mask to BAR)
> > > vfio: region_del febe0000 - febe3fff
> > > (memory region gets unmapped)
> > > vfio: region_add fffffffffebe0000 - fffffffffebe3fff [0x7fcf3654d000]
> > > (memory region gets re-mapped with new address)
> > > qemu-system-x86_64: vfio_dma_map(0x7fcf38861710, 0xfffffffffebe0000,
> > > 0x4000, 0x7fcf3654d000) = -14 (Bad address)
> > > (iommu barfs because it can only handle 48bit physical addresses)
> > 
> > Why are you trying to program BAR addresses for dma in the iommu?
> 
> Two reasons: first, I can't tell the difference between RAM and MMIO.
> Second, it enables peer-to-peer DMA between devices, which is something
> that we might be able to take advantage of with GPU passthrough.
> 
> > > Prior to this change, there was no re-map with the fffffffffebe0000
> > > address, presumably because it was beyond the address space of the PCI
> > > window. This address is clearly not in a PCI MMIO space, so why are we
> > > allowing it to be realized in the system address space at this location?
> > > Thanks,
> > > 
> > > Alex
> > 
> > Why do you think it is not in PCI MMIO space?
> > True, CPU can't access this address but other pci devices can.
> 
> What happens on real hardware when an address like this is programmed to
> a device? The CPU doesn't have the physical bits to access it. I have
> serious doubts that another PCI device would be able to access it
> either. Maybe in some limited scenario where the devices are on the
> same conventional PCI bus. In the typical case, PCI addresses are
> always limited by some kind of aperture, whether that's explicit in
> bridge windows or implicit in hardware design (and perhaps made explicit
> in ACPI). Even if I wanted to filter these out as noise in vfio, how
> would I do it in a way that still allows real 64bit MMIO to be
> programmed? PCI has this knowledge, I hope. VFIO doesn't. Thanks,
> 
> Alex
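
For reference, the negative diff described in the commit message quoted above can
be reproduced in isolation. This is a rough standalone illustration, not QEMU
code: it uses GCC's __int128 in place of QEMU's Int128 type, and the constants
are the mistyped region size and gdb read address from the report.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Master-abort region size: intended to be 2^64, but the typo makes
         * it 2^63-1. */
        __int128 size = ((__int128)1 << 63) - 1;

        /* Start of gdb's read; phys_page_find drops the upper bits, so the
         * lookup still lands in the master-abort section. */
        __int128 addr = (__int128)0xffffffffffffffe6ULL;

        /* Mirrors: diff = int128_sub(section->mr->size, int128_make64(addr)); */
        __int128 diff = size - addr;

        /* int128_get64() asserts the value fits in a uint64_t, so a negative
         * diff aborts QEMU; here we just report the sign. */
        printf("diff is %s\n",
               diff < 0 ? "negative -> int128_get64() would assert"
                        : "non-negative");
        return 0;
    }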
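
The config accesses annotated in the trace are the standard BAR sizing dance:
save the BAR, write all-ones, read back the size mask, restore. Below is a
minimal sketch of that sequence for a 32-bit memory BAR; the config-space
accessors and the fake config space seeded with the trace's values are
illustrative stand-ins, not QEMU or vfio APIs.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t fake_cfg[64] = { [0x10 / 4] = 0xfebe0004 }; /* BAR0 from the trace   */
    static uint32_t sizing_mask  = 0xffffc004;                  /* read-back of all-ones */

    /* Crude model of hardware returning the size mask while all-ones is latched. */
    static uint32_t pci_cfg_read32(unsigned off)
    {
        return fake_cfg[off / 4] == 0xffffffff ? sizing_mask : fake_cfg[off / 4];
    }

    static void pci_cfg_write32(unsigned off, uint32_t val)
    {
        fake_cfg[off / 4] = val;
    }

    /* Size a 32-bit memory BAR at config offset 'off'. */
    static uint32_t pci_bar_size32(unsigned off)
    {
        uint32_t saved = pci_cfg_read32(off);   /* save lower 32 bits of BAR */
        pci_cfg_write32(off, 0xffffffff);       /* write all-ones mask       */
        uint32_t mask  = pci_cfg_read32(off);   /* read size mask back       */
        pci_cfg_write32(off, saved);            /* restore the original BAR  */

        mask &= ~0xfu;      /* low bits of a memory BAR encode type/prefetch */
        return mask ? ~mask + 1 : 0;            /* 0xffffc000 -> 0x4000      */
    }

    int main(void)
    {
        printf("BAR0 size: 0x%x\n", pci_bar_size32(0x10)); /* prints 0x4000 (16K) */
        return 0;
    }

A 64-bit BAR repeats the same sequence on the upper dword (the @0x14 accesses
in the trace), and it is the transient all-ones upper half that momentarily
produces the fffffffffebe0000 mapping the IOMMU then rejects.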