On 13/10/2016 01:57, David Gibson wrote: > Currently, the MMIO space for accessing PCI on pseries guests begins at > 1 TiB in guest address space. Each PCI host bridge (PHB) has a 64 GiB > chunk of address space in which it places its outbound PIO and 32-bit and > 64-bit MMIO windows. > > This scheme as several problems: > - It limits guest RAM to 1 TiB (though we have a limited fix for this > now) > - It limits the total MMIO window to 64 GiB. This is not always enough > for some of the large nVidia GPGPU cards > - Putting all the windows into a single 64 GiB area means that naturally > aligning things within there will waste more address space. > In addition there was a miscalculation in some of the defaults, which meant > that the MMIO windows for each PHB actually slightly overran the 64 GiB > region for that PHB. We got away without nasty consequences because > the overrun fit within an unused area at the beginning of the next PHB's > region, but it's not pretty. > > This patch implements a new scheme which addresses those problems, and is > also closer to what bare metal hardware and pHyp guests generally use. > > Because some guest versions (including most current distro kernels) can't > access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and > 64 TiB. This is broken into 1 TiB chunks. The 1 TiB contains the PIO > (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs. Each > subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for > one PHB each. > > This reduces the number of allowed PHBs (without full manual configuration > of all the windows) from 256 to 31, but this should still be plenty in > practice. > > We also change some of the default window sizes for manually configured > PHBs to saner values. > > Finally we adjust some tests and libqos so that it correctly uses the new > default locations. Ideally it would parse the device tree given to the > guest, but that's a more complex problem for another time. > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > --- > hw/ppc/spapr.c | 126 > +++++++++++++++++++++++++++++++++----------- > hw/ppc/spapr_pci.c | 5 +- > include/hw/pci-host/spapr.h | 8 ++- > tests/endianness-test.c | 3 +- > tests/libqos/pci-spapr.c | 9 ++-- > tests/spapr-phb-test.c | 2 +- > 6 files changed, 113 insertions(+), 40 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 8db3657..2d952a8 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2375,31 +2375,42 @@ static void spapr_phb_placement(sPAPRMachineState > *spapr, uint32_t index, > hwaddr *mmio32, hwaddr *mmio64, > unsigned n_dma, uint32_t *liobns, Error > **errp) > { > + /* > + * New-style PHB window placement. > + * > + * Goals: Gives large (1TiB), naturally aligned 64-bit MMIO window > + * for each PHB, in addition to 2GiB 32-bit MMIO and 64kiB PIO > + * windows. > + * > + * Some guest kernels can't work with MMIO windows above 1<<46 > + * (64TiB), so we place up to 31 PHBs in the area 32TiB..64TiB > + * > + * 32TiB..33TiB contains the PIO and 32-bit MMIO windows for all > + * PHBs. 33..34TiB has the 64-bit MMIO window for PHB0, 34..35 > + * has the 64-bit window for PHB1 and so forth. > + */ > const uint64_t base_buid = 0x800000020000000ULL; > - const hwaddr phb_spacing = 0x1000000000ULL; /* 64 GiB */ > - const hwaddr mmio_offset = 0xa0000000; /* 2 GiB + 512 MiB */ > - const hwaddr pio_offset = 0x80000000; /* 2 GiB */ > - const uint32_t max_index = 255; > - const hwaddr phb0_alignment = 0x10000000000ULL; /* 1 TiB */ > > - uint64_t ram_top = MACHINE(spapr)->ram_size; > - hwaddr phb0_base, phb_base; > + int max_phbs = > + (SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / SPAPR_PCI_MEM64_WIN_SIZE - 1; > + hwaddr mmio32_base = SPAPR_PCI_BASE + SPAPR_PCI_MEM32_WIN_SIZE;
The result is the same but I would add SPAPR_PCI_MEM_WIN_BUS_OFFSET instead of SPAPR_PCI_MEM32_WIN_SIZE. As SPAPR_PCI_MEM32_WIN_SIZE is defined as "((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)", I guess 0..SPAPR_PCI_MEM_WIN_BUS_OFFSET is for PIO and SPAPR_PCI_MEM_WIN_BUS_OFFSET..(1<<32) is for MMIO. This is what we have below with: *pio = SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE; *mmio32 = mmio32_base + index * SPAPR_PCI_MEM32_WIN_SIZE; Perhaps we can see *mmio32 as SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SIZE But in this case you should directly defined SPAPR_PCI_MEM32_WIN_SIZE as 0x80000000, not as "SIZE - OFFSET". It's confusing... If it is only a misunderstanding from my side, you can add my: Reviewed-by: Laurent Vivier <lviv...@redhat.com> Thanks, Laurent