On 08/12/2014 07:37 PM, Alexander Graf wrote: > > On 12.08.14 02:03, Alexey Kardashevskiy wrote: >> On 08/12/2014 03:30 AM, Alexander Graf wrote: >>> On 11.08.14 17:01, Alexey Kardashevskiy wrote: >>>> On 08/11/2014 10:02 PM, Alexander Graf wrote: >>>>> On 31.07.14 11:34, Alexey Kardashevskiy wrote: >>>>>> This implements DDW for VFIO. Host kernel support is required for this. >>>>>> >>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>>>> --- >>>>>> hw/ppc/spapr_pci_vfio.c | 75 >>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++ >>>>>> 1 file changed, 75 insertions(+) >>>>>> >>>>>> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c >>>>>> index d3bddf2..dc443e2 100644 >>>>>> --- a/hw/ppc/spapr_pci_vfio.c >>>>>> +++ b/hw/ppc/spapr_pci_vfio.c >>>>>> @@ -69,6 +69,77 @@ static void >>>>>> spapr_phb_vfio_finish_realize(sPAPRPHBState *sphb, Error **errp) >>>>>> /* Register default 32bit DMA window */ >>>>>> memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset, >>>>>> spapr_tce_get_iommu(tcet)); >>>>>> + >>>>>> + sphb->ddw_supported = !!(info.flags & >>>>>> VFIO_IOMMU_SPAPR_TCE_FLAG_DDW); >>>>>> +} >>>>>> + >>>>>> +static int spapr_pci_vfio_ddw_query(sPAPRPHBState *sphb, >>>>>> + uint32_t *windows_available, >>>>>> + uint32_t *page_size_mask) >>>>>> +{ >>>>>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); >>>>>> + struct vfio_iommu_spapr_tce_query query = { .argsz = >>>>>> sizeof(query) }; >>>>>> + int ret; >>>>>> + >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid, >>>>>> + VFIO_IOMMU_SPAPR_TCE_QUERY, &query); >>>>>> + if (ret) { >>>>>> + return ret; >>>>>> + } >>>>>> + >>>>>> + *windows_available = query.windows_available; >>>>>> + *page_size_mask = query.page_size_mask; >>>>>> + >>>>>> + return ret; >>>>>> +} >>>>>> + >>>>>> +static int spapr_pci_vfio_ddw_create(sPAPRPHBState *sphb, uint32_t >>>>>> page_shift, >>>>>> + uint32_t window_shift, uint32_t >>>>>> liobn, >>>>>> + sPAPRTCETable **ptcet) >>>>>> +{ >>>>>> + sPAPRPHBVFIOState *svphb = SPAPR_PCI_VFIO_HOST_BRIDGE(sphb); >>>>>> + struct vfio_iommu_spapr_tce_create create = { >>>>>> + .argsz = sizeof(create), >>>>>> + .page_shift = page_shift, >>>>>> + .window_shift = window_shift, >>>>>> + .start_addr = 0 >>>>>> + }; >>>>>> + int ret; >>>>>> + >>>>>> + ret = vfio_container_ioctl(&sphb->iommu_as, svphb->iommugroupid, >>>>>> + VFIO_IOMMU_SPAPR_TCE_CREATE, &create); >>>>>> + if (ret) { >>>>>> + return ret; >>>>>> + } >>>>>> + >>>>>> + *ptcet = spapr_tce_new_table(DEVICE(sphb), liobn, >>>>>> create.start_addr, >>>>>> + page_shift, 1 << (window_shift - >>>>>> page_shift), >>>>> I spot a 1 without ULL again - this time it might work out ok, but please >>>>> just always use ULL when you pass around addresses. >>>> My bad. I keep forgetting this, I'll adjust my own checkpatch.py :) >>>> >>>> >>>>> Please walk me though the abstraction levels on what each page size >>>>> honoration means. If I use THP, what page size granularity can I use for >>>>> TCE entries? >>>> [RFC PATCH 06/10] spapr_rtas: Add Dynamic DMA windows (DDW) RTAS calls >>>> support >>>> >>>> + const struct { int shift; uint32_t mask; } masks[] = { >>>> + { 12, DDW_PGSIZE_4K }, >>>> + { 16, DDW_PGSIZE_64K }, >>>> + { 24, DDW_PGSIZE_16M }, >>>> + { 25, DDW_PGSIZE_32M }, >>>> + { 26, DDW_PGSIZE_64M }, >>>> + { 27, DDW_PGSIZE_128M }, >>>> + { 28, DDW_PGSIZE_256M }, >>>> + { 34, DDW_PGSIZE_16G }, >>>> + }; >>>> >>>> >>>> Supported page sizes are returned by the host kernel via "query". For 16MB >>>> pages, page shift will return DDW_PGSIZE_4K|DDW_PGSIZE_64K|DDW_PGSIZE_16M. >>>> Or I did not understand the question... >>> Why do we care about the sizes? Anything bigger than what we support should >>> always work, no? What happens if the guest creates a 16MB map but my pages >>> are 4kb mapped? Wouldn't the same logic be able to deal with 16G pages? >> It is DMA memory, if I split "virtual" 16M page to a bunch of real 4K >> pages, I have to make sure these 16M are continuous - there will be one TCE >> entry for it and no more translations besides IOMMU. What do I miss now? > > Who does the shadow translation where? Does it exist at all?
IOMMU? I am not sure I am following you... This IOMMU will look as direct DMA for the guest but the real IOMMU table is sparse and it is populated via a bunch of H_PUT_TCE calls as the default small window. There is a direct mapping in the host called "bypass window" but it is not used here as sPAPR does not define that for paravirtualization. -- Alexey