On 06/05/2014 11:36 PM, Alexander Graf wrote:
>
> On 05.06.14 15:33, Alexey Kardashevskiy wrote:
>> On 06/05/2014 11:15 PM, Alexander Graf wrote:
>>> On 05.06.14 15:10, Alexey Kardashevskiy wrote:
>>>> On 06/05/2014 11:06 PM, Alexander Graf wrote:
>>>>> On 05.06.14 08:43, Alexey Kardashevskiy wrote:
>>>>>> On 06/05/2014 03:49 PM, Alexey Kardashevskiy wrote:
>>>>>>> POWER KVM supports a KVM_CAP_SPAPR_TCE capability which allows
>>>>>>> allocating TCE tables in the host kernel memory and handling
>>>>>>> H_PUT_TCE requests targeted to a specific LIOBN (logical bus
>>>>>>> number) right in the host without switching to QEMU. At the
>>>>>>> moment this is used for emulated devices only and the handler
>>>>>>> only puts TCE to the table. If the in-kernel H_PUT_TCE handler
>>>>>>> finds a LIOBN and corresponding table, it will put a TCE to
>>>>>>> the table and complete hypercall execution. The user space
>>>>>>> will not be notified.
>>>>>>>
>>>>>>> Upcoming VFIO support is going to use the same sPAPRTCETable
>>>>>>> device class so KVM_CAP_SPAPR_TCE is going to be used as well.
>>>>>>> That means that TCE tables for VFIO are going to be allocated
>>>>>>> in the host as well. However VFIO operates with real IOMMU
>>>>>>> tables and simple copying of a TCE to the real hardware TCE
>>>>>>> table will not work as guest physical to host physical address
>>>>>>> translation is required.
>>>>>>>
>>>>>>> So until the host kernel gets VFIO support for H_PUT_TCE, we
>>>>>>> better not register VFIO's TCE in the host.
>>>>>>>
>>>>>>> This adds a bool @kvm_accel flag to the sPAPRTCETable device
>>>>>>> telling that sPAPRTCETable should not try allocating TCE table
>>>>>>> in the host kernel. Instead, the table will be created in QEMU.
>>>>>>>
>>>>>>> This adds a kvm_accel parameter to spapr_tce_new_table() to let
>>>>>>> users choose whether to use acceleration or not. At the moment
>>>>>>> it is enabled for VIO and emulated PCI.
>>>>>>> Upcoming VFIO support will set it to false.
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>>>> ---
>>>>>>>
>>>>>>> This is a workaround but it lets me have one IOMMU device for
>>>>>>> VIO, emulated PCI and VFIO which is a good thing.
>>>>>>>
>>>>>>> The other way around would be a new KVM_CAP_SPAPR_TCE_VFIO
>>>>>>> capability but this needs kernel update.
>>>>>> Never mind, I'll make it a capability. I'll post capability
>>>>>> reservation patch separately.
>>>>> Just rename the flag from "kvm_accel" to "vfio_accel", set it to
>>>>> true for vfio and false for emulated devices. Then the spapr_iommu
>>>>> file can check on the capability (and default to false for now,
>>>>> since it doesn't exist yet).
>>>> Is that ok if the flag does not have to do anything with VFIO per se? :)
>>> The flag means "use in-kernel acceleration if the vfio coupling
>>> capability is available", no?
>> It is a flag of sPAPRTCETable which is not supposed to know about VFIO
>> at all, it is just an IOMMU. But if you are ok with it, I have no
>> reason to be unhappy either :)
>>
>>>>> That way you don't have to reserve a CAP today.
>>>> Why exactly cannot we do that today?
>>> Because the CAP namespace isn't a garbage bin we can just throw IDs at.
>>> Maybe we realize during patch review that we need completely different
>>> CAPs.
>> That was my first plan - to wait for KVM_CAP_SPAPR_TCE_64 be available
>> in the kernel.
>
> So all you need are 64bit TCEs with bus_offset?

No. I need 64bit IOBAs a.k.a. PCI bus addresses. The default DMA window is
just 1 or 2GB and it is mapped at 0 on PCI bus. TCEs are 64 bit already.

> What about the missing in-kernel modification of the shadow TCEs on
> H_PUT_TCE? I thought that's what this is really about.

This I do not understand :(

--
Alexey
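
To illustrate the distinction between 64bit TCEs and 64bit IOBAs made above, here is a minimal sketch of how an IOBA might be resolved to a TCE table index. It is not QEMU or kernel code: the struct and function names are hypothetical, and a 4K IOMMU page size is assumed.

```c
#include <stdint.h>

/* Hypothetical description of a DMA window; not a QEMU or kernel type. */
struct dma_window {
    uint64_t bus_offset;  /* first IOBA (PCI bus address) covered */
    uint64_t size;        /* window size in bytes */
    unsigned page_shift;  /* log2 of the IOMMU page size (assumed 12) */
};

/* Return the TCE table index for an IOBA, or -1 if it is outside the window. */
static int64_t ioba_to_tce_index(const struct dma_window *w, uint64_t ioba)
{
    if (ioba < w->bus_offset || ioba - w->bus_offset >= w->size) {
        return -1;
    }
    return (int64_t)((ioba - w->bus_offset) >> w->page_shift);
}
```

With a 1GB window mapped at 0, IOBA 0x3000 resolves to index 3, while an IOBA of 4GB + 0x3000 is rejected no matter how wide each TCE entry is. The same IOBA resolves to index 3 only in a window whose bus_offset is 4GB, which is why the window needs 64bit bus addresses and not wider TCEs.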