On Sun, Jun 04, 2017 at 06:34:45PM +0800, Wei Wang wrote:
> On 05/26/2017 01:57 AM, Michael S. Tsirkin wrote:
> >
> > I think that's a very valid point. Linux isn't currently optimized to
> > handle packets in device BAR.
> > There are several issues here and you do need to address them in the
> > kernel, no way around that:
> >
> > 1. lots of drivers set protection to
> >         vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >
> Sorry for my late reply.
>
> In the implementation tests, I didn't find an issue when letting the
> guest directly access the bar MMIO returned by ioremap_cache().
> If that's conventionally improper, we can probably make a new
> function similar to ioremap_cache, as the 2nd comment suggests
> below.

Right. And just disable the driver on architectures that don't support it.

> So, in any case, the vhost-pci driver uses ioremap_cache() or a similar
> function, which sets the memory type to WB.
>

And that's great. AFAIK VFIO doesn't though, you will need to teach it
to do that to use userspace drivers.

>
> > vfio certainly does, and so I think does pci sysfs.
> > You won't get good performance with this, you want to use
> > a cacheable mapping.
> > This needs to be addressed for pmd to work well.
>
> In case it's useful for the discussion here, introduce a little background
> about how the bar MMIO is used in vhost-pci:
> The device in QEMU sets up the MemoryRegion of the bar as "ram" type,
> which will finally have translation mappings created in EPT. So, the memory
> setup of the bar is the same as adding a regular RAM. It's like we are
> passing through a bar memory to the guest which allows the guest to
> directly access to the bar memory.
>
> Back to the comments, why it is not cacheable memory when the
> vhost-pci driver explicitly uses ioremap_cache()?

It is. But when you write a userspace driver, you will need to teach
vfio to allow cacheable access from userspace.

> >
> > 2. linux mostly assumes PCI BAR isn't memory, ioremap_cache returns __iomem
> > pointers which aren't supposed to be dereferenced directly.
> > You want a new API that does direct remap or copy if not possible.
> > Alternatively remap or fail, kind of like pci_remap_iospace.
> > Maybe there's already something like that - I'm not sure.
> >
>
> For the vhost-pci case, the bar is known to be a portion physical memory.

Yes but AFAIK __iomem mappings still can't be portably dereferenced on all
architectures. ioremap_cache simply doesn't always give you
a dereferencable address.

> So, in this case, would it be an issue if the driver directly accesses to
> it?
> (as mentioned above, the implementation functions correctly)
>
> Best,
> Wei

you mean like this:

	void __iomem *baseptr = ioremap_cache(....);

	unsigned long signature = *(unsigned int *)baseptr;

It works on intel. sparse will complain though.

See Documentation/bus-virt-phys-mapping.txt

-- 
MST