On Mon, Mar 30, 2020 at 10:59:23PM +0800, Alex Williamson wrote:
> On Mon, 30 Mar 2020 02:34:02 -0400
> Yan Zhao <yan.y.z...@intel.com> wrote:
>
> > On Mon, Mar 30, 2020 at 09:35:27AM +0800, Yan Zhao wrote:
> > > On Sat, Mar 28, 2020 at 01:25:37AM +0800, Alex Williamson wrote:
> > > > On Fri, 27 Mar 2020 11:19:34 +0000
> > > > yan.y.z...@intel.com wrote:
> > > >
> > > > > From: Yan Zhao <yan.y.z...@intel.com>
> > > > >
> > > > > Currently, vfio regions without VFIO_REGION_INFO_FLAG_WRITE are only
> > > > > read-only when VFIO_REGION_INFO_FLAG_MMAP is not set.
> > > > >
> > > > > Regions with flag VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_MMAP
> > > > > are read-only only in the host page table for QEMU.
> > > > >
> > > > > This patch sets the corresponding EPT page table entries read-only for
> > > > > regions with flag VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_MMAP.
> > > > >
> > > > > Accordingly, it ignores guest writes to these read-only regions when
> > > > > they are trapped.
> > > > >
> > > > > Signed-off-by: Yan Zhao <yan.y.z...@intel.com>
> > > > > Signed-off-by: Xin Zeng <xin.z...@intel.com>
> > > > > ---
> > > >
> > > > Currently we set the r/w protection on the mmap; do I understand
> > > > correctly that the change in the vfio code below results in KVM exiting
> > > > to QEMU to handle a write to a read-only region and therefore we need
> > > > the memory.c change to drop the write?  This prevents a SIGBUS or
> > > > similar?
> > > Yes, correct. The change in memory.c is to prevent a SIGSEGV in the host,
> > > since the region is mmapped read-only. We think it's better to just drop
> > > the writes from the guest than to crash QEMU.
> > >
> > > >
> > > > Meanwhile vfio_region_setup() uses the same vfio_region_ops for all
> > > > regions and vfio_region_write() would still allow writes, so if the
> > > > device were using x-no-mmap=on, I think we'd still get a write to this
> > > > region and expect the vfio device to drop it.
> > > > Should we prevent that write in QEMU as well?
> > > Yes, it currently expects the vfio device to drop it.
> > > As the driver sets the flags without VFIO_REGION_INFO_FLAG_WRITE, it should
> > > handle such writes properly.
> > > Dropping in QEMU and dropping in the vfio device are both fine with us.
> > > We wonder which one you prefer :)
>
> The kernel and device should always do the right thing; we cannot rely
> on the user to honor the mapping, but it's also a reasonable response
> from the kernel to kill the process with a SIGSEGV if the user ignores
> the protections.  So I don't think it's an either/or: the kernel needs
> to do the right thing for itself, and in this case QEMU should do the
> right thing for itself, which is to drop writes for regions that don't
> support them.  So in general, I agree with your patch.
>
Hi Alex,
So is there anything I need to do?
Do I need to add write dropping in vfio_region_write() too?
If so, should I keep the trace_vfio_region_write() call before dropping the write?
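To make the question concrete, here is a minimal, self-contained C sketch of the "trace first, then drop" option for a region that lacks VFIO_REGION_INFO_FLAG_WRITE. It is not QEMU code: struct example_region, example_region_write() and example_trace_write() are hypothetical stand-ins for VFIORegion, vfio_region_write() and trace_vfio_region_write(), and the 16-byte backing store and little-endian byte packing are assumptions of the sketch. Only the VFIO_REGION_INFO_FLAG_* bit values are taken from <linux/vfio.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values mirror <linux/vfio.h>; redefined only so this compiles
 * without kernel headers. */
#ifndef VFIO_REGION_INFO_FLAG_READ
#define VFIO_REGION_INFO_FLAG_READ  (1u << 0)
#define VFIO_REGION_INFO_FLAG_WRITE (1u << 1)
#define VFIO_REGION_INFO_FLAG_MMAP  (1u << 2)
#endif

#define EXAMPLE_REGION_SIZE 16u /* hypothetical region size */

/* Hypothetical stand-in for a vfio region (not QEMU's VFIORegion). */
struct example_region {
    const char *name;
    uint32_t flags;                       /* VFIO_REGION_INFO_FLAG_* bits */
    uint8_t backing[EXAMPLE_REGION_SIZE]; /* stands in for device storage */
    unsigned dropped_writes;              /* writes discarded by the check */
};

/* Hypothetical trace hook, playing the role of trace_vfio_region_write(). */
static void example_trace_write(const struct example_region *r,
                                uint64_t addr, uint64_t data, unsigned size)
{
    printf("%s: write addr 0x%llx data 0x%llx size %u\n", r->name,
           (unsigned long long)addr, (unsigned long long)data, size);
}

/* Returns 0 if the write was applied, 1 if it was dropped because the
 * region is not writable, and -1 if the access is out of range. */
static int example_region_write(struct example_region *r, uint64_t addr,
                                uint64_t data, unsigned size)
{
    assert(r != NULL);

    /* Trace before the permission check so dropped writes stay visible. */
    example_trace_write(r, addr, data, size);

    if (!(r->flags & VFIO_REGION_INFO_FLAG_WRITE)) {
        r->dropped_writes++; /* ignore the write, like an unwritable register */
        return 1;
    }

    /* size is checked first, so EXAMPLE_REGION_SIZE - size cannot wrap. */
    if (size == 0 || size > sizeof(data) ||
        addr > EXAMPLE_REGION_SIZE - size) {
        return -1;
    }

    /* Store the low 'size' bytes of data in little-endian order. */
    for (unsigned i = 0; i < size; i++) {
        r->backing[addr + i] = (uint8_t)(data >> (8 * i));
    }
    return 0;
}
```

Tracing before the flag check keeps dropped writes visible when debugging a guest that ignores the protections; dropping silently without the trace is the obvious alternative.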

Thanks
Yan

> > > > Can you also identify what device and region requires this so that we
> > > > can decide whether this is QEMU 5.0 or 5.1 material?  PCI BARs are of
> > > > course always R/W and the ROM uses different ops and doesn't support
> > > > mmap, so this is a device-specific region of some sort.  Thanks,
> > > >
> > > It's a virtual mdev device for which we want to emulate a virtual
> > > read-only MMIO BAR.
> > > Is there any reason PCI BARs have to be R/W?
> > > We didn't find such a requirement in the PCI specification.
>
> What the device chooses to do with writes to a BAR is its own business;
> the PCI spec shouldn't try to define that.  There's also no PCI spec
> mechanism to declare the access protections for an entire BAR; that's
> device-specific behavior.  The current QEMU vfio-pci behavior is
> therefore somewhat implicit in knowing this for a directly assigned
> device.  We can mmap the device and we expect writes to unwritable
> registers within that mapping to be dropped.
>
> For an mdev device, we can't rely on the user honoring the access
> protections, i.e., the user shouldn't be able to exploit the kernel or
> device by doing so, but I also agree that QEMU, as a friendly vfio
> user, should avoid unsupported operations and protect itself from how
> the kernel may handle the fault.
>
> Since this mdev device doesn't exist yet, I'm thinking this is QEMU
> v5.1 material though.
>
> > It looks like MMIO regions in vfio-platform can also be read-only and
> > mmapped.
>
> Yes.  Thanks,
>
> Alex
>