On Thu, Sep 13, 2018 at 10:00:43AM -0600, Alex Williamson wrote:
> On Thu, 13 Sep 2018 12:04:34 +0200
> Paolo Bonzini <pbonz...@redhat.com> wrote:
> 
> > On 13/09/2018 11:11, Paolo Bonzini wrote:
> > > On 13/09/2018 08:03, Fam Zheng wrote:
> > >> On Wed, 09/12 14:42, Paolo Bonzini wrote:
> > >>> On 12/09/2018 13:50, Fam Zheng wrote:
> > >>>>> I think it's okay if it is invoked. The sequence is first you stop
> > >>>>> the vq, then you drain the BlockBackends, then you switch
> > >>>>> AioContext. All that matters is the outcome when
> > >>>>> virtio_scsi_dataplane_stop returns.
> > >>>> Yes, but together with vIOMMU, it also effectively leads to a
> > >>>> virtio_error(), which is not clean. QEMU stderr when this call
> > >>>> happens (with patch 1 but not this patch):
> > >>>>
> > >>>> 2018-09-12T11:48:10.193023Z qemu-system-x86_64: vtd_iommu_translate:
> > >>>> detected translation failure (dev=02:00:00, iova=0x0)
> > >>>> 2018-09-12T11:48:10.193044Z qemu-system-x86_64: New fault is not
> > >>>> recorded due to compression of faults
> > >>>> 2018-09-12T11:48:10.193061Z qemu-system-x86_64: virtio: zero sized
> > >>>> buffers are not allowed
> > >>>
> > >>> But with an iothread, virtio_scsi_dataplane_stop runs in a different
> > >>> thread from the iothread; in that case you still have a race where
> > >>> the iothread can process the vq before aio_disable_external and
> > >>> print the error.
> > >>>
> > >>> IIUC the guest has cleared the IOMMU page tables _before_ clearing
> > >>> the DRIVER_OK bit in the status field. Could this be a guest bug?
> > >>
> > >> I'm not sure whether it is a bug or not. I think what happens is that
> > >> the device is left enabled by SeaBIOS, and then reset by the kernel.
> > >
> > > That makes sense, though I'm not sure why QEMU needs to process a
> > > request long after SeaBIOS has handed control to Linux. Maybe it's
> > > just that the messages should not go to QEMU stderr; a tracepoint
> > > should be enough.
> > 
> > Aha, it's not that QEMU needs to poll, it's just that polling mode is
> > enabled, and it decides to do one last iteration. In general the virtio
> > spec allows the hardware to poll whenever it wants, hence:
> > 
> > 1) I'm not sure that translation failures should mark the device as
> > broken---definitely not when doing polling, possibly not even in
> > response to the guest "kicking" the virtqueue. Alex, does the PCI spec
> > say anything about this?
> 
> AFAIK the PCI spec doesn't define anything about the IOMMU or the
> response to translation failures. Depending on whether it's a read or a
> write, the device might see an unsupported request or not even be aware
> of the error. It's really a platform RAS question whether to have any
> more significant response; most don't, but at least one tends to
> consider IOMMU faults a data integrity issue worth bringing the system
> down. We've struggled with handling ongoing DMA generating IOMMU faults
> during kexec for a long time, so any sort of marking a device broken
> for a fault should be thoroughly considered, especially when a device
> could be assigned to a user who can trivially trigger a fault.
> 
> > 2) translation failures should definitely not print messages to stderr.
> 
> Yep, easy DoS vector for a malicious guest, or a malicious userspace
> driver within the guest. Thanks,
Note that upstream it now uses error_report_once(), so it will only print once for the whole lifetime of the QEMU process, and downstream it is still a tracepoint, so no error is dumped by default. So AFAIU it is not a DoS target in either case. I would consider it a good hint for strange bugs, since AFAIU DMA errors should never occur on well-behaved guests. However, I'd also be fine with posting a patch to make it an explicit tracepoint again if any of us would still like it to go away.
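In case it helps, the "report once" behaviour boils down to a per-call-site static flag. Below is a minimal standalone sketch of that pattern (illustrative only, not the actual QEMU implementation; the report_once name and the raw stderr output are my own, while QEMU's real error_report_once() goes through the normal error reporting machinery instead):

#include <stdbool.h>
#include <stdio.h>

/* Print the message the first time this call site is reached, then
 * stay silent for the rest of the process lifetime. Each macro
 * expansion gets its own static flag, so the "once" is per call
 * site, not global. */
#define report_once(fmt, ...)                                  \
    do {                                                       \
        static bool reported_;                                 \
        if (!reported_) {                                      \
            reported_ = true;                                  \
            fprintf(stderr, fmt "\n", ##__VA_ARGS__);          \
        }                                                      \
    } while (0)

int main(void)
{
    int i;

    for (i = 0; i < 3; i++) {
        /* Only the first iteration prints anything. */
        report_once("translation failure (iova=0x%x)", 0);
    }
    return 0;
}

So even a guest that triggers translation failures in a tight loop costs us at most one line of stderr per call site, which is why I don't see it as a DoS target.

Thanks,

-- 
Peter Xu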