On Thu, Aug 08, 2024 at 04:15:25PM +0300, Kirill A. Shutemov wrote:
> On Thu, Aug 08, 2024 at 08:10:34AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Aug 08, 2024 at 10:51:41AM +0300, Kirill A. Shutemov wrote:
> > > Hongyu reported a hang on kexec in a VM. QEMU reported invalid memory
> > > accesses during the hang.
> > >
> > > Invalid read at addr 0x102877002, size 2, region '(null)', reason:
> > > rejected
> > > Invalid write at addr 0x102877A44, size 2, region '(null)', reason:
> > > rejected
> > > ...
> > >
> > > It was traced down to virtio-console. Kexec works fine if virtio-console
> > > is not in use.
> >
> > virtio is not doing a lot of 16 bit reads.
> > Are these the reads:
> >
> >         virtio_cread(vdev, struct virtio_console_config, cols,
> >                      &cols);
> >         virtio_cread(vdev, struct virtio_console_config, rows,
> >                      &rows);
> >
> > ?
> >
> > write is a bit puzzling too. This one?
> >
> > bool vp_notify(struct virtqueue *vq)
> > {
> >         /* we write the queue's selector into the notification register to
> >          * signal the other end */
> >         iowrite16(vq->index, (void __iomem *)vq->priv);
> >         return true;
> > }
>
> Given that we are talking about console issue, any suggestion on how to
> check?

If you do lspci -v on the device, we'll know where the BARs are,
and can compare to 0x102877002, 0x102877A44.

> > >
> > > Looks like virtio-console continues to write to the MMIO even after
> > > underlying virtio-pci device is removed.
> >
> > You mention both MMIO and pci, I am confused.
>
> By MMIO, I mean accesses to PCI BARs. But it is only my *guess* on the
> situation, I have limited knowledge of the area. I am not drivers guy.
>
> > Removed by what? In what sense?
>
> So device_shutdown() iterates over all devices and we hit the problem when
> we get to virtio-pci devices and call pci_device_shutdown on them.

Hmm that clears bus master. So maybe what we see is actually
the device trying to do DMA and failing?
We'll need to know where these addresses are on your system.

> I *think* PCI BAR (or something else?) becomes unavailable after that but
> it is still accessed.
>
> > >
> > > The problem can be mitigated by removing all virtio devices on virtio
> > > bus shutdown.
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
> > > Reported-by: Hongyu Ning <hongyu.n...@linux.intel.com>
> >
> > A bit worried about doing so much activity on shutdown,
> > and for all devices, too. I'd like to understand what
> > is going on a bit better - could be a symptom of
> > a bigger problem (e.g. missing handling for surprise
> > removal?).
>
> I probably should have marked the patch as RFC. The patch was intended to
> start conversation. I am not sure it is correct. This patch just happened
> to work in our setup.
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov
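
To spell out the BAR comparison suggested earlier in this message: once lspci -v
gives the base and size of each memory BAR, checking whether 0x102877002 or
0x102877A44 falls inside one of them is just a range test. A rough userspace
sketch follows; struct bar_range and find_bar() are made-up names for
illustration, not kernel or QEMU APIs, and the BAR values have to be filled in
from the actual lspci output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One memory BAR as read off "lspci -v": base address and size in bytes.
 * Hypothetical helper type, not a kernel structure. */
struct bar_range {
        uint64_t base;
        uint64_t size;
};

/* Return the index of the BAR that contains addr, or -1 if none does. */
static int find_bar(const struct bar_range *bars, size_t n, uint64_t addr)
{
        assert(bars != NULL || n == 0);

        for (size_t i = 0; i < n; i++) {
                /* Subtract first so that base + size cannot overflow. */
                if (addr >= bars[i].base &&
                    addr - bars[i].base < bars[i].size)
                        return (int)i;
        }
        return -1;
}
```

If find_bar() returns -1 for both addresses, they are not MMIO accesses to this
device's BARs, which would be consistent with the failed-DMA theory; a match
would point back at BAR accesses after shutdown.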