Mark Kettenis <mark.kette...@xs4all.nl> wrote:
> > 0(ffffd85fe50a7e0,2,35d87c,4505,ffff80000025c000,fffffd85fe50a7e0) at 0
> > scsi_done(fffffd85fe50a7e0) at scsi_done+0x31
> > nvme_q_complete(ffff800000255000,ffff800002c79a80) at nvme_q_complete+0x134
> > nvme_intr(ffff800000255000) at nvme_intr+0x2b
> > intr_handler(ffff800049e24990,ffff800000254200) at intr_handler+0x91
> > Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
> > acpicpu_idle() at acpicpu_idle+0x131
> > sched_idle(ffffffff82770ff0) at sched_idle+0x298
> >
> > end trace frame: 0x0, count: 8
>
> I think this is a bug in nvme(4).  For some reason it gets a
> (spurious?) interrupt while in the suspended state with stuff torn
> down and dereferences a stale pointer.  We probably need to do a
> better job quiescing the thing when we suspend.
No kidding.

dv, did you get anywhere with your various diffs?  Greg, can you try
out the various diffs he sent?  It's a mishmash of solutions, not yet
entirely decided.

The nvme driver doesn't seem to have any soft-state variable that
indicates it is "down".  Comparing against ahci: it also has no such
variable, but inspecting an ahci port will never show work to do.

It is curious that nvme_q_complete() finds anything to do inside a
ring.  There is no way a scsi transaction should be sitting on a
queue; the bufq layer has ensured there is no transaction.  I think
the ring contains garbage for some reason.

dlg / jmatthew, any thoughts?