On Wed, Jul 1, 2015 at 9:26 AM, Rui Wang <rui.y.w...@intel.com> wrote:
> On Tuesday, June 30, 2015 11:24 PM, Daniel Vetter <daniel.vet...@ffwll.ch> wrote:
>> On Tue, Jun 30, 2015 at 9:23 AM, Rui Wang <rui.y.w...@intel.com> wrote:
>> > But einj does something more than what an IPI can do, it injects hardware
>> > errors which trigger exceptions in NMI context... and the exception handler
>> > usually panics on fatal errors. And the display may be the only way to catch
>> > what has happened. I'm just hoping that the future version may work in
>> > NMI context.
>>
>> NMI sounds ... ambiguous ;-) But yeah if we can somehow inject
>> something as an NMI too then that would be even better. What I want to
>> avoid is forcing reboots, since that means you can't run a basic
>> modeset test afterwards to make sure nothing was trampled too badly.
>> Of course we'd have to replace the screen contents, but the important
>> part is that the panic handler doesn't touch anything if the driver is
>> in modeset code right now (because it'll massively increase the risk
>> of dying completely), and an easy way to check that it didn't step all
>> over modeset state unduly is to do a modeset afterwards. If that works
>> we'll be fine.
>>
>> Also with that approach we can make sure that no real errors get into
>> dmesg (as opposed to a real panic), which means we can capture dmesg
>> afterwards and if there is a serious log message (or even backtrace)
>> then drm panic handling has a bug.
>>
>> All that isn't possible when we force a real panic to happen.
>>
>> Actually thinking more about NMI that shouldn't be a problem. The
>> important thing with nmi vs. hardirq is that you can't even reliably
>> grab an irqsave spinlock, it's trylocks all the way down. But that
>> also holds for the panic handler, it's trylocks only. Could we somehow
>> just check that using lockdep - is there an NMI lockdep context
>> somewhere we could fake-grab? That's another upside of using an IPI
>> btw: Real panics kill lockdep ;-)
>
> Einj is supported by ACPI in combination with the hardware. The injected
> errors result in true MCEs, truly non-maskable. Lockdep might not be useful
> in this case. Corrected Errors (CEs) don't result in panic but I guess it
> might be possible to let it invoke your future mode-setting code for testing
> purpose, without rebooting. (Notice that MCEs can happen right from inside
> your mode-setting code while accessing any memory address)

Yeah NMI can happen anywhere and that's about the worst-case panic
context we have. The problem is that NMI bugs are a giant pain to
debug, so for testing I think it'd be better to just have a hardirq
context + the help of lockdep (if possible) to make sure we only do
try_lock and lockless stuff.

> But anyway we're not looking for a 100% working solution so if it could only
> work in normal irq or ipi context, it'd already be a big plus compared to
> what we have now.

NMI vs ipi vs other stuff is just about what's the best debug/testing
strategy. Most of the work there will really be in writing tons of
testcases to race the drm panic handler against drm modeset ioctls.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
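
The "trylocks only" rule discussed in this thread can be illustrated with a
small userspace sketch. This is not kernel or DRM code: every name in it
(fake_modeset_lock, fake_trylock, fake_panic_handler, and so on) is
hypothetical, and a C11 atomic_flag stands in for a real kernel lock. It only
shows the shape of the idea: a panic-like handler attempts the lock exactly
once, and if modeset code currently holds it, the handler backs off without
touching any state.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver's modeset lock; not a real DRM symbol. */
struct fake_modeset_lock {
    atomic_flag held;
};

/* Try exactly once and never spin or sleep. Returns true if the caller
 * now owns the lock, false if someone else already holds it. */
static bool fake_trylock(struct fake_modeset_lock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

static void fake_unlock(struct fake_modeset_lock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/* Hypothetical panic handler: it "draws" only if it wins the trylock.
 * If the lock is taken it returns immediately and changes nothing. */
static bool fake_panic_handler(struct fake_modeset_lock *lock,
                               int *frames_drawn)
{
    if (!fake_trylock(lock))
        return false;           /* modeset in progress: back off */
    (*frames_drawn)++;          /* placeholder for drawing the panic screen */
    fake_unlock(lock);
    return true;
}

/* Runs two scenarios and returns how many times the handler drew:
 * one fake panic while "modeset" holds the lock, one while it is free. */
static int run_fake_panic_scenarios(void)
{
    struct fake_modeset_lock lock = { ATOMIC_FLAG_INIT };
    int frames_drawn = 0;
    bool drew;

    /* Scenario 1: modeset code owns the lock when the fake panic fires. */
    fake_trylock(&lock);
    drew = fake_panic_handler(&lock, &frames_drawn);
    printf("panic during modeset: drew=%d\n", drew);
    fake_unlock(&lock);

    /* Scenario 2: nothing holds the lock. */
    drew = fake_panic_handler(&lock, &frames_drawn);
    printf("panic while idle: drew=%d\n", drew);

    return frames_drawn;
}
```

Calling run_fake_panic_scenarios() from a main() exercises both cases in a
single thread. A real test along the lines described above would instead race
the handler against concurrent modeset ioctls, which this sketch does not
attempt.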