https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246207
            Bug ID: 246207
           Summary: [geom] geli livelocks during panic
           Product: Base System
           Version: 12.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: asom...@freebsd.org

Some geli-using machines I administer occasionally panic.  When they do, they
sometimes dump core but often don't.  When they don't, they simply hang after
printing the stack trace, but before printing the uptime.

I've traced the problem to geli's shutdown_pre_sync event handler.  It tries to
destroy each geli device.  We can't simply skip that step if a panic is
underway; erasing the keys is necessary to prevent warm-boot attacks.  The
problem lies in the following lines.

g_eli_destroy:

    sc->sc_flags |= G_ELI_FLAG_DESTROY;
    wakeup(sc);
    /*
     * Wait for kernel threads self destruction.
     */
    while (!LIST_EMPTY(&sc->sc_workers)) {
        msleep(&sc->sc_workers, &sc->sc_queue_mtx, PRIBIO,
            "geli:destroy", 0);
    }

_sleep:

    if (SCHEDULER_STOPPED_TD(td)) {
        if (lock != NULL && priority & PDROP)
            class->lc_unlock(lock);
        return (0);
    }

As you can see, if the scheduler is stopped for the current thread (which it
will be during a panic), then msleep returns immediately without sleeping,
causing g_eli_destroy to loop indefinitely.  The worker threads never get a
chance to run and remove themselves from sc_workers, so the list never becomes
empty.  The obvious solution, which I haven't yet tested, would be to skip that
section in g_eli_destroy when the scheduler is stopped.

What I don't understand is why g_eli_destroy _ever_ works during a panic.
Perhaps it has something to do with the allocation of worker threads among
cores?  Perhaps it only succeeds when all worker threads happen to be on
different cores?  I find that unlikely though, because these servers have
thousands of worker threads.
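
The livelock described in this report can be modeled in user space.  The sketch
below is not kernel code and is not the proposed patch: scheduler_stopped,
workers_remaining, fake_msleep, wait_unguarded, wait_guarded and simulate are
all hypothetical stand-ins, and the assumption that one worker exits per sleep
is purely illustrative.  It only shows why a sleep that returns immediately
turns the wait loop into a spin, and how a guard of the kind suggested above
would avoid it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's "scheduler is stopped" state. */
static bool scheduler_stopped;

/* Hypothetical stand-in for the number of entries on sc->sc_workers. */
static int workers_remaining;

/*
 * Models msleep(): with the scheduler stopped it returns 0 at once and no
 * other thread runs.  Otherwise we pretend (an assumption of this model)
 * that one worker ran and removed itself from the list while we slept.
 */
static int
fake_msleep(void)
{
    if (scheduler_stopped)
        return (0);
    if (workers_remaining > 0)
        workers_remaining--;
    return (0);
}

/*
 * Wait loop shaped like the one in g_eli_destroy.  Returns the number of
 * iterations used, or -1 once spin_limit is reached.  The limit exists only
 * so the demo terminates; the loop in the report has no such bound.
 */
static int
wait_unguarded(int spin_limit)
{
    int spins = 0;

    assert(spin_limit > 0);
    while (workers_remaining > 0) {
        if (spins == spin_limit)
            return (-1);
        fake_msleep();
        spins++;
    }
    return (spins);
}

/* Guarded variant: skip the wait entirely when the scheduler is stopped. */
static int
wait_guarded(int spin_limit)
{
    if (scheduler_stopped)
        return (0);
    return (wait_unguarded(spin_limit));
}

/* Set up one scenario and run the chosen variant with a limit of 1000. */
static int
simulate(bool stopped, int workers, bool guarded)
{
    scheduler_stopped = stopped;
    workers_remaining = workers;
    return (guarded ? wait_guarded(1000) : wait_unguarded(1000));
}
```

With three workers, simulate(false, 3, false) finishes after three sleeps,
simulate(true, 3, false) hits the spin limit, and simulate(true, 3, true)
returns without waiting.  In this toy model the unguarded loop terminates with
the scheduler stopped only if the worker list is already empty on entry;
whether that explains the real-world cases where the panic path succeeds is not
established here.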