Moin Moin! I've been hunting down a nasty backtrace problem in drkonqi where it entirely fails to create a backtrace. I am now fairly confident this is in fact a design flaw in kcrash, but I have no awesome ideas on how to solve it properly.
Long story short: there is a window of time between the SEGV occurring and drkonqi stopping the threads. During that window other threads (e.g. GIO's) can crash the process a second time, and nothing prevents it. Most recently this could be (and still can be) observed with plasmashell, which has a GIO thread sitting around when (I think) flatpak updates are being checked. The result is that the crash cannot be traced, because the process dies before drkonqi has a chance to deal with it. If you have ever seen a warning or error along the lines of "XCB connection lost", that is in fact the very same problem, albeit usually not fatal.

When a process crashes, SEGV is delivered to a single thread (the one that faulted). The other threads continue to run! When the SEGV arrives, the standard handler may restart the process, then closes all open file descriptors, potentially starts (and waits for) drkonqi and, once drkonqi has worked its magic, re-raises the signal so that the core_pattern handler can take over, if one is configured [1]. Throughout all of this the other threads have still not been suspended!

When drkonqi starts, it sends STOP to the crashed process. STOP is delivered to every thread, so this time everything is stopped. Only now is the process "safe" from crashing while crashing.

And that's the race right there: between the file descriptors getting closed and drkonqi STOPping the process, the threads that aren't handling the signal keep running and may access the now-closed file descriptors. In GIO's case it can try to read inotify events, run into an error (e.g. in ik_source_read_some_events) and call g_error, which as far as I can tell results in a TRAP, because g_error almost always(?) ends in g_abort.

The solution is simple: we shouldn't close FDs before all threads are stopped. In practice, though, I can't think of a way to actually pull this off. We'd need to close the FDs *at* STOP, but STOP, like KILL, cannot be handled.
I think the actual solution here would be for kcrash to stop invoking drkonqi and instead defer to a core handler, through which drkonqi could get access to the core. Trouble is, there can only be one core handler, and there are more software providers on a system than just us, so I guess this isn't really a viable solution :/ Also, the core stuff isn't very portable, I think.

I am fairly out of ideas :/

[1] http://man7.org/linux/man-pages/man5/core.5.html