Re: KCrash crash racing

Aleix Pol Mon, 05 Aug 2019 07:47:31 -0700

On Wed, Jul 31, 2019 at 12:26 PM Harald Sitter <sit...@kde.org> wrote:
>
> Moin Moin!
>
> I've been hunting down a nasty backtrace problem in drkonqi where it
> entirely fails to create a backtrace and am now fairly confident this
> is in fact a design flaw with kcrash, but I have no awesome ideas on
> how to solve this properly.
>
> Long story short: there is a space of time between SEGV occurring and
> drkonqi stopping the threads. This causes (e.g.) GIO threads to
> actively unavoidably crash the process. Most recently this could/can
> be observed with plasmashell which has a GIO thread sitting around
> when (I think) flatpak updates are being checked. The result is that
> the crash cannot be traced because the process dies before drkonqi has
> a chance to deal with it.
>
> If you have ever seen a warning or error of the kind "XCB connection
> lost" or something similar it is in fact the very same problem, albeit
> usually not fatal.
>
> When a process crashes SEGV is sent to any one thread. The other
> threads continue to run!
> When the SEGV arrives the standard handler will possibly restart the
> process, then close all open file descriptors, potentially start (and
> wait for) drkonqi and when drkonqi has worked its magic raise itself
> to a core pattern process if applicable [1].
> The threads have still not been suspended!
> When drkonqi starts, it sends STOP to the crashed process. STOP is
> delivered to every thread, thus stopping everything this time around.
> Only now is the process "safe" from crashing while crashing.
>
> And that's the race right there. Between the file descriptors getting
> closed and the threads being STOPped, the threads that aren't handling
> the signal continue to run and may access the now-closed file
> descriptors. In GIO's case it can try to read inotify events and run
> into an error (e.g. in ik_source_read_some_events) and g_error, which
> as far as I can tell will result in a TRAP because g_error almost
> always(?) ends in g_abort.
>
> The solution is simple: we shouldn't close FDs before all threads are stopped.
>
> Practically I can't think of a way to actually pull this off though.
> We'd need to close the FDs *at* STOP. But STOP like KILL cannot be
> handled.
>
> I think the actual solution here would need to be that kcrash stops
> invoking drkonqi and instead defers to a core handler through which
> drkonqi can get access to the core.
> Trouble is that there can only be one core handler and there are more
> software providers on a system than just us, so I guess this isn't
> really a viable solution :/
> Also the core stuff isn't too portable I think.
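
To make the two mechanics in the quoted mail concrete, here is a minimal
sketch in Python (not KCrash code; it assumes a POSIX system and that it
runs in the main thread). read_after_close() and can_install_handler()
are hypothetical helper names made up for this illustration. The first
shows what a still-running thread sees when its file descriptor has
been closed underneath it, the second shows that the OS refuses a
handler for STOP and KILL.

```python
import errno
import os
import signal
import threading


def read_after_close():
    """Return the errno a worker thread gets when its fd was closed under it."""
    read_fd, write_fd = os.pipe()
    closed = threading.Event()
    seen = []

    def worker():
        # Stand-in for a thread that keeps running while the crash is handled.
        closed.wait()
        try:
            os.read(read_fd, 1)
        except OSError as exc:
            seen.append(exc.errno)

    thread = threading.Thread(target=worker)
    thread.start()
    os.close(read_fd)  # stand-in for the crash handler closing all fds
    closed.set()
    thread.join()
    os.close(write_fd)
    return seen[0] if seen else None


def can_install_handler(signum):
    """Report whether the OS accepts a user handler for signum."""
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except OSError:
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the old disposition back
    return True


if __name__ == "__main__":
    print("read on closed fd:", errno.errorcode[read_after_close()])
    print("STOP can be handled:", can_install_handler(signal.SIGSTOP))
    print("KILL can be handled:", can_install_handler(signal.SIGKILL))
```

In this sketch the worker merely records the error. A library that
treats the same error as fatal, as described for GIO above, would abort
the process at that point instead.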


Well, yes. It's a complex issue as we're dealing with a dying process.

My impression is that relying on core handlers makes a lot of sense,
though there would be some questions to answer, such as "what happens
when running on other systems". Maybe for now we could try a middle
ground: handle cores on Plasma and use drkonqi as we do now
otherwise?
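
A rough sketch of how such a middle ground could pick a strategy, in
Python for brevity. It assumes Linux, where a core_pattern starting with
"|" means the kernel pipes the core to a program. The function names,
the KNOWN_HANDLERS list, and the idea of keying the decision on the
registered handler are all assumptions made for illustration, not
existing KCrash or drkonqi behaviour.

```python
from pathlib import Path

# Linux-specific location of the kernel's core pattern.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")

# Hypothetical list of handlers we would know how to fetch cores from.
KNOWN_HANDLERS = ("systemd-coredump",)


def piped_core_handler(pattern):
    """Return the program a '|'-prefixed core pattern pipes to, else None."""
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return None
    parts = pattern[1:].split()
    return parts[0] if parts else None


def choose_crash_strategy(pattern, known_handlers=KNOWN_HANDLERS):
    """Pick 'core-handler' when a known handler is registered, else 'drkonqi'."""
    handler = piped_core_handler(pattern)
    if handler is not None and Path(handler).name in known_handlers:
        return "core-handler"
    return "drkonqi"


if __name__ == "__main__":
    if CORE_PATTERN_PATH.exists():
        print(choose_crash_strategy(CORE_PATTERN_PATH.read_text()))
    else:
        # No such file: fall back to the current behaviour.
        print("drkonqi")
```

Anything the sketch does not recognise, including systems without a
core_pattern file at all, falls back to the current drkonqi path.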

Does drkonqi work nowadays at all on systems that aren't Linux/BSD?

Aleix
