On Mon, December 14, 2015 16:07:38 Martin Graesslin wrote:
> On Friday, November 27, 2015 1:05:26 PM CET Michael Pyne wrote:
> > On Thu, November 26, 2015 13:16:04 Martin Graesslin wrote:
> > > we are facing a problem during the startup of Plasma on Wayland. If OOM
> > > protection is enabled for kdeinit and we already have a running X
> > > server, kdeinit freezes dead.
> > >
> > > I'm sorry for having ignored the issue for too long and had just
> > > disabled OOM protection on my system, so I never hit it. Now I enabled
> > > it again to get the problem. On my system I have now two frozen kdeinit
> > > processes:
> > >
> > > martin 1960 1956 0 77832 26448 1 13:05 ? 00:00:00 /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
> > > martin 1961 1960 0 77832 2816 3 13:05 ? 00:00:00 /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
> > >
> > > One has the following stacktrace:
> > > It's frozen in this line of code:
> > > sigsuspend(&oldsigs); // wait for the signal to come
> > >
> > > The other one has the following stacktrace:
> > > which is:
> > > d.n = read(d.fd[0], &d.result, 1);
> > >
> > > Given that it looks to me like these two processes dead-lock. I do not
> > > understand why, why it only happens on Wayland, why the fact that an X
> > > server must already be running is relevant and what the OOM protection
> > > has to do with it.
> >
> > I don't have the answer but I can help explain the deadlock better I
> > think.
>
> thanks for your input. It helped me understanding quite a bit.
>
> Some more testing results:
> Weston+Xwayland: doesn't show the problem
> Weston without Xwayland (and DISPLAY=$WAYLAND_DISPLAY): doesn't show the
> problem.
>
> What I absolutely do not understand how KWin could influence it. From all
> the backtraces I see it always freezes before interacting with the
> windowing system.
>
> Any more ideas to test and investigate, highly appreciated. I got a rather
> high number of complaints due to that problem and it's a showstopper and I'm
> lost with it.
Did you add an error check around the set_protection call in start_kdeinit.c
and see if that call is failing? (i.e. does "kill(pid, SIGUSR1)" ever
execute?). If the kill() call *is* reached then perhaps SIGUSR1 is
unintentionally masked in the 'grandchild' process (the child of kdeinit about
to be exec()'d). Perhaps something in the wayland/kwin/weston/x11 library
interaction blocks SIGUSR1 from being received in that case?

I think the easiest possible fix is to replace the sigsuspend call with a
sigtimedwait() call, constructed to wait for SIGUSR1 alone, but with a short
timeout. In the event the timeout is reached, continue with the exec() as
normal, possibly after leaving a noisy warning.

It's probably a good idea to do this anyway since library code shouldn't wait
indefinitely just because OOM is enabled, but you're the one best positioned
to reproduce at this point :)

Regards,
 - Michael Pyne
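A minimal sketch of the sigtimedwait() idea, assuming the grandchild blocks
SIGUSR1 with sigprocmask() before the parent can send it, so the signal stays
pending until the wait instead of being lost or handled elsewhere. The helper
names notify_child() and wait_for_sigusr1() and the timeout value are
hypothetical illustrations, not code from start_kdeinit.c; notify_child() only
shows the kind of error check on kill() suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical parent-side helper: send SIGUSR1 and report a failure
 * instead of silently continuing. Returns 0 on success, -1 on error. */
static int notify_child(pid_t pid)
{
    if (kill(pid, SIGUSR1) == -1) {
        perror("kill(SIGUSR1)");
        return -1;
    }
    return 0;
}

/* Hypothetical child-side helper replacing an unbounded sigsuspend().
 * Assumes SIGUSR1 is already blocked in the calling process.
 * Waits for SIGUSR1 alone, for at most timeout_sec seconds.
 * Returns 1 if SIGUSR1 arrived, 0 on timeout, -1 on other errors. */
static int wait_for_sigusr1(time_t timeout_sec)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    struct timespec timeout = { .tv_sec = timeout_sec, .tv_nsec = 0 };

    for (;;) {
        int sig = sigtimedwait(&set, NULL, &timeout);
        if (sig == SIGUSR1)
            return 1;
        if (sig == -1 && errno == EINTR)
            continue; /* interrupted by another signal: retry (restarts the full timeout) */
        if (sig == -1 && errno == EAGAIN) {
            /* Timed out: warn loudly; the caller would then exec() anyway. */
            fprintf(stderr, "warning: no SIGUSR1 within %ld s, continuing\n",
                    (long)timeout_sec);
            return 0;
        }
        return -1;
    }
}
```

On a 0 return the grandchild would proceed with exec() as proposed, so a
missing or masked SIGUSR1 costs a short delay and a warning rather than a
permanent hang.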
_______________________________________________
Kde-frameworks-devel mailing list
Kde-frameworks-devel@kde.org
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel