Launchpad #49221 is a high-priority bug caused by the 13_smoother_fading
patch. I will describe the cascade of issues that triggers it. Next, I
will enumerate some user workarounds. Finally, I will explain why I'm
not fixing the issue.

It is well known that gnome-session is in need of a refactoring due to
feature creep. Among other things, it handles initial client startup,
the splash screen, assistive technologies, DBUS, keyring, startup
sounds, the logout dialog, proxying of GDM actions
(suspend/hibernate/logout/etc.) to other software
(gnome-power-manager)... and session management. That list does not
include the deprecated features it still has code supporting. On top of
this, we (downstream Ubuntu) apply 12 patches of varying levels of
intrusiveness.

Under this weight, multiple things have given.

The symptom for bug #49221 is that, upon return from a software suspend,
a user's desktop appears partially locked up for up to about 30 minutes.
(MAXINT microseconds is roughly 35 minutes.) The issue is triggered by
13_smoother_fading's flawed implementation. 13_smoother_fading is
intended to improve the quality of the faded background that appears
with the logout dialog. Specifically, the original fading code relies
upon GTK timeout callbacks. These often fire irregularly, and therefore
the resulting fade was irregular. The new fading code adds a usleep call
designed to regulate the speed.

Session management occurs across UNIX pipes between clients and the
session manager. Managed clients connect to a pipe and coordinate their
state with the session manager. The common client implementation blocks
while waiting for a response from the session manager. If a session
manager locks up, most managed clients will not appear to have executed.
In reality, they will be blocked before they have been realized.

With all that in mind, the bug occurs when a user triggers the logout
dialog and, while the fade is still in progress, clicks the suspend
button. Multiple bug paths can occur here. Suffice it to say, the
timeout is improperly canceled, and the next time usleep is called, a
time-skew issue makes the sleep duration a number near MAXINT
(depending on the duration of the suspend). Since usleep blocks, no
other code is executed in g-s-m, and therefore all managed applications
block.
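
For scale, here is the arithmetic behind the "number near MAXINT"
figure. This is only a sketch: it assumes MAXINT means the largest
32-bit signed integer and that the value is interpreted as microseconds,
which is the unit usleep takes. The names INT32_MAX and usec_to_minutes
are made up for the example.

```python
# Assumption: MAXINT is the largest 32-bit signed integer.
INT32_MAX = 2**31 - 1


def usec_to_minutes(usec):
    """Convert a duration in microseconds to minutes."""
    return usec / 1_000_000 / 60


# Worst-case stall if a single sleep is given a value near MAXINT.
worst_case_minutes = usec_to_minutes(INT32_MAX)
print(f"{worst_case_minutes:.1f} minutes")  # 35.8 minutes
```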

This issue does not occur in the upstream code. The upstream algorithm
is broken in other ways that can cause session management issues in
extreme cases. However, the fade code explicitly checks for a time-skew
and aborts if one is detected. Regardless, they use timeout callbacks.
So, even in a worst-case situation, session management is still being
handled.
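
The difference between sleeping inside a handler and using timeout
callbacks can be shown with a toy model. This is not g-s-m or GTK code.
It uses Python's sched module with a fake millisecond clock, and every
name in it (VirtualClock, run_fade, client_request, the 500 ms step) is
invented for illustration. One "client request" is due at 200 ms while a
fade is running.

```python
import sched


class VirtualClock:
    """Fake millisecond clock so the example runs instantly."""

    def __init__(self):
        self.now_ms = 0

    def time(self):
        return self.now_ms

    def sleep(self, ms):
        self.now_ms += ms


def run_fade(blocking, steps=3, step_ms=500):
    """Run a fade on a toy event loop and return a log of (event, time_ms)."""
    clock = VirtualClock()
    loop = sched.scheduler(clock.time, clock.sleep)
    log = []

    def client_request():
        # Stand-in for a managed client waiting on the session manager.
        log.append(("client", clock.time()))

    def fade_step(remaining):
        log.append(("fade", clock.time()))
        if remaining == 0:
            return
        if blocking:
            # Sleep inside the handler: the loop cannot serve anyone else.
            clock.sleep(step_ms)
            fade_step(remaining - 1)
        else:
            # Timeout callback: hand control back to the loop between steps.
            loop.enter(step_ms, 1, fade_step, (remaining - 1,))

    loop.enter(0, 1, fade_step, (steps,))
    loop.enter(200, 1, client_request)
    loop.run()
    return log


print(run_fade(blocking=True))
print(run_fade(blocking=False))
```

In the blocking run the client is not served until the fade finishes at
1500 ms. In the callback run it is served on time at 200 ms. Stretch the
sleep to something near MAXINT microseconds and the blocking case is the
lockup described above.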

In summary, gnome-session sucks. 13_smoother_fading sucks more. And it's
all a pain to properly fix. (The same code is used in libgksu. But no
one hits that race condition in practice.)

What can our users do? Not suspend via the logout dialog.
gnome-power-manager's battery icon can be clicked for suspend/hibernate
actions. Also, closing a laptop lid will (usually) work. Share and
enjoy.

Why am I not fixing this bug? Because even fixing this particular issue
leaves fistfuls of other serious issues in the same flawed code-path.
Our downstream g-s-m is fairly mangled, and the version upstream is not
much better. I'd rather engage in a refactoring. But, I am a busy
student and not the upstream maintainer.

Writing the following is easier than trying to figure out how to get a
patch proposed in main. (At least Universe has clear rules...)

Add a skew check before the usleep. Or, have a maximum cutoff.
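
A minimal sketch of both suggestions, written in Python rather than the
C of the actual patch. Nothing here is taken from 13_smoother_fading:
the function plan_sleep, the constants, and their values are all
hypothetical. The idea is to decide how long to sleep from the time
elapsed since the last frame, abort if that elapsed time looks
implausible, and never sleep longer than a fixed cap.

```python
# Hypothetical tuning values, not taken from the real patch.
FRAME_INTERVAL_USEC = 20_000   # desired time between fade frames
SKEW_LIMIT_USEC = 1_000_000    # elapsed time beyond this counts as skew
MAX_SLEEP_USEC = 100_000       # hard cap on any single sleep


def plan_sleep(last_frame_usec, now_usec):
    """Return microseconds to sleep before the next frame.

    Returns None when a time skew is detected, meaning the fade
    should be aborted instead of sleeping.
    """
    elapsed = now_usec - last_frame_usec
    # Skew check: the clock went backwards or jumped far ahead,
    # for example across a suspend.
    if elapsed < 0 or elapsed > SKEW_LIMIT_USEC:
        return None
    remaining = FRAME_INTERVAL_USEC - elapsed
    # Maximum cutoff: clamp to [0, MAX_SLEEP_USEC] so a bad value can
    # never turn into a very long sleep.
    return max(0, min(remaining, MAX_SLEEP_USEC))


print(plan_sleep(0, 5_000))          # on schedule: sleep the remainder
print(plan_sleep(0, 30_000))         # frame is late: do not sleep
print(plan_sleep(0, 3_600_000_000))  # an hour passed: skew, abort
```

With these values the cap is redundant once the skew check passes, since
the remainder never exceeds one frame interval. It is kept to show the
second suggestion, which would also work on its own.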

The end.

-- 
Scott Robinson <[EMAIL PROTECTED]>
http://quadhome.com/
