Bug#800574: Bug#807244: libegl1-nvidia: Programs crash due to elisian-unlock on skylake processor with nvidia driver 352.63-1 (experimental)

Andreas Beckmann Tue, 08 Dec 2015 10:28:17 -0800

Hi Aurelien,

thanks for your analysis.

On 2015-12-08 10:23, Aurelien Jarno wrote:
> I disagree it is supposed to be fixed. Intel got a few bugs in their
> TSX-NI implementation for Haswell and Broadwell and possibly early
> versions of Skylake, and to avoid data loss we have therefore disabled
> lock elision for some CPU revisions.

That's what I meant by "fixed". But obviously there are two problems
here: buggy hardware (blacklisted, #800574) and ...

> That said the bugs in the Intel
> implementation are corner cases, and it took quite some time for them to
> get discovered. If your program crashes reproducibly, it's definitely not
> an issue with the TSX-NI implementation. Disabling --enable-lock-elision
> is just a workaround for the real issue. People now start to have CPUs
> with a working TSX-NI implementation which is therefore not blacklisted
> and thus the problem is appearing again.

... buggy software (#807244), which is only exposed by running on
hardware with working TSX-NI.
That could also explain the fact that the bug was introduced in 352+.

Jelle, I didn't dig through the nvidia forums, but if this info isn't
mentioned there already, maybe you could post it:

> According to the backtrace the problem is typical of a call to
> mutex_unlock() on a mutex which hasn't been locked with mutex_lock()
> before.
(or was already unlocked.)
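
To make that failure mode concrete, here is a minimal sketch, assuming
plain POSIX threads. The helper names are made up for illustration and
have nothing to do with the nvidia code. With a default mutex both
patterns are undefined behaviour, and as far as I understand the elided
unlock path can then end a transaction that was never started, which
faults. An error-checking mutex does not go through elision and reports
the mistake instead:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: set up an error-checking mutex so that misuse is
 * reported through the return value instead of being undefined. */
static void init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc;
}

/* Case 1: unlock a mutex that was never locked. */
int unlock_without_lock(void)
{
    pthread_mutex_t m;
    int rc;

    init_errorcheck_mutex(&m);
    rc = pthread_mutex_unlock(&m);      /* not locked: error expected */
    pthread_mutex_destroy(&m);
    return rc;
}

/* Case 2: unlock a mutex twice after a single lock. */
int double_unlock(void)
{
    pthread_mutex_t m;
    int rc;

    init_errorcheck_mutex(&m);
    rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);      /* matching unlock: fine */
    assert(rc == 0);
    rc = pthread_mutex_unlock(&m);      /* already unlocked: error expected */
    pthread_mutex_destroy(&m);
    return rc;
}
```

Both helpers should return EPERM. Temporarily switching a suspect mutex
to PTHREAD_MUTEX_ERRORCHECK and checking the return value of every
unlock is one way to find such a call site without TSX hardware.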

Andreas
