Hi Yavor,

On 2011-03-11, at 12:25 PM, Yavor Doganov wrote:

> Hi Eric,
> 
> On Mon, Feb 14, 2011 at 03:05:49PM -0700, Eric Wasylishen wrote:
>> It's caused by the thread-local fast_path_cache variable in pixman.c. 
>> If you make that non-thread-local (a normal static variable) the 
>> problem will go away.
> 
> Yep, or if you set the tls_model to *-exec.  But IMO this shouldn't be 
> required: "global-dynamic" appears to be the right TLS model for shared 
> libraries.  IMVHO, if something was seriously broken with pixman's (new) 
> TLS support, the whole world would be crashing, not only GNUstep.

Right. As far as I remember, I looked at the disassembled code for this 
variable in Ubuntu's pixman package, and it was using the "global-dynamic" 
model, which is the correct model to use in shared libraries. So it doesn't 
look like pixman is doing anything wrong.

>> The root problem here is interaction between thread local storage and 
>> dlopen, because the gnustep-back bundle, which dynamically links to 
>> libpixman, is dlopened by gnustep-gui.
> 
> Could you please explain more about this interaction (CCing 
> 613...@bugs.debian.org if possible)?  According to pixman's upstream 
> maintainer, and my humble reading about the TLS documentation in GCC, 
> there should be no problem at all.
> 
I think what led me to say this was that modifying GNUstep-gui to link 
directly to cairo and pixman made the crash disappear. So it was more or 
less just a guess that dlopen was somehow involved.

However, after doing a bit more research I agree there should be no problem 
with dlopen and TLS, assuming the shared library uses the correct TLS model, 
which pixman does. 

Further supporting this, I tried to write a simple test case with a layout 
similar to GNUstep:

1. executable, dynamically linked to:
2. shared library, which dlopens:
3. shared library, which uses TLS

and I was unable to get a crash to happen. 
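
For anyone who wants to try something similar, the middle layer of such a 
layout could be sketched roughly as below. This is an illustrative 
reconstruction and not the test case itself: the function name 
load_and_call, the plugin file name libtlsplugin.so and the symbol 
bump_counter are all hypothetical, and the plugin source appears only as a 
comment because it would be built as a separate shared object.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Layer 3 (hypothetical plugin), built separately, e.g.
 *     cc -shared -fPIC -o libtlsplugin.so plugin.c
 *
 *     static __thread int counter;          // thread-local variable
 *     int bump_counter(void) { return ++counter; }
 */

/*
 * Layer 2: dlopen the plugin at `path`, look up `symbol` as an
 * int (*)(void) function and call it once.
 * Returns the function's result, or -1 if loading or lookup fails.
 */
int load_and_call(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    /* POSIX permits converting the void * from dlsym to a function pointer. */
    int (*fn)(void) = (int (*)(void))dlsym(handle, symbol);
    if (fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    /* The handle is deliberately left open, as a bundle loader would do. */
    return fn();
}
```

Layer 1 would then be an ordinary executable linked against the library 
containing load_and_call, calling something like 
load_and_call("./libtlsplugin.so", "bump_counter"). On older glibc versions 
the library needs to be linked with -ldl.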

> Can you reproduce if you configure gnustep-back with --disable-glx?  I 
> can't, which leads me to the clue that the real culprit is mesa, which 
> uses __attribute__ ((tls_model ("initial-exec"))) for the thread-local 
> variables in libGL.so, and that's apparently incompatible.

Hm, that's interesting! It sounds like a convincing hypothesis.
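
For reference, this is roughly what selecting a TLS model per variable looks 
like with GCC. It is a minimal sketch with made-up names (gd_counter, 
ie_counter, bump_in_this_thread, bump_in_new_thread), not code from pixman or 
mesa. Built as a plain executable both models behave the same, so this only 
shows the syntax and the per-thread semantics; it does not reproduce the 
crash, which would need the variables to live in shared libraries.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread-local variables, one per TLS model under discussion. */
static __thread int gd_counter __attribute__((tls_model("global-dynamic")));
static __thread int ie_counter __attribute__((tls_model("initial-exec")));

/* Increment both counters n times in the calling thread; return their sum. */
int bump_in_this_thread(int n)
{
    for (int i = 0; i < n; i++) {
        gd_counter++;
        ie_counter++;
    }
    return gd_counter + ie_counter;
}

/* Thread entry point: bump the counters and store the sum back in *arg. */
static void *worker(void *arg)
{
    int *value = arg;
    *value = bump_in_this_thread(*value);
    return NULL;
}

/*
 * Run bump_in_this_thread(n) in a fresh thread and return its result,
 * or -1 if the thread could not be created. A new thread starts with
 * zeroed copies of both counters, so the result is 2 * n.
 */
int bump_in_new_thread(int n)
{
    pthread_t thread;
    int value = n;

    if (pthread_create(&thread, NULL, worker, &value) != 0)
        return -1;
    pthread_join(thread, NULL);
    return value;
}
```

Each thread sees its own copies, so calls to bump_in_new_thread do not 
disturb the counters of the calling thread. Older toolchains need -pthread 
when building this.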

I can test that, but it will take me a few days because I have to set up this 
virtual machine again.

BTW, I switched from 32-bit Ubuntu 10.10 (where I was observing the bug) to 
amd64 Ubuntu 10.10, and found that this bug doesn't occur on amd64. Are you 
also observing it on 32-bit only?

>> However, I'm not sure how to properly fix it other than building 
>> pixman without TLS.
> 
> Well, we have to find where the bug really lies and fix it there.  I'm 
> afraid building pixman without TLS support is the wrong course of action 
> from wherever you look at it; I doubt that pixman's maintainers would be 
> keen on such move (and rightfully so).

I agree. The fact that disabling TLS in pixman makes the bug symptoms 
disappear seems to be more or less a coincidence.

Hopefully we are close to tracking down this really strange bug :-)

Cheers,
Eric
