Hi Yavor,

On 2011-03-11, at 12:25 PM, Yavor Doganov wrote:

> Hi Eric,
>
> On Mon, Feb 14, 2011 at 03:05:49PM -0700, Eric Wasylishen wrote:
>> It's caused by the thread-local fast_path_cache variable in pixman.c.
>> If you make that non-thread-local (a normal static variable) the
>> problem will go away.
>
> Yep, or if you set the tls_model to *-exec.  But IMO this shouldn't be
> required: "global-dynamic" appears to be the right TLS model for shared
> libraries.  IMVHO, if something was seriously broken with pixman's (new)
> TLS support, the whole world would be crashing, not only GNUstep.

Right. As far as I remember, I looked at the disassembled code for this
variable in Ubuntu's pixman package, and it was using the
"global-dynamic" model, which is the correct model to use in shared
libraries. So it doesn't look like pixman is doing anything wrong.

>> The root problem here is interaction between thread local storage and
>> dlopen, because the gnustep-back bundle, which dynamically links to
>> libpixman, is dlopened by gnustep-gui.
>
> Could you please explain more about this interaction (CCing
> 613...@bugs.debian.org if possible)?  According to pixman's upstream
> maintainer, and my humble reading about the TLS documentation in GCC,
> there should be no problem at all.
>

I think what led me to say this was, I found that modifying GNUstep-gui
so it links directly to cairo and pixman made the crash disappear. So it
was more or less just a guess that dlopen was somehow involved.

However, after doing a bit more research I agree there should be no
problem with dlopen and TLS, assuming the shared library uses the
correct TLS model, which pixman does.

Further supporting this, I tried to write a simple test case with a
layout similar to GNUstep:

1. executable, dynamically linked to:
2. shared library, which dlopens:
3. shared library, which uses TLS

and I was unable to get a crash to happen.

> Can you reproduce if you configure gnustep-back with --disable-glx?  I
> can't, which leads me to the clue that the real culprit is mesa, which
> uses __attribute__ ((tls_model ("initial-exec"))) for the thread-local
> variables in libGL.so, and that's apparently incompatible.

Hm, that's interesting! It sounds like a convincing hypothesis. I can
test that, but it will take me a few days because I have to set up this
virtual machine again.

BTW, I switched from 32-bit Ubuntu 10.10 (where I was observing the bug)
to amd64 Ubuntu 10.10, and found that this bug doesn't occur on amd64.
Are you also observing it on 32-bit only?

>> However, I'm not sure how to properly fix it other than building
>> pixman without TLS.
>
> Well, we have to find where the bug really lies and fix it there.  I'm
> afraid building pixman without TLS support is the wrong course of action
> from wherever you look at it; I doubt that pixman's maintainers would be
> keen on such move (and rightfully so).

I agree, it seems like the fact that disabling TLS in pixman makes the
bug symptoms disappear is more or less a coincidence.

Hopefully we are close to tracking down this really strange bug :-)

Cheers,
Eric
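
P.S. For anyone following the thread, here is a rough sketch of how the
two TLS models discussed above are selected per variable with GCC. The
file name, variable names and functions are made up for illustration;
this is not the actual code from pixman or mesa, only the same attribute
syntax applied to toy counters.

```c
/* tls_models.c: hypothetical sketch, names are NOT from pixman or mesa. */

/* "global-dynamic" is the general model: it makes no assumption about
 * when the containing object is loaded, so it is usable in a shared
 * library that is brought in later via dlopen(). */
__thread int counter_gd __attribute__((tls_model("global-dynamic")));

/* "initial-exec" assumes the TLS block is part of the static TLS area
 * set up at program start. It is cheaper to access, but a library built
 * this way is not guaranteed to be loadable via dlopen(). */
__thread int counter_ie __attribute__((tls_model("initial-exec")));

/* Each function increments the calling thread's own copy and returns
 * the new value. */
int bump_gd(void) { return ++counter_gd; }
int bump_ie(void) { return ++counter_ie; }
```

If this is built as a shared library (something like
"gcc -fPIC -shared tls_models.c -o libtls_models.so"), then as far as I
know "readelf -d" on the result should list STATIC_TLS in its FLAGS
entry because of the initial-exec variable, which is a quicker check
than reading the disassembly.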