On Sat, Apr 20, 2013 at 09:27:16AM +0200, Mathias Fröhlich wrote: > > Hi Tom, > > May be I need to tell where the problem really appears in real life. > OpenSceneGraph has some nifty features regarding multi channel rendering. > Assume a setup of multiple full screen views running on different graphics > boards into a single mashine composing a view into a single scene. > Now the recommended way to do this with osg is to set up a X screen per > graphics board. Even if this spans multiple monitors/projectors. Set up a GL > graphics context per graphics board and set up a viewport per projector in > the > graphics contexts. Rendering happens now in parallel for each graphics > context. I do drive such a thing here with two radeons and three monitors for > testing and here the problem appears. > > When I start the favourite flight simulation application of my choice with > this > setup, then it crashes almost immediately without llvm_start_multithreaded > being called. Wheres it works stable if we ensure llvm being multithreaded. > > So, I tried to distill a piglit testcase out of this somehow huger setup with > flightgear, OpenSceneGraph, multiple gpu's and what not. > > On Friday, April 19, 2013 20:08:54 Tom Stellard wrote: > > On Wed, Apr 17, 2013 at 07:54:32AM +0200, Mathias Fröhlich wrote: > > > Tom, > > > > > > > -class LLVMEnsureMultithreaded { > > > > -public: > > > > - LLVMEnsureMultithreaded() > > > > - { > > > > - llvm_start_multithreaded(); > > > > - } > > > > -}; > > > > - > > > > -static LLVMEnsureMultithreaded lLVMEnsureMultithreaded; > > > > > > Removing this leads to crashes in llvm with applications that concurrently > > > work on different gl contexts. > > > > The test you wrote still passes with this patch. Do you see that > > we are now calling the C API version of llvm_start_multithreaded(), > > LLVMStartMutithreaded() from inside radeon_llvm_compile() protected by a > > static variable? > > Oh, no I did not see this. I did not realize that the > llvm_start_multithreaded > call is not just plain C. So I thought grepping for the call I used is > sufficient. > > But negative. If I really apply your patch and try to run this with the above > setup I get the crashes. The same with the piglit test here. > > Too bad, that reproducing races is racy for itself. > With the piglit test I get about 2/3 of the runs either glibc memory > corruption aborts. Or one of the below asserts from llvm: > > bool llvm::llvm_start_multithreaded(): Assertion `!multithreaded_mode && > "Already multithreaded!"' failed. > > void > llvm::PassRegistry::removeRegistrationListener(llvm::PassRegistrationListener*): > > Assertion `I != Impl->Listeners.end() && "PassRegistrationListener not > registered!"' failed. > > bool llvm::sys::SmartMutex<mt_only>::release() [with bool mt_only = true]: > Assertion `((recursive && acquired) || (acquired == 1)) && "Lock not acquired > before release!"' failed. > > So the biggest problem IIRC was that use of llvm::sys::SmartMutex<mt_only> > which is spread around here and there in llvm. The pass registry was (is?) > one > of the users for that. If you did not tell llvm to run multithreaded these > locks get noops and you concurrently access containers and that ... > > Looking at the first assert, the llvm guys have made this problem even worse > IMO since I looked at this before. We need to check for multithreading being > enabled before trying to set this. Both of which being racy for itself in > this > way and all of them not being save against already happening llvm access from > an other thread and an other foreign use. > > > Sorry about that. I didn't have piglit commit access at the time, and > > I forgot about the patch. I fixed a few things and sent v3 to the list. > The same here. Thanks for this. > > > > Regarding the point where this funciton is called I had choosen static > > > initialization time since llvm requires this function to be called single > > > threaded which we cannot guarantee in any case. Keep in mind that you need > > > to ensure this function called non concurrently even against applications > > > that itself already use the llvm libs in some way while the driver is > > > loaded. But the best bet is to do this in the dynamic loder which is > > > itself serialized, so I could avoid calling this function concurrently by > > > initialization of different contexts. That should at least shield against > > > applications that itself do the same trick by calling this funtion in the > > > dlopen phase in some static initializer ... > > > We may get around part of this problem with dlopening the driver with > > > better isolation but up to now the problem can get that far. > > > > This is a tricky problem, and I'm not sure that radeon_llvm_compile() is > > the best place to call llvm_start_multithreaded(). Maybe it would be > > better to move this into gallivm, because this problem could affect any > > driver that uses the gallivm code, which includes: llvmpipe, r300g, r600g, > > radeonsi, and i915g. What do you think? > > Yep, an other place would be better. > > I do not know the llvm tools well enough, but If I move the current c++ code > into src/gallium/auxiliary/gallivm/lp_bld_misc.cpp it works for me (TM). > Seriously, I know of one guy who wants to use llvmpipe with windows and he > would benefit from the c++ solution since the gcc extension solution is > clearly > not available with the cl windows build then. And with the lack of a windows > test system I do also not start implementing DllMain. > That could be a short term solution ... > >
Hi Mathias, First of all, thanks for investigating this. The information you've provided has helped me a lot. > Appart from that, I mentioned better symbol isolation somehow. > > How would this improove the situation with the initialization? > If we would know that this current driver module is the only user of the this > particular llvm instance, then we could really place a mutex somewhere and > serialize the call to lvm::llvm_start_multithreaded() by ourselves as we then > know that we are the only user. This would then happen before the first use > of > llvm by the driver module in question. No dlopen time static initializers > that > usually lead to ordering problems earlier or later ... > > So, I have a patch set floating around here for about half a year that > changes > the RTLD_GLOBAL arguments to dlopen to RTLD_LOCAL|RTLD_DEEPBIND which should > (? I still have to definitely investigate this) provide us the private copy > of > the used libraries/modules. Also up to now I did not take the time to really > analyze if we rely on some globals being really shared between driver modules > or different types of drivers that we then also have multiple times in > memory. > Note that this does *not* affect the shared glapi topics as they are all > above > the loaded driver modules at the level of libGL.so.1. > So for example I can think of some need for a single libdrm_radeon to make cl- > gl buffer sharing work for example? Do we? > > Alternatively, you might be able to pack every use of llvm at least in radeon > statically into libllvmradeon and again hide all non driver api used stuff > from > the outside. Then you would also have your private llvm instance that you can > initialize with your own lock to serialize. That is then more a radeon alone > solution as the other drivers need to do something similar themselves then. > And I realize that your recent code move work could point into this direction > at least for the radeon type drivers. > I took a shot at implementing it this way with private static copies of llvm. I've pushed the initial work to this branch: git://people.freedesktop.org/~tstellar/mesa llvm-build I was able to hide the LLVM symbols by using the --exclude-libs linker flag and so far I've tested with glxgears, an opencl example and also eglgears and they all work. I still need to do some more testing, but any thoughts on what I've done so far in this branch? -Tom > Both symbol isolation methods will also make the driver used llvm version > independent of any possibly application used version that can lead to > incompatible symbol clashes. So you gain even more with that. > > Opinions? > > Mathias _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev