Launchpad has imported 7 comments from the remote bug at http://sourceware.org/bugzilla/show_bug.cgi?id=3429.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2006-10-27T17:31:27+00:00 Suzuki-in wrote:

While running stress tests on one of our applications, we encountered an assert() in ld.so, as follows:

  Inconsistency detected by ld.so: dl-open.c: 610: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

with glibc-2.4.31. By code inspection, this race seems to be present in the libc I got from CVS as well.

We were able to reproduce this consistently within 4-5 hours of running. Upon debugging, we found that it is due to a race between two threads doing a _dl_open(). The scenario is something like this:

In elf/dl-open.c, _dl_open:

  /* Make sure we are alone. */
  __rtld_lock_lock_recursive (GL(dl_load_lock));
  [...]
  int errcode = _dl_catch_error (&objname, &errstring, &malloced,
                                 dl_open_worker, &args);

#ifndef MAP_COPY
  /* We must munmap() the cache file. */
  _dl_unload_cache ();
#endif

  /* Release the lock. */
  __rtld_lock_unlock_recursive (GL(dl_load_lock));
  ^^^^^ This would kick any other thread waiting on the lock.

  if (__builtin_expect (errstring != NULL, 0))
    {
      [...]
      assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);
    }

  assert (_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT);

If the thread that gets woken up is working on the same namespace and sets r_state to RT_ADD in _dl_map_object_from_fd before we reach this point (quite possible on an SMP system, or if we get scheduled out), we hit the assert!

So it is not safe to assume that r_state won't change once we release the lock.

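The pattern described above can be sketched outside glibc with plain pthreads. This is only an illustration under stated assumptions, not the attached patch and not glibc code: `load_lock`, `r_state_model`, `do_load`, `open_racy`, `open_fixed` and `run_fixed` are hypothetical names standing in for dl_load_lock, r_state and _dl_open. `open_racy` checks the state after dropping the lock, as in the excerpt; `open_fixed` performs the same check while still holding it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for RT_CONSISTENT / RT_ADD; not glibc's definitions. */
enum model_state { MODEL_CONSISTENT, MODEL_ADD };

#define MAX_THREADS 16
#define ITERATIONS  10000

static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static enum model_state r_state_model = MODEL_CONSISTENT;

/* Work done under the lock: the state is temporarily inconsistent. */
void do_load(void)
{
    r_state_model = MODEL_ADD;
    /* ... the object would be mapped here ... */
    r_state_model = MODEL_CONSISTENT;
}

/* Racy pattern: the state is read after the lock has been released, so
   another thread may already be inside do_load().  Returns nonzero if
   the state looked consistent.  Shown for contrast only. */
int open_racy(void)
{
    pthread_mutex_lock(&load_lock);
    do_load();
    pthread_mutex_unlock(&load_lock);
    return r_state_model == MODEL_CONSISTENT;  /* unsynchronized read */
}

/* Fixed pattern: the check happens inside the locked region. */
int open_fixed(void)
{
    int ok;

    pthread_mutex_lock(&load_lock);
    do_load();
    ok = (r_state_model == MODEL_CONSISTENT);
    pthread_mutex_unlock(&load_lock);
    return ok;
}

/* Each worker counts how often open_fixed() saw an inconsistent state. */
static void *worker(void *arg)
{
    long *failures = arg;

    for (int i = 0; i < ITERATIONS; i++)
        if (!open_fixed())
            (*failures)++;
    return NULL;
}

/* Runs nthreads workers concurrently and returns the total number of
   failed checks, or -1 if a thread could not be created. */
long run_fixed(int nthreads)
{
    pthread_t tid[MAX_THREADS];
    long failures[MAX_THREADS];
    long total = 0;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        failures[i] = 0;
        if (pthread_create(&tid[i], NULL, worker, &failures[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += failures[i];
    }
    return started == nthreads ? total : -1;
}
```

With the check inside the locked region, no other thread can change the state between do_load() and the comparison, so run_fixed() should report zero failed checks regardless of scheduling. The racy variant may or may not fail on any given run, which matches the hours-long reproduction time reported above.
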
Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/0

------------------------------------------------------------------------
On 2006-10-27T17:39:49+00:00 Suzuki-in wrote:

Created attachment 1391
patch to fix the race

This patch has been tested to fix the issue. Comments?

Thanks

Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/1

------------------------------------------------------------------------
On 2006-10-27T18:43:52+00:00 Drepper-fsp wrote:

You're addressing a real problem. The asserts are unimportant, but the _dl_close call must be protected. This is fixed now.

Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/2

------------------------------------------------------------------------
On 2006-10-27T18:50:50+00:00 Suzuki-in wrote:

(In reply to comment #2)
> You're addressing a real problem. The asserts are unimportant, but the _dl_close
> call must be protected. This is fixed now.

So could you please let us know whether a patch already exists for this issue? Or can we use this patch as the final fix?

Thanks.

Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/3

------------------------------------------------------------------------
On 2007-01-12T15:21:48+00:00 Cvs-commit wrote:

Subject: Bug 3429

CVSROOT:     /cvs/glibc
Module name: libc
Branch:      glibc-2_5-branch
Changes by:  ja...@sourceware.org 2007-01-12 15:21:33

Modified files:
    .   : ChangeLog
    elf : Makefile dl-close.c dl-open.c
Added files:
    elf : tst-thrlock.c

Log message:
    * elf/dl-close.c (_dl_close_worker): Renamed from _dl_close
    and split out locking and parameter checking.
    (_dl_close): Call _dl_close_worker after locking and checking.
    * elf/dl-open.c (_dl_open): Call _dl_close_worker instead of
    _dl_close.
    * elf/Makefile: Add rules to build and run tst-thrlock.
    * elf/tst-thrlock.c: New file.

    [BZ #3429]
    * elf/dl-open.c (dl_open_worker): Keep holding dl_load_lock
    until we are sure we do not need it anymore for _dl_close.
    Also move the asserts inside the lock region.
    Patch mostly by Suzuki <suz...@in.ibm.com>.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/ChangeLog.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.10362.2.7&r2=1.10362.2.8
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/tst-thrlock.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=NONE&r2=1.2.4.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/Makefile.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.315&r2=1.315.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-close.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.117&r2=1.117.2.1
http://sourceware.org/cgi-bin/cvsweb.cgi/libc/elf/dl-open.c.diff?cvsroot=glibc&only_with_tag=glibc-2_5-branch&r1=1.128&r2=1.128.2.1

Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/13

------------------------------------------------------------------------
On 2009-07-24T01:48:21+00:00 Radford wrote:

I noticed this same message with glibc-2.10.1-2.x86_64. It happened after a suspend while my disk was churning, so I suspect there is another race.

Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/17

------------------------------------------------------------------------
On 2009-07-24T01:52:04+00:00 Drepper-fsp wrote:

Stop reopening bugs. If you have something to report, open a new bug. But not if you're not providing real information like a reproducer.

Reply at: https://bugs.launchpad.net/glibc/+bug/72639/comments/18

** Changed in: glibc
   Importance: Unknown => Medium

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/72639

Title:
  Intermittently fails to load on amd64

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs