On Sat, Dec 21, 2013 at 08:45:49PM +0000, Nicholas Clark wrote:

> ==11811== Thread #1: Attempt to re-lock a non-recursive lock I already hold
> ==11811==    at 0x4C2BFC1: pthread_mutex_lock (hg_intercepts.c:484)
> ==11811==    by 0x4FACB40: uv_mutex_lock (thread.c:69)
> ==11811==    by 0x4F62F87: gc_free (SCRef.c:83)
> ==11811==    by 0x4F42B48: MVM_gc_collect_free_nursery_uncopied 
> (collect.c:515)
> ==11811==    by 0x4F3AB38: run_gc (orchestrate.c:291)
> ==11811==    by 0x4F3AE85: MVM_gc_enter_from_allocator (orchestrate.c:367)
> ==11811==    by 0x4F3B081: MVM_gc_allocate_nursery (allocation.c:32)
> ==11811==    by 0x4F3B106: MVM_gc_allocate_zeroed (allocation.c:49)
> ==11811==    by 0x4F3B349: MVM_gc_allocate_object (allocation.c:85)
> ==11811==    by 0x4F4BA0B: allocate (MVMArray.c:32)
> ==11811==    by 0x4F62A4C: initialize (SCRef.c:45)
> ==11811==    by 0x4F48652: MVM_repr_init (reprconv.c:9)
> ==11811==    by 0x4F6F79E: MVM_sc_create (in 
> /home/nicholas/Sandpit/moar-g/lib/libmoar.so)
> ==11811==    by 0x4F0A258: MVM_interp_run (interp.c:3346)
> ==11811==    by 0x4F8EC64: MVM_vm_run_file (moar.c:173)
> ==11811==    by 0x400C9D: main (main.c:137)
> ==11811==  Lock was previously acquired
> ==11811==    at 0x4C2C069: pthread_mutex_lock (hg_intercepts.c:495)
> ==11811==    by 0x4FACB40: uv_mutex_lock (thread.c:69)
> ==11811==    by 0x4F6C40D: MVM_sc_create (in 
> /home/nicholas/Sandpit/moar-g/lib/libmoar.so)
> ==11811==    by 0x4F0A258: MVM_interp_run (interp.c:3346)
> ==11811==    by 0x4F8EC64: MVM_vm_run_file (moar.c:173)
> ==11811==    by 0x400C9D: main (main.c:137)
> ==11811==

> I don't know what the right fix is.
> 
> *) Is it safe for more than one place in the same thread to manipulate
>    tc->instance->sc_weakhash? If so, should the mutex be changed to some sort
>    of (faked up?) recursive mutex?
> *) Or should MVM_sc_create be moving the mutex lock much more tightly around
>    the manipulations of tc->instance->sc_weakhash?
>    (But can MVM_HASH_BIND trigger a GC run?
>     Can MVM_string_flatten?)
> 
> nor do I know if similar systematic flaws exist in other mutex handling -
> can the GC attempt to lock any of the other mutexes?

I've been running with the appended for a few weeks now, and I think that
it's the right fix.

Nicholas Clark
>From d7756b2c99cb6791261e048cd1d4c66503ba5322 Mon Sep 17 00:00:00 2001
From: Nicholas Clark <n...@ccl4.org>
Date: Sun, 22 Dec 2013 05:04:23 +0100
Subject: [PATCH] Release tc->instance->mutex_sc_weakhash as soon as possible
 to avoid deadlock.

If the GC frees serialisation contexts, as part of the cleanup it will
attempt to lock tc->instance->mutex_sc_weakhash. If the GC happens inside
a section where that mutex is already locked, the process will deadlock.
So reduce the time the mutex is held to the bare minimum, so that nothing
that allocates GC-ables is within the critical section.
---
 src/6model/sc.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/src/6model/sc.c b/src/6model/sc.c
index 613f4c0..91e7814 100644
--- a/src/6model/sc.c
+++ b/src/6model/sc.c
@@ -13,26 +13,33 @@ MVMObject * MVM_sc_create(MVMThreadContext *tc, MVMString *handle) {
         MVMROOT(tc, sc, {
             /* Add to weak lookup hash. */
             uv_mutex_lock(&tc->instance->mutex_sc_weakhash);
+            /* Important to avoid doing anything that might trigger a GC run
+               while we hold this mutex. */
             MVM_string_flatten(tc, handle);
             MVM_HASH_GET(tc, tc->instance->sc_weakhash, handle, scb);
             if (!scb) {
                 sc->body = scb = calloc(1, sizeof(MVMSerializationContextBody));
                 MVM_ASSIGN_REF(tc, (MVMObject *)sc, scb->handle, handle);
                 MVM_HASH_BIND(tc, tc->instance->sc_weakhash, handle, scb);
+                uv_mutex_unlock(&tc->instance->mutex_sc_weakhash);
+                /* This may trigger a GC run, and if we're really unlucky, that
+                   GC run may wish to lock tc->instance->mutex_sc_weakhash. */
                 MVM_repr_init(tc, (MVMObject *)sc);
                 scb->sc = sc;
             }
-            else if (scb->sc) {
-                /* we lost a race to create it! */
-                sc = scb->sc;
-            }
             else {
-                scb->sc = sc;
-                sc->body = scb;
-                MVM_ASSIGN_REF(tc, sc, scb->handle, handle);
-                MVM_repr_init(tc, (MVMObject *)sc);
+                uv_mutex_unlock(&tc->instance->mutex_sc_weakhash);
+                if (scb->sc) {
+                    /* we lost a race to create it! */
+                    sc = scb->sc;
+                }
+                else {
+                    scb->sc = sc;
+                    sc->body = scb;
+                    MVM_ASSIGN_REF(tc, sc, scb->handle, handle);
+                    MVM_repr_init(tc, (MVMObject *)sc);
+                }
             }
-            uv_mutex_unlock(&tc->instance->mutex_sc_weakhash);
         });
     });
 
-- 
1.8.4.2

Reply via email to