With MoarMV at c506bb31535cccca and NQP at 9466ff0d7ac54291 (current HEADs)
with this change in src/gc/collect.h

-#define MVM_NURSERY_SIZE 2097152
+#define MVM_NURSERY_SIZE (1024 * 1646)

MoarVM deadlocks in t/serialization/01-basic.t

If I run helgrind (which I'd never tried before, so this is new to me):

valgrind --tool=helgrind --num-callers=42 
/home/nicholas/Sandpit/moar-g/bin/moar nqp.moarvm  t/serialization/01-basic.t

I see

ok 51 - array b nested array second element ok
ok 52 - array b third element is correct
==11811== ---Thread-Announcement------------------------------------------
==11811==
==11811== Thread #1 is the program's root thread
==11811==
==11811== ----------------------------------------------------------------
==11811==
==11811== Thread #1: Attempt to re-lock a non-recursive lock I already hold
==11811==    at 0x4C2BFC1: pthread_mutex_lock (hg_intercepts.c:484)
==11811==    by 0x4FACB40: uv_mutex_lock (thread.c:69)
==11811==    by 0x4F62F87: gc_free (SCRef.c:83)
==11811==    by 0x4F42B48: MVM_gc_collect_free_nursery_uncopied (collect.c:515)
==11811==    by 0x4F3AB38: run_gc (orchestrate.c:291)
==11811==    by 0x4F3AE85: MVM_gc_enter_from_allocator (orchestrate.c:367)
==11811==    by 0x4F3B081: MVM_gc_allocate_nursery (allocation.c:32)
==11811==    by 0x4F3B106: MVM_gc_allocate_zeroed (allocation.c:49)
==11811==    by 0x4F3B349: MVM_gc_allocate_object (allocation.c:85)
==11811==    by 0x4F4BA0B: allocate (MVMArray.c:32)
==11811==    by 0x4F62A4C: initialize (SCRef.c:45)
==11811==    by 0x4F48652: MVM_repr_init (reprconv.c:9)
==11811==    by 0x4F6F79E: MVM_sc_create (in 
/home/nicholas/Sandpit/moar-g/lib/libmoar.so)
==11811==    by 0x4F0A258: MVM_interp_run (interp.c:3346)
==11811==    by 0x4F8EC64: MVM_vm_run_file (moar.c:173)
==11811==    by 0x400C9D: main (main.c:137)
==11811==  Lock was previously acquired
==11811==    at 0x4C2C069: pthread_mutex_lock (hg_intercepts.c:495)
==11811==    by 0x4FACB40: uv_mutex_lock (thread.c:69)
==11811==    by 0x4F6C40D: MVM_sc_create (in 
/home/nicholas/Sandpit/moar-g/lib/libmoar.so)
==11811==    by 0x4F0A258: MVM_interp_run (interp.c:3346)
==11811==    by 0x4F8EC64: MVM_vm_run_file (moar.c:173)
==11811==    by 0x400C9D: main (main.c:137)
==11811==

and the program hangs.

The problem is that MVM_sc_create() locks &tc->instance->mutex_sc_weakhash


MVMObject * MVM_sc_create(MVMThreadContext *tc, MVMString *handle) {
    MVMSerializationContext     *sc;
    MVMSerializationContextBody *scb;

    /* Allocate. */
    MVMROOT(tc, handle, {
        sc = (MVMSerializationContext *)REPR(tc->instance->SCRef)->allocate(tc, 
STABLE(tc->instance->SCRef));
        MVMROOT(tc, sc, {
            /* Add to weak lookup hash. */
            uv_mutex_lock(&tc->instance->mutex_sc_weakhash);
            MVM_string_flatten(tc, handle);
            MVM_HASH_GET(tc, tc->instance->sc_weakhash, handle, scb);
            if (!scb) {
                sc->body = scb = calloc(1, sizeof(MVMSerializationContextBody));
                MVM_ASSIGN_REF(tc, (MVMObject *)sc, scb->handle, handle);
                MVM_HASH_BIND(tc, tc->instance->sc_weakhash, handle, scb);
                MVM_repr_init(tc, (MVMObject *)sc);
                scb->sc = sc;
            }

            ...

            uv_mutex_unlock(&tc->instance->mutex_sc_weakhash);
        });


and that call to MVM_repr_init() is triggering a GC run, which in turn is
finding something to free, which then wants to lock the same mutex:

/* Called by the VM in order to free memory associated with this object. */
static void gc_free(MVMThreadContext *tc, MVMObject *obj) {
    MVMSerializationContext *sc = (MVMSerializationContext *)obj;

    /* Remove from weakref lookup hash (which doesn't count as a root). */
    uv_mutex_lock(&tc->instance->mutex_sc_weakhash);
    HASH_DELETE(hash_handle, tc->instance->sc_weakhash, sc->body);
    uv_mutex_unlock(&tc->instance->mutex_sc_weakhash);

    /* Free manually managed STable list memory and body. */
    MVM_checked_free_null(sc->body->root_stables);
    MVM_checked_free_null(sc->body);
}


Deadlock. (Or, apparently an error on FreeBSD. This was Linux)


I don't know what the right fix is.

*) Is it safe for more than one place in the same thread to manipulate
   tc->instance->sc_weakhash? If so, should the mutex be changed to some sort
   of (faked up?) recursive mutex?
*) Or should MVM_sc_create be moving the mutex lock much more tightly around
   the manipulations of tc->instance->sc_weakhash?
   (But can MVM_HASH_BIND trigger a GC run?
    Can MVM_string_flatten?)

nor do I know if similar systematic flaws exist in other mutex handling -
can the GC attempt to lock any of the other mutexes?

Nicholas Clark

Reply via email to