With MoarMV at c506bb31535cccca and NQP at 9466ff0d7ac54291 (current HEADs) with this change in src/gc/collect.h
-#define MVM_NURSERY_SIZE 2097152 +#define MVM_NURSERY_SIZE (1024 * 1646) MoarVM deadlocks in t/serialization/01-basic.t If I run helgrind (which I'd never tried before, so this is new to me): valgrind --tool=helgrind --num-callers=42 /home/nicholas/Sandpit/moar-g/bin/moar nqp.moarvm t/serialization/01-basic.t I see ok 51 - array b nested array second element ok ok 52 - array b third element is correct ==11811== ---Thread-Announcement------------------------------------------ ==11811== ==11811== Thread #1 is the program's root thread ==11811== ==11811== ---------------------------------------------------------------- ==11811== ==11811== Thread #1: Attempt to re-lock a non-recursive lock I already hold ==11811== at 0x4C2BFC1: pthread_mutex_lock (hg_intercepts.c:484) ==11811== by 0x4FACB40: uv_mutex_lock (thread.c:69) ==11811== by 0x4F62F87: gc_free (SCRef.c:83) ==11811== by 0x4F42B48: MVM_gc_collect_free_nursery_uncopied (collect.c:515) ==11811== by 0x4F3AB38: run_gc (orchestrate.c:291) ==11811== by 0x4F3AE85: MVM_gc_enter_from_allocator (orchestrate.c:367) ==11811== by 0x4F3B081: MVM_gc_allocate_nursery (allocation.c:32) ==11811== by 0x4F3B106: MVM_gc_allocate_zeroed (allocation.c:49) ==11811== by 0x4F3B349: MVM_gc_allocate_object (allocation.c:85) ==11811== by 0x4F4BA0B: allocate (MVMArray.c:32) ==11811== by 0x4F62A4C: initialize (SCRef.c:45) ==11811== by 0x4F48652: MVM_repr_init (reprconv.c:9) ==11811== by 0x4F6F79E: MVM_sc_create (in /home/nicholas/Sandpit/moar-g/lib/libmoar.so) ==11811== by 0x4F0A258: MVM_interp_run (interp.c:3346) ==11811== by 0x4F8EC64: MVM_vm_run_file (moar.c:173) ==11811== by 0x400C9D: main (main.c:137) ==11811== Lock was previously acquired ==11811== at 0x4C2C069: pthread_mutex_lock (hg_intercepts.c:495) ==11811== by 0x4FACB40: uv_mutex_lock (thread.c:69) ==11811== by 0x4F6C40D: MVM_sc_create (in /home/nicholas/Sandpit/moar-g/lib/libmoar.so) ==11811== by 0x4F0A258: MVM_interp_run (interp.c:3346) ==11811== by 0x4F8EC64: MVM_vm_run_file (moar.c:173) ==11811== by 0x400C9D: main (main.c:137) ==11811== and the program hangs. The problem is that MVM_sc_create() locks &tc->instance->mutex_sc_weakhash MVMObject * MVM_sc_create(MVMThreadContext *tc, MVMString *handle) { MVMSerializationContext *sc; MVMSerializationContextBody *scb; /* Allocate. */ MVMROOT(tc, handle, { sc = (MVMSerializationContext *)REPR(tc->instance->SCRef)->allocate(tc, STABLE(tc->instance->SCRef)); MVMROOT(tc, sc, { /* Add to weak lookup hash. */ uv_mutex_lock(&tc->instance->mutex_sc_weakhash); MVM_string_flatten(tc, handle); MVM_HASH_GET(tc, tc->instance->sc_weakhash, handle, scb); if (!scb) { sc->body = scb = calloc(1, sizeof(MVMSerializationContextBody)); MVM_ASSIGN_REF(tc, (MVMObject *)sc, scb->handle, handle); MVM_HASH_BIND(tc, tc->instance->sc_weakhash, handle, scb); MVM_repr_init(tc, (MVMObject *)sc); scb->sc = sc; } ... uv_mutex_unlock(&tc->instance->mutex_sc_weakhash); }); and that call to MVM_repr_init() is triggering a GC run, which in turn is finding something to free, which then wants to lock the same mutex: /* Called by the VM in order to free memory associated with this object. */ static void gc_free(MVMThreadContext *tc, MVMObject *obj) { MVMSerializationContext *sc = (MVMSerializationContext *)obj; /* Remove from weakref lookup hash (which doesn't count as a root). */ uv_mutex_lock(&tc->instance->mutex_sc_weakhash); HASH_DELETE(hash_handle, tc->instance->sc_weakhash, sc->body); uv_mutex_unlock(&tc->instance->mutex_sc_weakhash); /* Free manually managed STable list memory and body. */ MVM_checked_free_null(sc->body->root_stables); MVM_checked_free_null(sc->body); } Deadlock. (Or, apparently an error on FreeBSD. This was Linux) I don't know what the right fix is. *) Is it safe for more than one place in the same thread to manipulate tc->instance->sc_weakhash? If so, should the mutex be changed to some sort of (faked up?) recursive mutex? *) Or should MVM_sc_create be moving the mutex lock much more tightly around the manipulations of tc->instance->sc_weakhash? (But can MVM_HASH_BIND trigger a GC run? Can MVM_string_flatten?) nor do I know if similar systematic flaws exist in other mutex handling - can the GC attempt to lock any of the other mutexes? Nicholas Clark