Nicholas Clark <n...@ccl4.org> wrote ..
> I believe that a write barrier is missing in MVM_sc_get_sc(). Patch attached.
>
Agree with analysis and fix. Applied, thanks!

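For anyone following along, the kind of write barrier at issue can be sketched with a toy model. This is not MoarVM's code: `ToyObj`, the `TOY_*` flags, `toy_write_barrier` and `toy_assign_ref` are all hypothetical names, and the model assumes a simple two-generation collector with a fixed-size remembered set.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; this is a toy model, not MoarVM's API. */
#define TOY_SECOND_GEN     1u  /* object lives in the old generation */
#define TOY_REMEMBERED     2u  /* object is already in the remembered set */
#define TOY_MAX_REMEMBERED 16

typedef struct ToyObj {
    unsigned       flags;
    struct ToyObj *ref;        /* a single outgoing reference */
} ToyObj;

static ToyObj *remembered[TOY_MAX_REMEMBERED];
static size_t  num_remembered = 0;

/* If an old-generation object is about to point at a nursery object,
 * record the old object so that a nursery collection can treat it as a
 * root and update its pointer when the nursery object moves. */
static void toy_write_barrier(ToyObj *owner, ToyObj *referenced) {
    if (referenced
            && (owner->flags & TOY_SECOND_GEN)
            && !(referenced->flags & TOY_SECOND_GEN)
            && !(owner->flags & TOY_REMEMBERED)) {
        assert(num_remembered < TOY_MAX_REMEMBERED);
        owner->flags |= TOY_REMEMBERED;
        remembered[num_remembered++] = owner;
    }
}

/* The barriered store: run the barrier first, then do the assignment. */
static void toy_assign_ref(ToyObj *owner, ToyObj **slot, ToyObj *referenced) {
    toy_write_barrier(owner, referenced);
    *slot = referenced;
}
```

In this model, a plain `old_obj.ref = &young;` leaves `num_remembered` untouched, so a nursery collection would never learn that `old_obj` needs its pointer updated. Going through `toy_assign_ref(&old_obj, &old_obj.ref, &young)` records `old_obj` exactly once.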
> This was really really messy to figure out. The bug manifests as a SEGV in
> process_worklist because a pointer to an object is 0x6. Clearly not a pointer.
> The value 0x6 has come from the scs array of a MVMCompUnit. In turn, the
> value 0x6 gets copied to that array because the GC thinks that it's the value
> of a forwarding pointer in an object in fromspace, during a GC sweep.
>
> In turn, that address in memory is 6 because it's actually now the count for
> an array - ie ...->scs[4] is a stale pointer to memory that has been freed
> and re-used, and this is only noticed after a pair (or any even number) of
> GC runs.
>
Urrrgh. Yes, sometimes you luck out and the problem is obvious from the location of the crash. And in some cases, it happens way downstream. I wonder if there are some places we can have optional "clearly not a pointer" sanity checks that can be turned on by conditional compilation. I mean, we know all possible memory address ranges a GC-able object may be at, or at least ranges they should fall within...

> 1) Off topic question - is there a point at runtime at which deserialisation
> contexts are no longer needed, and so can be discarded to reduce memory
> usage?
>
No; the objects we deserialize are deserialized in order to be able to use/access them at runtime, and the context keeps track of them in order that we can reference them if we compile code that refers to them. And that can happen at any time, such as in an eval. Serialization contexts are GC-able in and of themselves, though, so any SC that comes to exist thanks to eval'ing some code can be collected when no longer referenced.

> <snip>
>
> 2) Do most missing write barrier bugs look as crazy to figure out as this one?
> Or was this one particularly special by corrupting something outside of
> the nursery by way of a "corrupt" forwarding pointer?
>
In general, I'd say missing write barrier ones are worse to figure out than missing MVMROOT-ing, since objects have to have made it into the second generation in order for the issues to manifest. And that can be a long time. Thankfully, given the rules are simple (never use = to assign into a GC-able object, always MVM_ASSIGN_REF), there are likely few of these. I suspect this one *may* have come up because way back in MoarVM history, compilation units were not GC-able objects, and so there was nothing to write barrier, and then it didn't get put in at the needed point. That's just speculation without going and looking through history, mind.

Thanks,

Jonathan
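
On the optional "clearly not a pointer" sanity checks wondered about above, here is a minimal sketch of what one might look like. Nothing here exists in MoarVM: `ToyRange`, `toy_plausible_pointer`, `TOY_CHECK_PTR` and the `TOY_GC_PTR_CHECKS` switch are hypothetical, and the sketch assumes that GC-able objects are aligned to pointer size and that the caller can supply the address ranges they may live in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time switch; set to 0 to compile the checks out. */
#ifndef TOY_GC_PTR_CHECKS
#define TOY_GC_PTR_CHECKS 1
#endif

/* A half-open address range [start, end) that GC-able objects may occupy. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} ToyRange;

/* Returns 1 if p is pointer-aligned (an assumption of this sketch) and
 * falls inside one of the known ranges, 0 otherwise. */
static int toy_plausible_pointer(const void *p, const ToyRange *ranges, size_t n) {
    uintptr_t addr = (uintptr_t)p;
    size_t i;
    if (addr % sizeof(void *) != 0)
        return 0;
    for (i = 0; i < n; i++)
        if (addr >= ranges[i].start && addr < ranges[i].end)
            return 1;
    return 0;
}

#if TOY_GC_PTR_CHECKS
#define TOY_CHECK_PTR(p, ranges, n) \
    assert(toy_plausible_pointer((p), (ranges), (n)))
#else
#define TOY_CHECK_PTR(p, ranges, n) ((void)0)
#endif
```

A value such as 0x6 fails the alignment test before the ranges are even consulted, so a check like this placed where the worklist is processed would trip closer to the point where the bad value is first picked up, rather than at the later dereference.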