Nicholas Clark <n...@ccl4.org> wrote ..
> I believe that a write barrier is missing in MVM_sc_get_sc(). Patch attached.
> 
Agree with analysis and fix. Applied, thanks!

> This was really really messy to figure out. The bug manifests as a SEGV in
> process_worklist because a pointer to an object is 0x6. Clearly not a pointer.
> The value 0x6 has come from the scs array of a MVMCompUnit. In turn, the
> value 0x6 gets copied to that array because the GC thinks that it's the value
> of a forwarding pointer in an object in fromspace, during a GC sweep.
> 
> In turn, the word at that address in memory is 6 because it's actually now the count for
> an array - i.e. ...->scs[4] is a stale pointer to memory that has been freed
> and re-used, and this is only noticed after a pair (or any even number) of
> GC runs.
> 
Urrrgh. Yes, sometimes you luck out and the problem is obvious from the
location of the crash. And in some cases, it happens way downstream. I wonder if
there are some places we can have optional "clearly not a pointer" sanity 
checks that can be turned on by conditional compilation. I mean, we know all 
possible memory address ranges a GC-able object may be at, or at least ranges 
they should fall within...

> 1) Off topic question - is there a point at runtime at which deserialisation
>    contexts are no longer needed, and so can be discarded to reduce memory
>    usage?
> 
No; the objects we deserialize are deserialized in order to be able to 
use/access them at runtime, and the context keeps track of them in order that 
we can reference them if we compile code that refers to them. And that can 
happen at any time, such as in an eval. Serialization contexts are GC-able in 
and of themselves, though, so any SC that comes to exist thanks to eval'ing 
some code can be collected when no longer referenced.

> <snip> 
> 
> 2) Do most missing write barrier bugs look as crazy to figure out as this one?
>    Or was this one particularly special by corrupting something outside of
>    the nursery by way of a "corrupt" forwarding pointer?
> 
In general, I'd say missing write barrier ones are worse to figure out than 
missing MVMROOT-ing, since objects have to have made it into the second 
generation in order for the issues to manifest. And that can take a long time.
Thankfully, given the rules are simple (never use = to assign into a GC-able 
object, always MVM_ASSIGN_REF), there are likely few of these. I suspect this 
one *may* have come up because way back in MoarVM history, compilation units 
were not GC-able objects, and so there was nothing to write barrier, and then 
it didn't get put in at the needed point. That's just speculation without going 
and looking through history, mind.

Thanks,

Jonathan
