On Jan-10, Leopold Toetsch wrote:
>
> You get double the amount of PMCs into the cache - used during marking
> and freeing. It isn't related to alignment, just more throughput.
Oh. You're right. I was thinking that the unused portion of a PMC wouldn't need to be loaded into the cache, so that only the "active" portions of the PMCs would ever be loaded. That is a fine argument if your objects are larger than a cache line, but probably few CPUs we care about have only 32-byte cache lines.

Ah! So all we have to do is use discontiguous PMCs -- the first 32 bytes are at offset 0, the second at byte offset 128 or so. Then we can interleave them, so that everything in offset 0..127 gets loaded into the cache, but 128..255 is left untouched. (Just kidding.)
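
To put rough numbers on the throughput point, here is a minimal sketch of the arithmetic. The sizes are assumptions for illustration only (a hypothetical 64-byte PMC shrunk to 32 bytes, on a 64-byte cache line), and the helper names are made up here rather than taken from the Parrot source.

```c
#include <assert.h>
#include <stddef.h>

/* How many whole objects of obj_size bytes fit in one cache line.
 * Returns 0 when a single object is larger than the line. */
size_t objects_per_line(size_t line_size, size_t obj_size)
{
    assert(obj_size > 0);
    return line_size / obj_size;
}

/* Cache lines touched when walking n contiguous objects that start
 * on a line boundary (as a mark or free pass over a pool would). */
size_t lines_touched(size_t n, size_t line_size, size_t obj_size)
{
    assert(line_size > 0);
    size_t bytes = n * obj_size;
    return (bytes + line_size - 1) / line_size; /* round up */
}
```

Under those assumed sizes, walking 1000 PMCs touches 1000 lines at 64 bytes each and 500 lines at 32 bytes each, which is the "double the amount of PMCs into the cache" effect. Skipping the unused half of each object only saves loads when the object spans more than one line, for example a 64-byte object on a 32-byte line.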