So given all this, I am voting -1 and calling this vote a failure. I am testing Alan's new patch and hope to roll a 4.2.1-rc1 later this week.
Thanks

On Wed, Apr 16, 2014 at 9:18 AM, Alan M. Carroll <a...@network-geographics.com> wrote:

> I was asked for a translation of my previous email, bonging the 4.2.1 RC0.
>
> The problem in 4.2.0 was a shift in the set of WKS values. These are not
> just live data but are also written to the cache in the object headers, so
> if they change at all, it de facto invalidates the cache. The 4.2.0 crashes
> (TS-2564) are due to this, because various secondary bits of data get
> written inconsistently, which in turn causes ATS to look up the wrong data
> for header fields. For instance, the VARY field would be written out along
> with a hint about where it was in the header. When read back in 4.2.0, ATS
> would use the stored WKS index to look up the hint location and get the
> wrong location (because VARY had shifted) and use that to find the wrong
> data for VARY (possibly null or unallocated memory).
>
> To fix this, 4.2.1 simply clears all the hints and rewrites them when the
> object is read from disk if using a cache version earlier than 4.2.1. This
> ignores the stored values and uses only the current in-memory values.
>
> However, it turns out that when the object is read from disk, it may be
> stored in the ram cache. If retrieved from ram cache later, it goes through
> the same logic as if it had been loaded from disk, which includes clearing
> and rewriting the hints. The ATS logic, though, doesn't lock the object for
> this because it is expected to be read-only once it has been read from
> disk. The TS-2564 logic violates this and thereby creates a race condition
> between two transactions that both access the same object. It is possible
> for one to check the valid hints for a field and then, while it is trying
> to retrieve the field, the other transaction can clear the hints, causing
> the field to not be found. This leads to a crash because the logic assumes
> (reasonably) that if it has checked the hints and verified the field's
> presence, the field is present and will be found. If the field is not
> found, you get a null pointer dereference.
>
> The solution is to prevent the 4.2.0 fixup from being done on objects
> retrieved from the ram cache. There's no need, as the fixup was done when
> the object was read from disk and put in the ram cache. There is no race
> condition for disk reads because those are not shared until after the
> fixup.
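
For anyone who wants the shape of the fix in code, here is a rough sketch of the idea in the last quoted paragraph: only run the hint fixup on objects that come from disk, and leave ram cache objects untouched because they may be shared between transactions. Every name here (Source, CachedObject, rewrite_hints, maybe_fixup, the integer version encoding) is made up for illustration and is not taken from the ATS source or from Alan's patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout; this is not the ATS code or the real patch.

// Where the object was just retrieved from.
enum class Source { Disk, RamCache };

struct CachedObject {
    int cache_version;       // assumed encoding: 420 for 4.2.0, 421 for 4.2.1
    std::vector<int> hints;  // stored slot hints for well-known header fields
    int fixup_count;         // how many times the hints were rewritten
};

// First cache version whose stored hints can be trusted (assumption).
constexpr int kFirstGoodVersion = 421;

// Discard the stored hints and rebuild them from the current in-memory values.
void rewrite_hints(CachedObject &obj, const std::vector<int> &current) {
    obj.hints.clear();
    obj.hints = current;
    ++obj.fixup_count;
}

// Run the fixup only for objects freshly read from disk that were written by
// an older cache version. Objects from the ram cache were already fixed up
// when they were read from disk, and they may be in use by another
// transaction, so they must not be modified here.
// Returns true if the hints were rewritten.
bool maybe_fixup(CachedObject &obj, Source src, const std::vector<int> &current) {
    if (src == Source::RamCache) {
        return false;  // shared and expected to be read-only: do not touch
    }
    if (obj.cache_version >= kFirstGoodVersion) {
        return false;  // stored hints already match the current layout
    }
    rewrite_hints(obj, current);
    assert(obj.hints == current);
    return true;
}
```

The point of the source check is that the object keeps its old cache version while it sits in the ram cache, so a version check alone would re-run the fixup on every ram cache hit, which is the race described above.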