No problem, I totally understand the need to be confident in this area. We're talking about fixing a rare bug in a critical area. The risk of collateral damage is high. Let me try to distill the test case down a bit.
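
For anyone following along, here is a rough sketch of the failure mode and of the equality change discussed in the quoted thread. It uses hypothetical stand-in classes (`Info` and `Cached`, not Groovy's real ClassInfo and CachedClass), and it simulates the GC by clearing one soft reference by hand, so it only illustrates the shape of the problem. It is not the diagnostic test and not the actual patch.

```java
import java.lang.ref.SoftReference;

class SoftCacheDemo {

    /** Hypothetical stand-in for CachedClass: wraps the Class it describes. */
    static final class Cached {
        final Class<?> type;

        Cached(Class<?> type) { this.type = type; }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;           // optimistic referential fast path
            return other instanceof Cached
                    && ((Cached) other).type == type; // fall back to the described type
        }

        @Override
        public int hashCode() { return type.hashCode(); }
    }

    /** Hypothetical stand-in for ClassInfo: lazily creates and softly caches a Cached. */
    static final class Info {
        final Class<?> type;
        SoftReference<Cached> cachedRef = new SoftReference<>(null);

        Info(Class<?> type) { this.type = type; }

        synchronized Cached getCached() {
            Cached c = cachedRef.get();
            if (c == null) {                          // never created, or cleared by the GC
                c = new Cached(type);
                cachedRef = new SoftReference<>(c);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        Info info = new Info(String.class);

        // A second soft path to the same instance, standing in for the
        // references held via another type's method parameter types.
        SoftReference<Cached> otherPath = new SoftReference<>(info.getCached());
        Cached old = otherPath.get();

        // Simulate the GC clearing only ONE of the two soft references.
        info.cachedRef.clear();

        Cached fresh = info.getCached();              // a second instance for the same type

        System.out.println(old == fresh);             // false: referential comparison breaks
        System.out.println(old.equals(fresh));        // true: equality on the described type
    }
}
```

The point of the sketch is only that once two soft paths can be cleared independently, `==` stops being a safe identity test, while an `equals` with a referential fast path keeps the common case cheap.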
Chris

On Wed, Feb 18, 2026, 6:04 PM Paul King <[email protected]> wrote:

> I think we would be keen to fix a definite bug. Not having a
> reproducer makes it a little harder for us to convince ourselves but
> if you include everything you can we should be able to piece together
> enough of an understanding. Maybe just include a skeleton of your "not
> quite minimal" test and some sample debug output in the PR or issue
> description. This is an area that we try hard to get right and then
> not touch - so maybe why we are taking our time to understand it.
>
> Paul.
>
> On Wed, Feb 18, 2026 at 11:53 PM Chris Dennis <[email protected]> wrote:
> >
> > I think at this point I have enough understanding of what's going on
> > to file an issue, and create a PR with a fix. I also have a diagnostic
> > test, (albeit not minimal enough to be committed) which currently only
> > fails on J9. It's still unclear however whether I have enough buy-in
> > from the community on the existence of the issue to get any PR I
> > create merged. What would be the best next step here?
> >
> > Chris
> >
> > On Tue, 10 Feb 2026 at 12:08, Chris Dennis <[email protected]> wrote:
> > >
> > > On Tue, 10 Feb 2026 at 06:27, Jochen Theodorou <[email protected]> wrote:
> > > >
> > > > On 2/9/26 22:33, Chris Dennis wrote:
> > > > > On Mon, 9 Feb 2026 at 15:04, Jochen Theodorou <[email protected]> wrote:
> > > > >>
> > > > >> On 02.02.26 17:50, Chris Dennis wrote:
> > > > [...]
> > > > > The lazy mechanism is working correctly - the problem is that the
> > > > > lazily created instances are only softly-referenced which means the GC
> > > > > can come along later and clear that reference, and then a newly
> > > > > arriving thread will create an additional instance of CachedClass for
> > > > > the same type. If/when those two instances then meet they will falsely
> > > > > compare not-equal and the Groovy runtime will think they represent
> > > > > different types (which they don't).
> > > >
> > > > Maybe what happens is that the class is ready to be collected, but then
> > > > we use ClassInfo to get the instance, while at the same time we are
> > > > creating a new instance? Just trying to figure out why two instances
> > > > even exist.
> > >
> > > The current mechanism I can see that allows you to end up with two
> > > instances is because a ClassInfo softly references its CachedClass via
> > > `ClassInfo.cachedClassRef` but CachedClass instances also softly
> > > reference the CachedClass instances corresponding to the parameter
> > > types of their methods via the `CachedClass.methods` field. When the
> > > ClassInfo.cachedClassRef reference is cleared by the GC we are primed
> > > for the construction of a new instance, but the old instance can still
> > > be accessible via the `CachedClass.methods` field of any other type
> > > with a method that takes that type as a parameter. In a more abstract
> > > sense we can't rely on the current scheme to prevent multiple
> > > instances from existing while we allow the instances to be
> > > softly-reachable via paths involving more than one instance of
> > > soft-reference, because those instances will not all be cleared at the
> > > same time.
> > >
> > > >
> > > > > I think the fix here is to
> > > > > implement (and use) an equals method for CachedClass which uses this
> > > > > referential comparison but only as an optimistic fast path.
> > > >
> > > > that is a workaround for me though... well... it depends on the
> > > > conditions we want those constructs to fulfill
> > >
> > > To me this isn't a workaround, but an acceptance that we cannot
> > > reliably maintain the 1:1 relationship between ClassInfo and
> > > CachedClass - and that therefore the equality contract between them
> > > cannot be a referential one.
> > >
> > > >
> > > > > (There is
> > > > > another theoretical fix here where all accesses of a given CachedClass
> > > > > are always mediated through a single SoftReference, which I think
> > > > > would make the existing scheme safe, but I fear it would be overly
> > > > > brittle).
> > > >
> > > > This sounds a lot like the soft reference will reference an instance,
> > > > that only this soft reference will reference. Which would be bad
> > >
> > > It's not bad... since the soft reference will not be cleared while the
> > > referent is strongly referenced. So the CachedClass instance could not
> > > be replaced while there was a strong reference to it that its future
> > > equivalent could be compared with. I cannot think of an easy way of
> > > preventing someone from breaking such a system though, (even if only
> > > accidentally), so I think it's not worth the risk.
> > >
> > > >
> > > > >>> I'm attempting to narrow down how exactly this is happening and whether
> > > > >>> the cause is OpenJ9 incorrectly clearing a soft reference to a strongly
> > > > >>> reachable instance, or is due to Groovy missing one or more
> > > > >>> reachabilityFences to prevent early clearing of these references.
> > > > >>
> > > > >> Can you verify other JVMs as well?
> > > > >
> > > > > I've not been able to reproduce this on anything other than OpenJ9 -
> > > > > which I suspect is due to OpenJ9 being much more eager to clear
> > > > > references. I'm pretty sure I have a valid mechanism through which it
> > > > > can happen though - it just seems to be impossible to make Hotspot
> > > > > trigger it (so far).
> > > >
> > > > Haven't worked with the Eclipse/IBM JVM for many years (since Java 9 or
> > > > so), but I do remember regularly having trouble with the references
> > > > stuff on there.
> > > >
> > > > bye Jochen
> > > >
> > >
> > > Chris
