The proposed change here is the following: 1. Move the `vmdependencies` list head from the `Context` to the `CallSite` object 2. Remove the Context object and its corresponding cleaner
(1.) fixes the crash. (2.) is because without `vmdependencies` the Context and its cleaner serves no purpose. First what goes wrong without this patch. Currently when the GC unlinks a nmethod it clean/flush out all the dependency contexts. These are either attached to an InstanceKlass, or a Context object reachable through a CallSite oop. A nmethod is unlinked when either one of its oops have died or it is heuristically determined to be cold and should be unloaded. So when the GC clean's the context through a CallSite oop, it may be that that CallSite object is dead. For ZGC, which is a concurrent generational GC (the different generations are collected concurrently, but in a coordinated manner), it is important that the unlinking is coordinated with the reclamation of this dead object. In generational ZGC all nmethod oops are considered as strong roots if they reside in the young generation and thusly can only become unreachable / dead after promotion to the old generation. This means that the CallSite object at the time of unlinking is either reachable / live, or unreachable / dead and is reclaimed by the old generation collection (the same generation that does the unlinking). So we can make reading from this object safe by not reclaiming the object before unlinking is finished. The issue is that we do not have this guarantee for the Context object. As this is a distinct object it may be that it has not been promoted and resides in the young generation at the time of its CallSite object becoming unreachable and collected by the old generation collection. If this is the case and a young generation collection runs after old marking has finished, we have two bad scenarios. If it the young generation collection starts after reference processing and the cleaner has run, the Context object would be unreachable and the young generation collection would reclaim the memory. If it started before the reference processing it would still be reachable, but may be relocated. For reachable old CallSite objects the Context oop field would have been tracked and remapped if a young collection relocates the Context object, however this is not true then the CallSite is unreachable. The Context object may have moved or been reclaimed, and the load barrier on the field will produce a bogus oop because the field will not have been tracked (and remapped) correctly. This will cause the crash. The solution in this patch is to remove the Context object and store the `vmdependencies` list head directly in the CallSite object. This avoids any cross generation pointers. An alternative would have been to keep the cleaner and change the GC code to only look at CallSite object which are reachable. However this would require a per-GC adaption and would be a larger change. (Even if it is only really ZGC which would need to do anything, maybe that generational Shenandoah is also affected, but not something I have looked into) I've tried to dig and find why a cleaner was used in the first place as it seems like the GC has cleaned these / removed unloading nmethods for a while. Maybe someone else have more insight. The lifetime of the native memory is now solely tied to the lifetime of the nmthod it tracks a dependency for. This means that it is important that after `add_dependent_nmethod` it will eventually be seen by and can be unlinked by the GC. As far as I can tell this seem to be the case, `add_dependent_nmethod` is called after a nmethod has been created in the code cache. Tested: * Github Actions * Oracle supported platforms tier1 through 8 ------------- Commit messages: - 8347564: ZGC: Crash in DependencyContext::clean_unloading_dependents Changes: https://git.openjdk.org/jdk/pull/23194/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23194&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347564 Stats: 168 lines in 13 files changed: 8 ins; 143 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23194.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23194/head:pull/23194 PR: https://git.openjdk.org/jdk/pull/23194