On Wed, 26 Mar 2025 16:43:21 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:

> > > > I think release/uncommit failures should be handled by the callers.
> > > > Currently, uncommit failure is handled in most places by the caller,
> > > > release failure seems mostly not. Since, at least for uncommit, we
> > > > could sometimes fail for valid reasons, I think we shouldn't fail
> > > > fatally in the os:: functions.
> > >
> > > I would like to drill a bit deeper into this. Do you have any concrete
> > > examples of an uncommit failure that should not be handled as a fatal
> > > error?
> > >
> > > I second @stefank here.
> >
> > Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a
> > hole into a committed region, this would cause a new VMA on the
> > kernel-side. This may fail if we run against the limit for VMAs. Forgot
> > what it was, some sysconf setting. All of this is Linux specific, though.
>
> This happens when we hit the /proc/sys/vm/max_map_count limit, and this
> immediately crashes the JVM.

Yes, but maybe it shouldn't (see below).

> > I don't think this should be unconditionally a fatal error. Since the
> > allocator (whatever it is) can decide to re-commit the region later, and
> > thus "self-heal" itself.
>
> Is this referring to failures when we hit the max_map_count limit? I'm not
> convinced that you can recover from that without immediately hitting the same
> issue somewhere else in the code.

Well, you could scrape around for a while and maybe not trigger it. E.g. in Metaspace, I uncommit granules, but that is optional. I could just ignore uncommit errors there. In the heap, we could do the same thing. After a while, the memory may get reused and thus recommitted, thereby solving the problem.

I admit this problem is a bit theoretical, and it may be acceptable to (continue to) crash at that point, since other allocations (libc, heap, etc.) will face the same limit. Running against this limit seems rare in my experience; we mostly saw it with ZGC in the past.

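To make the "just ignore uncommit errors" idea concrete, here is a minimal sketch of a caller that treats uncommit as best-effort. This is not HotSpot code: `GranuleSpace`, `UncommitFn` and `try_uncommit` are hypothetical names, and the platform uncommit call is injected as a function so that a failure such as ENOMEM can be simulated.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a platform uncommit call (assumption, not a
// HotSpot API). Returns 0 on success or an errno value on failure, e.g.
// ENOMEM when the kernel cannot create another VMA.
using UncommitFn = std::function<int(std::size_t granule_index)>;

// Hypothetical bookkeeping for a range of granules that start out committed.
class GranuleSpace {
public:
  GranuleSpace(std::size_t granule_count, UncommitFn uncommit)
      : _committed(granule_count, true), _uncommit(std::move(uncommit)) {}

  // Best-effort uncommit. On failure the granule simply stays committed,
  // so it can be handed out again later without a recommit step.
  bool try_uncommit(std::size_t index) {
    assert(index < _committed.size());
    if (!_committed[index]) {
      return true;  // nothing to do, already uncommitted
    }
    const int err = _uncommit(index);
    if (err != 0) {
      ++_failed_uncommits;  // record the failure, but do not treat it as fatal
      return false;
    }
    _committed[index] = false;
    return true;
  }

  bool is_committed(std::size_t index) const {
    assert(index < _committed.size());
    return _committed[index];
  }

  std::size_t failed_uncommits() const { return _failed_uncommits; }

private:
  std::vector<bool> _committed;
  UncommitFn _uncommit;
  std::size_t _failed_uncommits = 0;
};
```

The point of the sketch is only that the bookkeeping stays consistent after a failed uncommit: the granule is still marked committed, which matches what the kernel still holds. It does not address the concern that other allocations will run into the same limit.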
> Or maybe you are thinking about some of the other reasons for the uncommit to
> fail?

Honestly, I don't know why else uncommit would fail.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2757093258