On Wed, 26 Mar 2025 16:43:21 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:

> > > > I think release/uncommit failures should be handled by the callers. 
> > > > Currently, uncommit failure is handled in most places by the caller, 
> > > > release failure seems mostly not. Since, at least for uncommit, we 
> > > > could sometimes fail for valid reasons, I think we shouldn't fail 
> > > > fatally in the os:: functions.
> > > 
> > > 
> > > I would like to drill a bit deeper into this. Do you have any concrete 
> > > examples of an uncommit failure that should not be handled as a fatal 
> > > error?
> > 
> > 
> > I second @stefank here.
> > Uncommit can fail, ironically, with an ENOMEM: if the uncommit punches a 
> > hole into a committed region, this creates a new VMA on the 
> > kernel side. This may fail if we run against the limit for VMAs. Forgot 
> > what it was, some sysconf setting. All of this is Linux specific, though.
> 
> This happens when we hit the /proc/sys/vm/max_map_count limit, and this 
> immediately crashes the JVM.

Yes, but maybe it shouldn't (see below).

> 
> > I don't think this should be unconditionally a fatal error. Since the 
> > allocator (whatever it is) can decide to re-commit the region later, and 
> > thus "self-heal" itself.
> 
> Is this referring to failures when we hit the max_map_count limit? I'm not 
> convinced that you can recover from that without immediately hitting the same 
> issue somewhere else in the code.

Well, you could scrape around for a while and maybe not trigger it. E.g. in 
Metaspace, I uncommit granules, but that is optional. I could just ignore 
uncommit errors there. In the heap, we could do the same thing. 

After a while, the memory may get reused and thus recommitted, thereby solving 
the problem.
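
As a rough sketch of that idea (all names here are hypothetical and this is 
not actual HotSpot code; it assumes uncommit is done on Linux by remapping the 
range as PROT_NONE with MAP_NORESERVE), an allocator could treat uncommit as 
best effort and simply keep the granule committed when the kernel refuses:

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper, not a HotSpot API: try to uncommit [addr, addr + size)
// by replacing it with an inaccessible mapping that reserves no swap.
// Returns 0 on success, otherwise the errno reported by mmap (for example
// ENOMEM if splitting the mapping would exceed the VMA limit).
static int try_uncommit(void* addr, size_t size) {
  assert(addr != nullptr && size > 0);
  void* res = ::mmap(addr, size, PROT_NONE,
                     MAP_PRIVATE | MAP_FIXED | MAP_NORESERVE | MAP_ANONYMOUS,
                     -1, 0);
  return res == MAP_FAILED ? errno : 0;
}

// Hypothetical bookkeeping for one granule owned by an allocator.
struct Granule {
  void*  base;
  size_t size;
  bool   committed;
};

// Best-effort uncommit: on failure the granule is left marked as committed,
// so it can be handed out again later without a fresh commit.
// Returns true if the granule is uncommitted afterwards.
static bool uncommit_if_possible(Granule& g) {
  if (!g.committed) {
    return true;               // nothing to do
  }
  if (try_uncommit(g.base, g.size) != 0) {
    return false;              // tolerated failure; state is unchanged
  }
  g.committed = false;
  return true;
}
```

The caller decides what a false return means: an allocator for which uncommit 
is optional can ignore it, while one that depends on the memory being returned 
can still escalate.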

I admit this problem is a bit theoretical, and it may be acceptable to 
(continue to) crash at that point, since other allocations - libc, heap etc - 
will face the same limit. Running against this limit seems rare in my 
experience; we mostly saw it with ZGC in the past.

> 
> Or maybe you are thinking about some of the other reasons for the uncommit to 
> fail?

Honestly, I don't know why else uncommit would fail.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2757093258
