On Wed, 26 Mar 2025 16:05:00 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:

>>> What state is the memory in when such a failure happens? Do we even know if 
>>> the memory is still committed if an uncommit fails?
>>
>> If release/uncommit fails, then it would be hard to know what state the 
>> target memory is in. If the arguments are invalid (bad base address), the 
>> target region may not even be allocated. Or, in the case of uncommit, if the 
>> base address is not aligned, maybe the target committed region does indeed 
>> exist but the uncommit still fails. So it would be hard to determine how to 
>> readjust the NMT accounting afterward.
> 
> Agreed. And this would be a pre-existing problem already. If a 
> release/uncommit fails, then we have the similar issues for that as well.

Hi @stefank, are you referring to the difficulty in determining the original 
allocation as the pre-existing problem? I think that only becomes an 
issue if we decide to swap the order of the NMT bookkeeping and the memory 
release/uncommit (assuming we don't just fail fatally). Since the current 
order needs no readjustment, on failure we can just leave everything as it is.
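To illustrate the ordering point, here is a minimal sketch with a toy tracker. `ToyTracker`, `toy_pd_uncommit`, and both `uncommit_*` functions are hypothetical stand-ins, not the actual `os::` or NMT code; the `os_ok` flag only simulates the OS result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for NMT accounting: a single committed-byte counter.
struct ToyTracker {
  std::size_t committed = 0;
  void record_commit(std::size_t n) { committed += n; }
  void record_uncommit(std::size_t n) {
    assert(n <= committed);
    committed -= n;
  }
};

// Hypothetical stand-in for the platform uncommit call.
bool toy_pd_uncommit(bool os_ok) { return os_ok; }

// OS call first, bookkeeping only on success:
// a failure leaves the tracker untouched, so nothing needs readjusting.
bool uncommit_os_first(ToyTracker& t, std::size_t n, bool os_ok) {
  if (!toy_pd_uncommit(os_ok)) {
    return false;
  }
  t.record_uncommit(n);
  return true;
}

// Bookkeeping first: a failure forces a rollback, and that rollback
// assumes the memory is still committed, which we may not actually know.
bool uncommit_tracker_first(ToyTracker& t, std::size_t n, bool os_ok) {
  t.record_uncommit(n);
  if (!toy_pd_uncommit(os_ok)) {
    t.record_commit(n);  // a guess about the real state of the memory
    return false;
  }
  return true;
}
```

In the first variant the failure path does no accounting at all. In the second, the rollback is where the "what state is the memory in?" question would start to matter.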

>>> I don't understand why we don't treat that as a fatal error OR make sure 
>>> that all call-sites handles that error, which they don't do today.
>>
>> I think release/uncommit failures should be handled by the callers. 
>> Currently, uncommit failure is handled in most places by the caller, release 
>> failure seems mostly not. Since, at least for uncommit, we could sometimes 
>> fail for valid reasons, I think we shouldn't fail fatally in the os:: 
>> functions.
> 
> I would like to drill a bit deeper into this. Do you have any concrete 
> examples of an uncommit failure that should not be handled as a fatal error?

[`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) 
allows uncommit to fail without crashing. I'm not certain of the intention 
behind that, but it seems to be because shrinking is an optimization and it is 
not always critical that it be done immediately 
[[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)].
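As a rough sketch of what a non-fatal uncommit failure looks like from the caller's side (this is not the real `VirtualSpace` code; `ToySpace`, `try_shrink_by`, and the `uncommit` hook are made-up names for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical space that tracks how many bytes it has committed.
struct ToySpace {
  std::size_t committed;
  std::function<bool(std::size_t)> uncommit;  // hypothetical OS hook

  // Shrinking is treated as an optimization: if the uncommit fails,
  // keep the old size and let the caller retry later.
  bool try_shrink_by(std::size_t bytes) {
    assert(bytes <= committed);
    if (!uncommit(bytes)) {
      return false;  // nothing changed; no fatal error
    }
    committed -= bytes;
    return true;
  }
};
```

A caller in this style would simply skip the shrink on `false` and carry on with the larger committed size.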

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755468073
