On Wed, 26 Mar 2025 16:05:00 GMT, Stefan Karlsson <stef...@openjdk.org> wrote:
>>> What state is the memory in when such a failure happens? Do we even know if
>>> the memory is still committed if an uncommit fails?
>
>
>> If release/uncommit fails, then it would be hard to know what state the
>> target memory is in. If the arguments are invalid (bad base address), the
>> target region may not even be allocated. Or, in the case of uncommit, if the
>> base address is not aligned, maybe the target committed region does indeed
>> exist but the uncommit still fails. So it would be hard to determine how to
>> readjust the NMT accounting afterward.
>
> Agreed. And this would be a pre-existing problem already. If a
> release/uncommit fails, then we have the similar issues for that as well.

Hi @stefank,

Are you referring to the difficulty in determining the original allocation as being the pre-existing problem? I think that only becomes an issue if we decide to swap the order of NMT booking and the memory release/uncommit (assuming we don't just fail fatally). Since we don't need to readjust currently, if there's a failure we can just leave everything as it is.

>>> I don't understand why we don't treat that as a fatal error OR make sure
>>> that all call-sites handles that error, which they don't do today.
>
>
>> I think release/uncommit failures should be handled by the callers.
>> Currently, uncommit failure is handled in most places by the caller, release
>> failure seems mostly not. Since, at least for uncommit, we could sometimes
>> fail for valid reasons, I think we shouldn't fail fatally in the os::
>> functions.
>
> I would like to drill a bit deeper into this. Do you have any concrete
> examples of an uncommit failure that should not be handled as a fatal error?

[`VirtualSpace::shrink_by`](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/memory/virtualspace.cpp#L373) allows uncommit to fail without crashing. I'm not certain of the intention behind that.
But it seems like it's because shrinking is an optimization, so it isn't always critical that it be done immediately. [[1](https://github.com/openjdk/jdk/blob/jdk-25%2B15/src/hotspot/share/gc/serial/tenuredGeneration.cpp#L258)]

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24084#issuecomment-2755468073