On Wed, 27 Mar 2024 17:24:34 GMT, Volker Simonis <simo...@openjdk.org> wrote:

> Diagnostic command for zeroing unused parts of the heap
>
> I propose to add a new diagnostic command `System.zero_unused_memory` which
> zeros out all unused parts of the heap. The name of the command is
> intentionally GC/heap agnostic because in the future it might be extended to
> also zero unused parts of the Metaspace and/or CodeCache.
>
> Currently `System.zero_unused_memory` triggers a full GC and afterwards zeros
> unused parts of the heap. Zeroing can help snapshotting technologies like
> [CRIU][1] or [Firecracker][2] to shrink the snapshot size of VMs/containers
> with running JVM processes because pages which only contain zero bytes can be
> easily removed from the image by making the image *sparse* (e.g. with
> [`fallocate -p`][3]).
>
> Notice that uncommitting unused heap parts in the JVM doesn't help in the
> context of virtualization (e.g. KVM/Firecracker) because from the host
> perspective they are still dirty and can't be easily removed from the
> snapshot image because they usually contain some non-zero data. More details
> can be found in my FOSDEM talk ["Zeroing and the semantic gap between host
> and guest"][4].
>
> Furthermore, removing pages which only contain zero bytes (i.e. "empty
> pages") from a snapshot image not only decreases the image size but also
> speeds up the restore process because empty pages don't have to be read from
> the image file but will be populated by the kernel zero page first until they
> are used for the first time. This also decreases the initial memory footprint
> of a restored process.
>
> An additional argument for memory zeroing is security. By zeroing unused heap
> parts, we can make sure that secrets contained in unreferenced Java objects
> are deleted. This is currently impossible to achieve from Java because even
> if a Java program zeroes out arrays with sensitive data after usage, it can
> never guarantee that the corresponding object hasn't already been moved by
> the GC and that an old, unreferenced copy of that data doesn't still exist
> somewhere in the heap.
>
> A prototype implementation for this proposal for Serial, Parallel, G1 and
> Shenandoah GC is available in the linked pull request.
>
> [1]: https://criu.org
> [2]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md
> [3]: https://man7.org/linux/man-pages/man1/fallocate.1.html
> [4]: https://fosdem.org/2024/schedule/event/fosdem-2024-3454-zeroing-and-the-semantic-gap-between-host-and-guest/

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/18521
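
To illustrate why zeroing matters for snapshot size, here is a minimal sketch that counts how many fixed-size pages of an in-memory "image" consist only of zero bytes. It is not taken from the pull request: the class name `ZeroPageCounter`, its methods, and the 4096-byte page size are assumptions made for illustration only. A single stale non-zero byte is enough to keep a whole page from being treated as empty.

```java
// Illustrative sketch only; not part of the proposed JDK change.
public final class ZeroPageCounter {
    // Assumed page size; real systems may use a different value.
    static final int PAGE_SIZE = 4096;

    // Returns true if every byte in [offset, offset + length) is zero.
    static boolean isZeroPage(byte[] image, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            if (image[i] != 0) {
                return false;
            }
        }
        return true;
    }

    // Counts the pages of the image that contain only zero bytes.
    // A trailing partial page is checked over its actual length.
    static int countZeroPages(byte[] image, int pageSize) {
        int zeroPages = 0;
        for (int off = 0; off < image.length; off += pageSize) {
            int len = Math.min(pageSize, image.length - off);
            if (isZeroPage(image, off, len)) {
                zeroPages++;
            }
        }
        return zeroPages;
    }

    public static void main(String[] args) {
        byte[] image = new byte[4 * PAGE_SIZE]; // Java arrays start zeroed
        image[PAGE_SIZE + 17] = 42;             // stale data on page 1
        image[3 * PAGE_SIZE] = 1;               // stale data on page 3
        System.out.println(countZeroPages(image, PAGE_SIZE)
                + " of 4 pages are all-zero");
    }
}
```

In this toy image only pages 0 and 2 could be dropped from a sparse file; zeroing the two stale bytes would make all four pages removable.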