Diagnostic command for zeroing unused parts of the heap

I propose to add a new diagnostic command `System.zero_unused_memory` which 
zeros out all unused parts of the heap. The name of the command is 
intentionally GC/heap agnostic because in the future it might be extended to 
also zero unused parts of the Metaspace and/or CodeCache.

Currently, `System.zero_unused_memory` triggers a full GC and afterwards zeros 
unused parts of the heap. Zeroing can help snapshotting technologies like 
[CRIU][1] or [Firecracker][2] shrink the snapshots of VMs/containers running 
JVM processes, because pages that contain only zero bytes can easily be 
removed from the image by making the image *sparse* (e.g. with 
[`fallocate -p`][3]).
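
As a rough illustration of what making an image sparse means, the following 
Python sketch copies an image file while skipping every all-zero page, so the 
filesystem can leave a hole there instead of allocating blocks (similar in 
spirit to `cp --sparse=always`). Note that this is a swapped-in technique: 
`fallocate -p` instead punches holes into the existing file in place. The 
4 KiB page size and the helper name `sparse_copy` are assumptions for 
illustration, and whether holes are actually created depends on the filesystem.

```python
import os

# Assumption: 4 KiB pages (the common x86-64 default).
PAGE_SIZE = 4096


def sparse_copy(src_path: str, dst_path: str, page_size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: copy src_path to dst_path without writing
    all-zero pages, so the filesystem may leave holes there.
    Returns the number of all-zero pages that were skipped."""
    skipped = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while page := src.read(page_size):
            if page == bytes(len(page)):
                # Empty page: advance the output offset instead of writing zeros.
                dst.seek(len(page), os.SEEK_CUR)
                skipped += 1
            else:
                dst.write(page)
        # Resize to the current offset so that trailing empty pages become a
        # hole at the end of the file instead of being dropped.
        dst.truncate()
    return skipped
```

A copy-based approach needs space for a second file; in-place hole punching 
with `fallocate -p` avoids that.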

Note that uncommitting unused parts of the heap in the JVM doesn't help in a 
virtualized environment (e.g. KVM/Firecracker): from the host's perspective, 
the uncommitted pages are still dirty and usually still contain non-zero data, 
so they can't easily be removed from the snapshot image. More details can be 
found in my FOSDEM talk ["Zeroing and the semantic gap between host and 
guest"][4].

Furthermore, removing pages that contain only zero bytes (i.e. "empty pages") 
from a snapshot image not only decreases the image size but also speeds up the 
restore process: empty pages don't have to be read from the image file and 
are instead backed by the kernel zero page until they are used for the first 
time. This also decreases the initial memory footprint of a restored process.
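
To illustrate why a sparse image is cheaper to restore, the following Python 
sketch lists only the data extents of an image file using `lseek` with 
`SEEK_DATA`/`SEEK_HOLE`; a reader that only reads these extents never touches 
the holes, which read back as zeros. This is not how CRIU or Firecracker are 
implemented, only a sketch of the underlying idea. `SEEK_DATA`/`SEEK_HOLE` 
support depends on the platform and filesystem (a filesystem without hole 
support simply reports the whole file as data), and the helper name 
`data_extents` is hypothetical.

```python
import errno
import os


def data_extents(path: str) -> list[tuple[int, int]]:
    """Hypothetical helper: return (offset, length) for every data region
    of a possibly sparse file. Everything else is a hole that reads as zeros."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next offset >= `offset` that holds data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only holes remain up to EOF
                    break
                raise
            # The data region ends at the next hole (EOF counts as a hole).
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents
```

Extents are reported at filesystem block granularity, so they may be larger 
than the pages that were actually written.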

An additional argument for memory zeroing is security. By zeroing unused heap 
parts, we can make sure that secrets contained in unreferenced Java objects are 
deleted. This is currently impossible to achieve from Java: even if a Java 
program zeroes out arrays with sensitive data after use, it can never guarantee 
that the GC hasn't already moved the corresponding object, leaving an old, 
unreferenced copy of that data somewhere in the heap.

A prototype implementation of this proposal for the Serial, Parallel, G1 and 
Shenandoah GCs is available in the linked pull request.

[1]: https://criu.org
[2]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md
[3]: https://man7.org/linux/man-pages/man1/fallocate.1.html
[4]: https://fosdem.org/2024/schedule/event/fosdem-2024-3454-zeroing-and-the-semantic-gap-between-host-and-guest/

-------------

Commit messages:
 - Make VM_ZeroUnusedMemory a VM_GC_Sync_Operation
 - Implement unused memory zeroing for ShenandoahGC and move the zeroing part into a VM operation
 - Implement unused memory zeroing for G1GC
 - Implement unused memory zeroing for ParallelGC
 - 8329204: Diagnostic command for zeroing unused parts of the heap

Changes: https://git.openjdk.org/jdk/pull/18521/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18521&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8329204
  Stats: 187 lines in 29 files changed: 187 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/18521.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18521/head:pull/18521

PR: https://git.openjdk.org/jdk/pull/18521
