Hi BookKeeper Community,

I’d like to propose a modification to how garbage collection (GC)
handles disk-full scenarios.
Currently, when any ledger disk reaches full capacity,
suspendMajorGC()/suspendMinorGC() pauses GC for all disks.
This behavior can unnecessarily impact healthy disks, especially in
cases of uneven disk utilization.

Consider two scenarios:
1. Even Data Distribution:
   All disks are nearly full, and one fills up first. Suspending GC
only on the full disk at first, and on each other disk as it fills up
in turn, is safe.
2. Uneven Data Distribution:
   Due to write skew or cleanup inconsistencies, a single disk may
fill up while others still have free space. Halting GC globally
penalizes operational disks.

To address this, I see three options:

Option 1: Reuse isReadOnlyModeOnAnyDiskFullEnabled. When
isReadOnlyModeOnAnyDiskFullEnabled == true, stop GC on all disks;
otherwise, other disks should continue normal operations without GC
suspension.
Reason: isReadOnlyModeOnAnyDiskFullEnabled already reflects the user’s
intent about whether to stop all bookie writes when any single disk is
full. GC is itself a writer, since it may need to create new files and
write data before it can clean up, so it can follow the same setting.

Option 2: When a single disk becomes full, only stop GC for that
specific disk. Other disks should continue their GC processes
uninterrupted.
Reason: This change should be treated as a bug fix rather than a
breaking change. No new configuration is needed; it simply corrects
the current behavior.

Option 3: Add a new configuration to control whether to stop GC on
other disks when any single disk becomes full.
Reason: This does not change the existing behavior but allows users to
configure it according to their needs.
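
To make the difference between the three options concrete, here is a
minimal sketch in Java. It is not BookKeeper code: the class
GcSuspensionSketch, the method disksToSuspend, and the parameter
suspendAllOnAnyDiskFull are hypothetical names used only for
illustration. The sketch assumes that the three options differ only in
where that boolean comes from: the existing
isReadOnlyModeOnAnyDiskFullEnabled value (Option 1), a constant false
(Option 2), or a new setting (Option 3).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; none of these names exist in BookKeeper.
final class GcSuspensionSketch {

    /**
     * Returns the ledger disks on which GC would be suspended.
     *
     * @param allDisks                every ledger disk of the bookie
     * @param fullDisks               the disks currently reported as full
     * @param suspendAllOnAnyDiskFull Option 1: value of isReadOnlyModeOnAnyDiskFullEnabled;
     *                                Option 2: always false;
     *                                Option 3: value of a new (hypothetical) setting
     */
    public static Set<String> disksToSuspend(List<String> allDisks,
                                             Set<String> fullDisks,
                                             boolean suspendAllOnAnyDiskFull) {
        // LinkedHashSet keeps the result in a predictable order.
        Set<String> suspended = new LinkedHashSet<>();
        if (fullDisks.isEmpty()) {
            return suspended; // no disk is full, so GC runs everywhere
        }
        if (suspendAllOnAnyDiskFull) {
            suspended.addAll(allDisks);  // one full disk pauses GC on all disks
        } else {
            suspended.addAll(fullDisks); // only the full disks pause GC
        }
        return suspended;
    }

    public static void main(String[] args) {
        List<String> disks = List.of("/data1", "/data2", "/data3");
        Set<String> full = Set.of("/data2");

        // Global suspension, as described for the current behavior.
        System.out.println(disksToSuspend(disks, full, true));  // [/data1, /data2, /data3]
        // Per-disk suspension, as in Option 2.
        System.out.println(disksToSuspend(disks, full, false)); // [/data2]
    }
}
```

With one full disk out of three, the global policy pauses GC on all
three disks, while the per-disk policy leaves GC running on the two
disks that still have free space.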

I think Option 2 is the most appropriate, as it directly addresses the
problem without introducing additional configuration complexity.

Looking forward to your feedback.

BR,
Xiangying
