Hi Enrico,

Thank you for your feedback. Regarding the issue you mentioned, I have two thoughts:

First: As I mentioned in Option 1, whether writes to all disks stop when one disk is full is controlled by isReadOnlyModeOnAnyDiskFullEnabled. When the user sets isReadOnlyModeOnAnyDiskFullEnabled = false, they want the bookie to keep accepting write requests even after one disk is full. In that case, writes to a ledger disk with no available space will fail anyway, so disabling GC for that disk should not be a problem. Meanwhile, the other, healthy ledger disks should keep working as the user expects, including running GC as usual. Stopping GC on all disks in this scenario might not align with the user's intention.

Second: When one ledger disk is full while other disks still have space, new data should not continue to be written to the full ledger disk. This is something that still needs optimization, rather than an already defined feature.

BR,
Xiangying

On Wed, Aug 20, 2025 at 8:48 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>
> Hello Xiangying,
> thanks for sharing your problem and your proposals.
>
> One issue I can see is that there is no way for the Bookie to go in
> "partial readonly mode".
> If you stop GC only on one disk and the Bookie accepts writes for that disk
> the problem is going to be worse and worse
>
>
> Best
> Enrico
>
>
> On Wed, Aug 20, 2025 at 14:21, xiangying meng <
> xiangy...@apache.org> wrote:
>
> > Hi BookKeeper Community,
> >
> > I’d like to propose a modification to how garbage collection (GC)
> > handles disk-full scenarios.
> > Currently, when any ledger disk reaches full capacity,
> > suspendMajorGC()/suspendMinorGC() pauses GC for all disks.
> > This behavior can unnecessarily impact healthy disks, especially in
> > cases of uneven disk utilization.
> >
> > Consider two scenarios:
> > 1. Even Data Distribution:
> > All disks are nearly full, and one fills up first. Temporarily
> > disabling GC only on the full disk (before propagating suspension to
> > others) is safe.
> > 2. Uneven Data Distribution:
> > Due to write skew or cleanup inconsistencies, a single disk may
> > fill up while others still have free space. Halting GC globally
> > penalizes operational disks.
> >
> > To address this, I propose three solutions:
> >
> > Option 1: Reuse isReadOnlyModeOnAnyDiskFullEnabled. When
> > isReadOnlyModeOnAnyDiskFullEnabled == true, stop GC on all disks;
> > otherwise, other disks should continue normal operations without GC
> > suspension.
> > Reason: isReadOnlyModeOnAnyDiskFullEnabled reflects the user’s intent
> > about whether to stop all bookie writes when any single disk is full,
> > but GC might need to create new files for writing data ahead of
> > cleanup.
> >
> > Option 2: When a single disk becomes full, only stop GC for that
> > specific disk. Other disks should continue their GC processes
> > uninterrupted.
> > Reason: This issue should be treated as a bug fix rather than a
> > breaking change. No configuration is needed; simply fix the current
> > behavior.
> >
> > Option 3: Add a new configuration to control whether to stop GC on
> > other disks when any single disk becomes full.
> > Reason: This does not change the existing behavior but allows users to
> > configure it according to their needs.
> >
> > I think Option 2 is the most appropriate, as it directly addresses the
> > problem without introducing additional configuration complexity.
> >
> > Looking forward to your feedback.
> >
> > BR,
> > Xiangying
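
For illustration only, the decision described in Option 1 could be sketched roughly as follows. This is a minimal sketch, not BookKeeper code: the class GcSuspensionPolicy, its decide method, and the disk names are hypothetical, and the boolean field only mirrors the isReadOnlyModeOnAnyDiskFullEnabled setting discussed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of BookKeeper): decides, per ledger disk,
// whether GC should be suspended once some disks have reported full.
class GcSuspensionPolicy {
    // Assumption: mirrors the isReadOnlyModeOnAnyDiskFullEnabled setting.
    private final boolean readOnlyModeOnAnyDiskFull;

    GcSuspensionPolicy(boolean readOnlyModeOnAnyDiskFull) {
        this.readOnlyModeOnAnyDiskFull = readOnlyModeOnAnyDiskFull;
    }

    // Returns a map of disk name -> true if GC should be suspended there.
    Map<String, Boolean> decide(List<String> allDisks, Set<String> fullDisks) {
        boolean anyFull = allDisks.stream().anyMatch(fullDisks::contains);
        Map<String, Boolean> suspend = new LinkedHashMap<>();
        for (String disk : allDisks) {
            // A full disk always has GC suspended. Healthy disks are only
            // suspended when the user asked for read-only mode on any full disk.
            boolean suspendHere = fullDisks.contains(disk)
                    || (anyFull && readOnlyModeOnAnyDiskFull);
            suspend.put(disk, suspendHere);
        }
        return suspend;
    }
}
```

With the flag set to false, only the full disk has GC suspended, which is also the behavior Option 2 would apply unconditionally. With the flag set to true, the sketch suspends GC on every disk, matching the current behavior described above.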