Hi Andrey,

Yes, this BP addresses rescheduling of three audit tasks: AuditorCheckAllLedgersTask, AuditorPlacementPolicyCheckTask, and AuditorReplicasCheckTask. AuditorBookieTask is excluded.
Before this BP, there was no REST API or command-line tool that could initiate the rescheduling of these three tasks. Even if you updated the last run time of these tasks in ZK, the change would not take effect in the Auditor. The only way to make it take effect was to restart the Auditor process.

Thanks!
wenbingshen

On Fri, May 31, 2024 at 03:20, Andrey Yegorov <ayego...@apache.org> wrote:

> Does this new functionality do anything that is not covered by REST API?
>
> https://bookkeeper.apache.org/docs/4.10.0/admin/http/#endpoint-apiv1autorecoverytrigger_audit
>
> On 2023/07/10 06:34:35 Wenbing Shen wrote:
> > Hi everyone,
> >
> > I would like to initiate a discussion regarding the current bookie force
> > reschedule auditor tasks. Below is the detailed BP content. If you have any
> > questions or ideas, please feel free to reply to this email for further
> > discussion. Thank you!
> >
> > This is the master ticket for tracking BP-63:
> > Proposal PR - #3964 <https://github.com/apache/bookkeeper/pull/3964>
> >
> > Motivation
> >
> > Currently, the Bookie can reschedule Auditor check tasks in several ways,
> > excluding the auditorBookieTask as it provides a separate mechanism to
> > trigger task reexecution. This BP specifically discusses
> > AuditorCheckAllLedgersTask/AuditorPlacementPolicyCheckTask/AuditorReplicasCheckTask:
> >
> > 1: The Bookie provides three execution times based on ZooKeeper,
> > checkallledgersctime/placementpolicycheckctime/replicascheckctime. By
> > updating these execution times, we can dynamically adjust the execution
> > frequency of auditor tasks, but it requires restarting the Auditor process
> > or reopening the Auditor election to trigger task execution.
> >
> > 2: By using the ForceAuditorChecksCmd tool, which is still based on the
> > underlying logic of the first point, restarting the Auditor or performing
> > an election is also necessary to trigger task execution.
> >
> > 3: The Decommission and RecoveryBookie tools tend to focus on executing
> > recovery logic and only check and recover a specific subset of Bookie
> > services.
> >
> > The above methods are complex and have poor stability when rescheduling the
> > Auditor check tasks in a cluster.
> >
> > Proposal
> >
> > Therefore, I propose further optimizing the rescheduling of Auditor tasks.
> >
> > 1: The Auditor monitors the persistent znode path
> > /ZK_LEDGERS_ROOT_PATH/underreplication/scheduleAuditor.
> > 2: Users modify the task ctime using the ForceAuditorChecksCmd tool and
> > forcefully create the above znode path using the force parameter.
> > 3: The Auditor creates callbacks through scheduleAuditor to reschedule the
> > aforementioned three tasks.
> > 4: After the Auditor completes rescheduling the tasks, the scheduleAuditor
> > node is deleted.
> > 5: When the Auditor starts, it deletes the old scheduleAuditor node to
> > avoid logical confusion.
> >
> > This way, we can trigger the scheduling and execution of Auditor tasks
> > through an online interface without relying on service restart or
> > re-election.
> >
> > Compatibility, Deprecation, and Migration Plan
> >
> > There are no compatibility issues. This BP introduces a new trigger flag
> > that does not affect the original logic and does not involve any changes to
> > other existing public APIs. There is no deprecation or migration plan.
> >
> >
> > Best regards,
> >
> > Wenbing Shen
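
For illustration only, the five proposal steps quoted above can be modeled with a small in-memory sketch. This is plain Python, not BookKeeper or ZooKeeper client code: InMemoryZk, AuditorModel, and force_auditor_checks are hypothetical names, and the "/ledgers" root, the location of the ctime znodes, and the value written to them are assumptions made so the example runs on its own. The real implementation is in the proposal PR.

```python
from typing import Callable, Dict, List

# Assumption: ZK_LEDGERS_ROOT_PATH is "/ledgers" in this sketch.
UNDERREPLICATION = "/ledgers/underreplication"
SCHEDULE_AUDITOR_PATH = UNDERREPLICATION + "/scheduleAuditor"


class InMemoryZk:
    """Hypothetical stand-in for ZooKeeper: stores znodes, fires create watchers."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}
        self.create_watchers: Dict[str, List[Callable[[str], None]]] = {}

    def watch_create(self, path: str, callback: Callable[[str], None]) -> None:
        self.create_watchers.setdefault(path, []).append(callback)

    def create(self, path: str, data: bytes = b"") -> None:
        self.nodes[path] = data
        for callback in list(self.create_watchers.get(path, [])):
            callback(path)

    def delete(self, path: str) -> None:
        self.nodes.pop(path, None)  # deleting a missing node is a no-op here

    def exists(self, path: str) -> bool:
        return path in self.nodes


class AuditorModel:
    """Hypothetical model of the Auditor side of the proposal."""

    TASKS = (
        "AuditorCheckAllLedgersTask",
        "AuditorPlacementPolicyCheckTask",
        "AuditorReplicasCheckTask",
    )

    def __init__(self, zk: InMemoryZk) -> None:
        self.zk = zk
        self.rescheduled: List[str] = []

    def start(self) -> None:
        # Step 5: remove a stale flag left over from before this start.
        self.zk.delete(SCHEDULE_AUDITOR_PATH)
        # Step 1: watch the scheduleAuditor path.
        self.zk.watch_create(SCHEDULE_AUDITOR_PATH, self._on_schedule_flag)

    def _on_schedule_flag(self, path: str) -> None:
        # Step 3: reschedule the three tasks (recorded here instead of executed).
        self.rescheduled.extend(self.TASKS)
        # Step 4: delete the flag once rescheduling is done.
        self.zk.delete(path)


def force_auditor_checks(zk: InMemoryZk) -> None:
    """Step 2: update the task ctimes, then create the trigger znode."""
    for name in ("checkallledgersctime", "placementpolicycheckctime", "replicascheckctime"):
        # Assumption: znode location and the b"0" value are placeholders.
        zk.nodes[UNDERREPLICATION + "/" + name] = b"0"
    zk.create(SCHEDULE_AUDITOR_PATH)


if __name__ == "__main__":
    zk = InMemoryZk()
    auditor = AuditorModel(zk)
    auditor.start()
    force_auditor_checks(zk)
    print(auditor.rescheduled)
```

In this model the trigger znode acts as a one-shot flag: creating it fires the callback, and deleting it afterwards lets a later run of the tool trigger another reschedule without an Auditor restart or re-election.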