As I understand it, the PIP targets certain `extreme` conditions. A ledger failing to recover is a very low-probability event, and it should be hard to reproduce (unless we delete some ledgers manually).
If we skip these failed-recover ledgers, message production should be able to proceed smoothly. But for message consumption, how can we deal with it?

1. Skip them: this leads to data loss, even if these ledgers only failed to recover temporarily.
2. Do not skip them: consumers may be unable to receive messages from brokers, so consumption cannot proceed normally, even if these ledgers were deleted and can never recover.

So we must accurately determine whether these ledgers are temporarily unable to recover or will never be able to recover.

Maybe we need to persist each ledger's failed-recover count in the MetadataStore: if the ledger recovers successfully, set it to 0; otherwise, add 1. Then introduce a new configuration such as `ledgerFailedToRecoverThreashold`: if the ledger keeps failing to recover and the count is greater than `ledgerFailedToRecoverThreashold`, delete the ledger from the MetadataStore.

Thanks

PengHui Li <peng...@apache.org> wrote on Wed, Dec 20, 2023 at 16:32:

> Hi Rajan,
>
> I tried to test the case that you provided in the proposal.
>
> - Produce messages to a topic
> - Unload the topic 5 times to ensure we have some ledgers in the topic
> - Delete one ledger by using the bookkeeper shell
> - Unload the topic again
> - Start to produce messages again, it works
> - Start a consumer to consume messages from the earliest position, it gets
> stuck on the deleted ledger
>
> I don't see the topic load issues. The topic loading works fine, and the
> producer works fine.
> But the proposal said it would resolve the topic load issue, can you
> reproduce the topic load issue?
>
> Regards,
> Penghui
>
> On Wed, Dec 20, 2023 at 3:28 AM Rajan Dhabalia <rdhaba...@apache.org>
> wrote:
>
> > Hi,
> >
> > We have an issue to fail loading topics in unrecoverable situation and
> > impacting topic availability::
> > https://github.com/apache/pulsar/issues/21751
> > This PIP addresses the issue and allows brokers to handle such situations
> > and maintain the topic availability:
> >
> > PIP: https://github.com/apache/pulsar/pull/21752
> >
> > Thanks,
> > Rajan
> >
>
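
P.S. To make the failed-recover counter idea from the top of this message more concrete, here is a rough sketch. It is not Pulsar code: the class `LedgerRecoveryTracker` and its method names are hypothetical, and an in-memory map stands in for the MetadataStore only to keep the example self-contained. The threshold name is the one suggested above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the per-ledger failed-recover counter; not an existing Pulsar class. */
class LedgerRecoveryTracker {
    // Assumption: in a real broker these counts would be persisted in the MetadataStore.
    private final Map<Long, Integer> failedRecoverCounts = new ConcurrentHashMap<>();
    private final int ledgerFailedToRecoverThreashold;

    LedgerRecoveryTracker(int ledgerFailedToRecoverThreashold) {
        this.ledgerFailedToRecoverThreashold = ledgerFailedToRecoverThreashold;
    }

    /** The ledger recovered successfully: reset its count to 0. */
    void onRecoverSuccess(long ledgerId) {
        failedRecoverCounts.put(ledgerId, 0);
    }

    /**
     * The ledger failed to recover: add 1 to its count.
     * Returns true once the count is greater than the threshold, meaning the
     * caller may treat the ledger as unrecoverable and delete it from metadata.
     */
    boolean onRecoverFailure(long ledgerId) {
        int count = failedRecoverCounts.merge(ledgerId, 1, Integer::sum);
        return count > ledgerFailedToRecoverThreashold;
    }

    /** Current failed-recover count for the ledger (0 if it has never failed). */
    int failedCount(long ledgerId) {
        return failedRecoverCounts.getOrDefault(ledgerId, 0);
    }
}
```

With a threshold of 2, the first two consecutive failures would keep the ledger and only the third would mark it for deletion; any successful recovery in between resets the count, which is what separates a temporary failure from a permanent one.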