>>>>> I don't see the topic load issues. The topic loading works fine, and the producer works fine. But the proposal said it would resolve the topic load issue, can you reproduce the topic load issue?
Yes, you tried a different use case, not the one mentioned in the PIP. Deleted ledgers return a specific error code that the broker handles, so the broker skips such non-recoverable ledgers. However, you can reproduce issue #21751 when bookies are removed from the cluster without graceful recovery. In that case brokers cannot determine that the errors are non-recoverable, even though multiple ledgers and topics may be affected, and those topics stay unavailable until the managed-ledger metadata is manually cleaned up for each topic.

>>>>>> And introduce a new configuration such as `ledgerFailedToRecoverThreashold`, if the ledger continues to fail-recover,

No, let's not introduce such an unnecessary complication. We already have the autoSkipNonRecoverableData flag to handle non-recoverable errors, and nobody wants to bet on a number of retries before skipping non-recoverable data. What one needs is control: when one is sure about actual data loss, or bookies have been removed from the cluster, and one really needs to force-skip such non-recoverable data, one needs a flag that forcefully skips it.

Thanks,
Rajan

On Wed, Dec 20, 2023 at 1:17 AM 太上玄元道君 <dao...@apache.org> wrote:

> In my understanding, the PIP is for certain `extreme` conditions. Some
> ledgers failing to recover is an event with a very low probability, and it
> should be hard to reproduce (unless we delete some ledgers manually).
>
> If we skip these failed-recover ledgers, message production should be able
> to proceed smoothly.
>
> But for message consumption, how can we deal with it?
> 1. Skip them: it will lead to data loss, even if these ledgers only failed
> to recover temporarily.
> 2. Not skip them: consumers may not be able to receive messages from
> brokers, and message consumption cannot proceed normally, even if these
> ledgers were deleted and can never recover.
>
> So we must accurately determine whether these ledgers are temporarily
> unable to recover or will never be able to recover.
>
> Maybe we need to persist the number of failed recoveries of the ledger
> into MetadataStore: if the ledger recovers successfully, set it to 0,
> else +1.
> And introduce a new configuration such as
> `ledgerFailedToRecoverThreashold`:
> if the ledger continues to fail-recover, and the number of times is
> greater than `ledgerFailedToRecoverThreashold`, delete the ledger from
> MetadataStore.
>
> Thanks
>
> PengHui Li <peng...@apache.org> wrote on Wed, Dec 20, 2023 at 16:32:
>
> > Hi Rajan,
> >
> > I tried to test the case that you provided in the proposal.
> >
> > - Produce messages to a topic
> > - Unload the topic 5 times to ensure we have some ledgers in the topic
> > - Delete one ledger by using the bookkeeper shell
> > - Unload the topic again
> > - Start to produce messages again, it works
> > - Start a consumer to consume messages from the earliest position, it
> >   gets stuck on the deleted ledger
> >
> > I don't see the topic load issues. The topic loading works fine, and the
> > producer works fine.
> > But the proposal said it would resolve the topic load issue, can you
> > reproduce the topic load issue?
> >
> > Regards,
> > Penghui
> >
> > On Wed, Dec 20, 2023 at 3:28 AM Rajan Dhabalia <rdhaba...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > We have an issue where topics fail to load in unrecoverable situations,
> > > impacting topic availability:
> > > https://github.com/apache/pulsar/issues/21751
> > > This PIP addresses the issue and allows brokers to handle such
> > > situations and maintain topic availability:
> > >
> > > PIP: https://github.com/apache/pulsar/pull/21752
> > >
> > > Thanks,
> > > Rajan
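
For readers following the thread, the retry-counter idea quoted above (reset to 0 on a successful recovery, +1 on a failure, drop the ledger once the count exceeds the threshold) could be sketched roughly as follows. This is only an illustration of that suggestion, which the reply at the top of this message argues against in favor of the existing autoSkipNonRecoverableData flag. The plain dict stands in for MetadataStore, and `record_recovery_attempt` and its parameters are hypothetical names, not Pulsar APIs.

```python
from typing import Dict


def record_recovery_attempt(
    failure_counts: Dict[int, int],  # hypothetical stand-in for MetadataStore
    ledger_id: int,
    recovered: bool,
    threshold: int,  # plays the role of the proposed threshold setting
) -> bool:
    """Update the per-ledger failure counter.

    Returns True when the ledger has failed more than `threshold` times in a
    row and its counter entry has been removed (i.e. the ledger would be
    dropped), otherwise False.
    """
    if recovered:
        # A successful recovery resets the counter.
        failure_counts[ledger_id] = 0
        return False

    # A failed recovery increments the counter.
    failure_counts[ledger_id] = failure_counts.get(ledger_id, 0) + 1

    if failure_counts[ledger_id] > threshold:
        # Too many consecutive failures: treat the ledger as lost.
        del failure_counts[ledger_id]
        return True
    return False
```

The sketch also shows the weakness raised in the reply: the outcome depends entirely on the chosen threshold, so a temporary outage lasting longer than the retry budget would be treated the same as permanent data loss.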