Re: [DISCUSS] PIP-327 Support force topic loading for unrecoverable errors

Rajan Dhabalia Wed, 20 Dec 2023 09:19:46 -0800

>>>>>  I don't see the topic load issues. The topic loading works fine, and
the producer works fine. But the proposal said it would resolve the topic
load issue, can you reproduce the topic load issue?


Yes, you tried a different use case, not the one described in the PIP.
Deleted ledgers return a specific error code that the broker handles, and
the broker skips such non-recoverable ledgers. However, you can reproduce
issue #21751 when bookies are removed from the cluster without graceful
recovery. In that case brokers cannot conclude that the errors are
non-recoverable, which can impact multiple ledgers and topics, and those
topics stay unavailable until the managed-ledger metadata is manually
cleaned up for each topic.

>>>>>> And introduce a new configuration such as
`ledgerFailedToRecoverThreashold`, if the ledger continues to fail-recover,.

No, let's not introduce such unnecessary complication, as we already have
the autoSkipNonRecoverableData flag to handle non-recoverable errors. One
doesn't want to bet on a number of retries before skipping non-recoverable
data. Instead, one needs control for the case where one is sure about
actual data loss, or bookies have been removed from the cluster, and one
really needs to force-skip such non-recoverable data with an explicit flag.
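
To make the two approaches weighed above concrete, here is a rough Java
sketch. It is not Pulsar code: the class, enum, and method names are
hypothetical, the in-memory map only stands in for the MetadataStore, and
the split between "conclusive" and "inconclusive" failures is an assumption
drawn from this thread. Only `autoSkipNonRecoverableData` and the proposed
`ledgerFailedToRecoverThreashold` are names taken from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Pulsar. */
final class SkipPolicySketch {

    /** Threshold proposal: count consecutive recovery failures per ledger. */
    static final class ThresholdPolicy {
        // Stands in for the proposed ledgerFailedToRecoverThreashold setting.
        private final int threshold;
        // Stands in for counters persisted in the MetadataStore.
        private final Map<Long, Integer> failures = new HashMap<>();

        ThresholdPolicy(int threshold) {
            this.threshold = threshold;
        }

        /** Records one recovery attempt; returns true once the ledger would be dropped. */
        boolean recordAttempt(long ledgerId, boolean recovered) {
            if (recovered) {
                failures.remove(ledgerId); // a success resets the count to 0
                return false;
            }
            int count = failures.merge(ledgerId, 1, Integer::sum);
            return count > threshold; // "greater than" the threshold, as proposed
        }
    }

    /** Assumed classification of a ledger recovery failure. */
    enum Failure { CONCLUSIVE_NON_RECOVERABLE, INCONCLUSIVE }

    /**
     * Flag-based position: conclusive errors follow autoSkipNonRecoverableData;
     * inconclusive ones are skipped only when the operator explicitly forces it.
     */
    static boolean shouldSkip(Failure failure,
                              boolean autoSkipNonRecoverableData,
                              boolean operatorForceSkip) {
        if (failure == Failure.CONCLUSIVE_NON_RECOVERABLE) {
            return autoSkipNonRecoverableData;
        }
        return operatorForceSkip;
    }
}
```

With the threshold variant, a ledger that is only temporarily unavailable
is still dropped once the retry budget runs out, which is the bet the reply
above wants to avoid. With the flag variant nothing is skipped until an
operator opts in.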

Thanks,
Rajan

On Wed, Dec 20, 2023 at 1:17 AM 太上玄元道君 <[email protected]> wrote:

> In my understanding, the PIP is for certain `extreme` conditions. Some
> ledgers failing to recover is a very low-probability event, and it
> should be hard to reproduce (unless we delete some ledgers manually).
>
> If we skip these failed-recover ledgers, message production should be able
> to proceed smoothly.
>
> But for message consumption, how can we deal with it?
> 1. Skip them: this leads to data loss, even if these ledgers only failed
> to recover temporarily.
> 2. Don't skip them: consumers may be unable to receive messages from
> brokers and consumption cannot proceed normally, even if these ledgers
> were deleted and can never recover.
>
> So we must accurately determine whether these Ledgers are temporarily
> unable to recover or will never be able to recover.
> Maybe we need to persist the number of failed recoveries of each ledger
> in the MetadataStore: if the ledger recovers successfully, set it to 0;
> otherwise, increment it by 1.
> And introduce a new configuration such as
> `ledgerFailedToRecoverThreashold`:
> if the ledger keeps failing to recover and the count becomes greater
> than `ledgerFailedToRecoverThreashold`, delete the ledger from the
> MetadataStore.
>
> Thanks
>
> PengHui Li <[email protected]> wrote on Wed, Dec 20, 2023 at 16:32:
>
> > Hi Rajan,
> >
> > I tried to test the case that you provided in the proposal.
> >
> > - Produce messages to a topic
> > - Unload the topic 5 times to ensure we have some ledgers in the topic
> > - Delete one ledger by using the bookkeeper shell
> > - Unload the topic again
> > - Start to produce messages again, it works
> > - Start a consumer to consume messages from the earliest position, it gets
> > stuck on the deleted ledger
> >
> > I don't see the topic load issues. The topic loading works fine, and the
> > producer works fine.
> > But the proposal said it would resolve the topic load issue, can you
> > reproduce the topic load issue?
> >
> > Regards,
> > Penghui
> >
> >
> >
> > On Wed, Dec 20, 2023 at 3:28 AM Rajan Dhabalia <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We have an issue where topics fail to load in unrecoverable situations,
> > > which impacts topic availability:
> > > https://github.com/apache/pulsar/issues/21751
> > > This PIP addresses the issue and allows brokers to handle such
> > > situations and maintain the topic availability:
> > >
> > > PIP: https://github.com/apache/pulsar/pull/21752
> > >
> > > Thanks,
> > > Rajan
> > >
> >
>
