Re: [DISCUSS] PIP-186: Introduce two phase deletion protocol based on system topic

Asaf Mesika Thu, 02 Feb 2023 06:33:33 -0800

>
> When pulsar wants to delete a ledger, ManagedLedger uses
> ledgerDeletionService to send a message, the message content contains the
> waiting delete ledger info. After sending success, delete the ledger id
> from the metadata store.
> The consumer receives the message, it will use PulsarClient to send a
> delete command to the corresponding broker, the broker receives delete
> command, and do the actual delete operation.
>


If you persisted the message successfully by the producer and the broker
was terminated before being able to delete the ledger from the metadata
service?
I recommend having the logic to delete the ledger be done in the message
consumer side:
- if the ledger exists in the MD store, delete it.
- send delete command to BK
Both as I understand are idempotent. If for some reason one action was done
but the other wasn't - ZK down, BK down, broker hosting the consumer
terminated - either the message will not be acknowledged, or you negatively
acknowledge.

It's the ledger source, (MANAGED_LEDGER,MANAGED_CURSOR,SCHEMA_STORAGE)
> When the broker wants to delete a ledger, we will check if the bookkeeper
> metadata matches or not. In the pulsar, it will mark the ledger source when
> creating a new ledger. See LedgerMetadataUtils.


General question: When a ledger is persisted to ZK, where is the ledger
metadata persisted in ZK (more specifically it's metadata, which includes
the component?
Is it also used when building out the key (path) in ZK?

It marks the ledger as a normal ledger or an offload ledger, broker need it
> to determine whether to delete bookkeeper data or offload data.
>
Isn't the type saved together with the ledger in ZK?

It's for the offloaded ledger, when we want to delete the offload ledger,
> we need offloadedContextUuid, here we can simplify it to offloadContextUuid.

Sounds much better. Maybe offloadedLedgerUUID? (why context?)

 Are you encoding all that  extra info besides the ledger ID and its source
to avoid reading it again from ZK when deleting it?

It's for extended.
>
Can't really understand from that short sentence what you mean. Can you
please elaborate?


> In https://github.com/apache/pulsar/issues/16569. The first step section
> and second step section process flow picture are detailed.
>
 I'm sorry but you didn't answer all the questions I wrote. I'll paste them
here:
Can you explain the starting point? How does deletion work in general?
> When? What happens? ... I understand there are time based triggers, and
> sometimes used based triggers. They are somehow marked in metadata.

If delete fails, that means the storage system occur some problems. I guess
> the storage system will recovery in 10 mins.
>
> In https://github.com/apache/pulsar/issues/16569, we define
> reconsumeLaterOfTopicTwoPhaseDeletionInSeconds in the ServiceConfiguration,
> it's configurable.
> private int reconsumeLaterOfTopicTwoPhaseDeletionInSeconds = 600;


We need some experienced people here to contribute their opinion. Default
10min might be too much. I recommend you ask Penghui.

>
> > You mentioned you are working with a client which has retries configured.
> > Retry is client side based, ack one message while producing another,
> > transaction free. Are you prepared to handle a case where you acked but
> > failed to produce the message, hence you lost it completely?
> >
> The pulsarClient only sends a new message that succeeds, then ack the
> origin message, so didn't care in this case.
>
Ok, then you will have concurrent consumption of a message which tries to
delete the ledger from ZK and then tries to delete it from BK? Isn't that a
concurrency issue?


> > > If we want to delete a topic, we should send the delete ledger msg to
> > > system topic and remove ledger id from metadata one by one, after all
> the
> > > ledger has been deleted, then delete the topic. If any ledger operate
> > > failed, we think this delete topic operation failed and return the left
> > > ledger id in the response.
> >
> > I couldn't understand. Can you please clarify this section. How exactly
> > topic deletion is modified to use the features described in this pip?
> >
> We need to ensure that all ledgers are deleted before the topic is
> deleted, otherwise, there will be orphan ledgers.
>
Your PIP is about introducing a workflow for deleting a ledger, right?
When you delete a topic you iterate its ledger list and delete each ledger.
Your PIP changes the way each ledger is deleted and makes it async. So I
guess what I want to understand is: What are the changes you are making to
topic deletion due to your PIP? You said "we need to make sure" - can you
please clarify how you will make sure?

> 10.
> > Backward compatibility - I would make sure to document in the docs
> exactly
> > what to do in case we need to rollback (cook book).
> Well.


You added

> If user upgrade and enable two phase deletion, the ledger deletion msg
> will store in system topic. If the user rolls back to the old version and
> the system topic msg hasn't consumed all, some ledger may not delete.


A cookbook is giving instructions like "Before downgrading, wait for
metrics xxx to be 0 which indicates the in-flight ledgers delete commands
have all been processed". Here you just say, some ledgers may note delete -
give them some action - what commands can they run to delete those ledgers
themselves? Help them be successful as you have all the
implementation knowledge - they have nothing.


> >
> > 11.
> > General comment - You're basically implementing a bespoke workflow using
> a
> > topic to save the state of where you are in the topic.
> > Is this the only place in Pulsar (delete ledger) that an action is
> composed
> > of several steps ?
> > If the answer is no to this, wouldn't it be better to have a small
> utility
> > which is in charge of moving through the workflow steps? It can even be a
> > simple state enum, where you move your state from a to b to c to d and it
> > is persisted.
> We need to persist in the middle steps, and we didn't want to operate the
> metadata store continually, so used pulsar to persist it.
>

I didn't ask whether we should persist the workflow state to Pulsar instead
of ZK.
Can you please re-read my question?


> > 12. Monitoring
> > Some actions here can take a long time. We're basically relying on logs
> to
> > monitor where we are in the flow?
> yes, we didn't trace the ledger deletion steps. we only use stats to
> record whether the delete operation succeeds or not.
>
That's not enough.
A user needs to be able to operate the cluster, so we need to give them
eyes into what's happening inside the system.
Please add metrics to help them figure that out:
* How many in-flight ledger deletion commands do we have?
* How many ZK deletions failed/succeeded?
* How many BK deletions failed/succeeded?

Thanks!

Asaf

On Mon, Jan 30, 2023 at 4:10 PM Yan Zhao <horizo...@apache.org> wrote:

> > Couples of notes:
> >
> > 1.
> >
> > > In the LedgerDeletionService  start, it will create  a producer to send
> > > pending delete ledger.
> > > When delete a ledger,  a PendingDeleteLedgerInfo msg with 1 min delay
> (the
> > > delay is for consumer side, if send it immediately, maybe the metadata
> > > din't change when consumer receive it). After the send operation
> succeeds,
> > >  then to operate metadata. If send msg failed, we think this deletion
> > > operation failed, and didn't operate metadata.
> >
> >
> > This section is completely unclear to me. Can you please provide step by
> > step exactly what will happen in the workflow?
> > Who does what and where (node)?
>
> In PulsarService, it defines ledgerDeletionService. The
> ledgerDeletionService will create a producer and a consumer, which looks
> like topicPoliciesService. The PulsarService passes it to ManagedLedger.
> When pulsar wants to delete a ledger, ManagedLedger uses
> ledgerDeletionService to send a message, the message content contains the
> waiting delete ledger info. After sending success, delete the ledger id
> from the metadata store.
> The consumer receives the message, it will use PulsarClient to send a
> delete command to the corresponding broker, the broker receives delete
> command, and do the actual delete operation.
>
> In https://github.com/apache/pulsar/issues/16569, these are some pictures
> for the workflow.
>
>
> >
> > 2.
> >
> > > /**
> > >      * The ledger component . managed-ledger, managed-cursor and
> > > schema-storage.
> > >      */
> > >     private LedgerComponent ledgerComponent;
> >
> >
> > Why is this needed?
> > What do you mean by a component of a ledger? Is the ledger divided into
> > components?
>
> It's the ledger source, (MANAGED_LEDGER,MANAGED_CURSOR,SCHEMA_STORAGE)
> When the broker wants to delete a ledger, we will check if the bookkeeper
> metadata matches or not. In the pulsar, it will mark the ledger source when
> creating a new ledger. See LedgerMetadataUtils.
>
> >
> > 3.
> >
> > >   /**
> > >      * The ledger type. ledger or offload-ledger.
> > >      */
> > >     private LedgerType ledgerType;
> >
> >
> > I don't understand why you need this type.
> It marks the ledger as a normal ledger or an offload ledger, broker need
> it to determine whether to delete bookkeeper data or offload data.
>
> >
> > 4.
> >
> > > private MLDataFormats.ManagedLedgerInfo.LedgerInfo context;
> >
> >
> > Context is a very generic word, but your type is so specific. Can you
> > please explain why you need this for?
> >
> > Are you sure you want to tie one data structure into the other - just
> > validating.
> It's for the offloaded ledger, when we want to delete the offload ledger,
> we need offloadedContextUuid, here we can simplify it to offloadContextUuid.
>
> >
> > 5.
> >
> > >   /**
> > >      * Extent properties.
> > >      */
> > >     private Map<String, String> properties = new HashMap<>();
> >
> >
> > Why is this needed?
> It's for extended.
>
> >
> >
> > 6.
> >
> > > When receiving a pending delete ledger msg, we will check if the topic
> > > still exists. If the topic exists, send a delete command
> > > (PendingDelteLedger) to the broker which owns the topic. In the
> broker, it
> > > will check if the ledger is still in the metadata store, if the ledger
> in
> > > the metadata store means the ledger is still in use, give up to delete
> this
> > > ledge
> >
> >
> > I don't understand this workflow. You say you check if it's in the
> metadata
> > store, and if it is , then it is used - what will make it unused?
> > Can you explain the starting point? How does deletion work in general?
> > When? What happens? ... I understand there are time based triggers, and
> > sometimes used based triggers. They are somehow marked in metadata.
> In https://github.com/apache/pulsar/issues/16569. The first step section
> and second step section process flow picture are detailed.
>
> >
> > 7.
> >
> > If we delete successfully, the consumer will ack this msg. If delete
> fails,
> > > reconsume this msg 10 min later.
> >
> >
> > Where did you define 10min?
> > Why 10 min?
> If delete fails, that means the storage system occur some problems. I
> guess the storage system will recovery in 10 mins.
>
> In https://github.com/apache/pulsar/issues/16569, we define
> reconsumeLaterOfTopicTwoPhaseDeletionInSeconds in the ServiceConfiguration,
> it's configurable.
> private int reconsumeLaterOfTopicTwoPhaseDeletionInSeconds = 600;
>
> >
> > You mentioned you are working with a client which has retries configured.
> > Retry is client side based, ack one message while producing another,
> > transaction free. Are you prepared to handle a case where you acked but
> > failed to produce the message, hence you lost it completely?
> >
> The pulsarClient only sends a new message that succeeds, then ack the
> origin message, so didn't care in this case.
>
> > 8.
> >
> > > If we want to delete a topic, we should send the delete ledger msg to
> > > system topic and remove ledger id from metadata one by one, after all
> the
> > > ledger has been deleted, then delete the topic. If any ledger operate
> > > failed, we think this delete topic operation failed and return the left
> > > ledger id in the response.
> >
> > I couldn't understand. Can you please clarify this section. How exactly
> > topic deletion is modified to use the features described in this pip?
> >
> We need to ensure that all ledgers are deleted before the topic is
> deleted, otherwise, there will be orphan ledgers.
>
> > 9.>
> > Regarding configuration. I suggest we prefix all config keys with the
> > feature name so we can easily separate them.
> That's fine.
>
> >
> > 10.
> > Backward compatibility - I would make sure to document in the docs
> exactly
> > what to do in case we need to rollback (cook book).
> Well.
>
> >
> > 11.
> > General comment - You're basically implementing a bespoke workflow using
> a
> > topic to save the state of where you are in the topic.
> > Is this the only place in Pulsar (delete ledger) that an action is
> composed
> > of several steps ?
> > If the answer is no to this, wouldn't it be better to have a small
> utility
> > which is in charge of moving through the workflow steps? It can even be a
> > simple state enum, where you move your state from a to b to c to d and it
> > is persisted.
> We need to persist in the middle steps, and we didn't want to operate the
> metadata store continually, so used pulsar to persist it.
>
>
> > 12. Monitoring
> > Some actions here can take a long time. We're basically relying on logs
> to
> > monitor where we are in the flow?
> yes, we didn't trace the ledger deletion steps. we only use stats to
> record whether the delete operation succeeds or not.
>

Re: [DISCUSS] PIP-186: Introduce two phase deletion protocol based on system topic

Reply via email to