Re: [DISCUSS] PIP-186: Introduce two phase deletion protocol based on system topic

Asaf Mesika Tue, 07 Mar 2023 04:51:53 -0800

Thanks for the modifications. Looking much better!

In the LedgerDeletionService start, it will create a producer for sending
> pending delete ledger. When deleting a ledger, the producer sends
> PendingDeleteLedgerInfo to the system-topic. If the sent succeeds, delete
> the ledger id from the whole ledgers and update the ledgers into the meta
> store. If the sent fails, do nothing.

You don't delete the ledger ID from the whole ledgers, then update the
ledgers in the metadata store. We described it above already; we remove the
pending delete ledger ID from the list of ledger IDs, publish the message,
and only at the end update the metadata store.

> In the LedgerDeletionService start, it will create a producer for sending
> pending delete ledger. When deleting a ledger, the producer sends
> PendingDeleteLedgerInfo to the system-topic. If the sent succeeds, delete
> the ledger id from the whole ledgers and update the ledgers into the meta
> store. If the sent fails, do nothing.
>
> After that section, you write the methods of DeletionService and the
pseudo-code, but it's hard to understand what goes where: What exactly
deletion service does, and where the pseudo code will actually go into.

Also, let's concentrate on the implementation details in the Implementation
section High-Level Design. Let the readers understand the concept end to
end in the High-Level Design, and then in the Implementation, start diving
into the nitty gritty details.

The consumer uses pulsar admin to send PendingDeleteLedgerInfo by the
> broker restful API.

Please explain here that you will introduce a new Admin API for that. Also,
explain that you will issue that API against the broker who owns the topic
(The schema registry has a ledger per topic?)

I still don't understand which methods or behavior you're changing exactly
for the managed ledger, cursor, and schema.
Why am I asking?
- Ledger deletion can happen because:
* Background truncation since ledgers are allowed to be deleted based on
expiration policy.
* Truncate request by an admin API.
* More?

If you find the ledger ID is still in the meta-store, you return a failure,
and the consumer acks the message, right?
If it's background deletion, all good since it will kick in again in a few
seconds and retry.
But what happens if a user runs a REST admin API command to truncate the
topic?

check if the ledger is still in use,

You mean, check if the ledger ID still exists in the ledger ID list of the
topic in the metadata store, right?

 then check the PendingDeleteLedgerInfo ledgerType.

Check for what?

2.2 If the topic not exists, the consumer checks the
> PendingDeleteLedgerInfo ledgerType.

3.1 If the ledgerType is Ledger, delete the data at the bookkeeper side.

3.2 If the ledgerType is Offload-Ledger, delete the data at the
> tiered-storage side.

Something doesn't look good in the numbering and how you structure it. 3.1
is probably an indent inside 2.2, no?

---
1. What happens if the topic doesn't exist when you are in the broker in
the delete pending ledger API implementation?
2. Say the ledger ID is still in the metadata store. If the background
truncation of the managed ledge was the trigger, there is no real point in
retrying, right? Once you start over, another message will be sent to the
system topic.
3. Can you explain exactly the cases in which it is actually worth doing
the retry?
I mean the obvious: you failed to delete it from BK due to a transient
error. But the previous one of checking the list? worth the retry?

then respond to the error msg "ledger still in use" to the consumer.

You need to have those error codes in your pseudo code.
and your consumer pseudo code should check and act accordingly

For security, the hacker may fake a PendingDeleteLedgerInfo and send it to
> the system-topic. So we need to do some checks works before deleting the
> data.
>
> I think we’re over-engineering this or overprotecting.
Today, anybody with access to Pulsar Admin (who has a token with a role)
can do a lot of harm - delete an entire topic, and truncate the topic. Why
do you want to protect yourself from something like that? It’s too much,
IMO.

Build an offloader according to the PendingDeleteLedgerInfo.offloadContext
>
>  Isn’t that expensive?

> There are some errors but we think it was successful, and avoid retrying
> it again.
>
>    - The Ledger is Still in use
>    - The PendingDeleteLedgerInfo is not matched with the bookkeeper
>    ledger metadata
>    - The data is not exist in the bookkeeper or tiered-storage
>    - The PendingDeleteLedgerInfo param is not incorrect, miss some
>    essential property
>
>
> Maybe specify for each your resolution:
* For ledger ID that still exists in the metadata store - don't retry, ack
the message because ...
* ...

Add new description API in pulsarAdmin.BrokerStats for getting it.

What?

---

So LedgerDeletionService.asyncDelete*() calls
ManagedLedger.asyncDeleteLedger()?

----

> Consumer use pulsar.admin().topics().asyncDeleteLedger() to send request
> to the broker

 So, I presume we don't really want users to have that ability, right?
It should be "private" for the users.

But once we expose it via AdminAPI, it's out there, ready to be used by
anyone.

1. Perhaps we should introduce a new concept of private AdminAPI which can
only be consumed internally by other brokers.
2. What's the needed authorization to make such a delete command of a
ledger?

Thanks,

Asaf

On Sat, Mar 4, 2023 at 6:42 PM Yan Zhao <horizo...@apache.org> wrote:

> Hi, Asaf, Tune the pip https://github.com/apache/pulsar/issues/16569,
> please help to review it again, thanks!
>

Re: [DISCUSS] PIP-186: Introduce two phase deletion protocol based on system topic

Reply via email to