+1 (binding)

On Fri, Jan 14, 2022 at 4:32 PM Aloys Zhang <aloyszh...@apache.org> wrote:
> +1 (non-binding)
>
> Haiting Jiang <jianghait...@apache.org> wrote on Fri, Jan 14, 2022 at 16:12:
>
> > +1 (non)
> >
> > On 2022/01/14 03:23:37 mattison chao wrote:
> > > +1 (non-binding)
> > >
> > > Best,
> > > Mattison
> > >
> > > On Fri, 14 Jan 2022 at 11:19, Hang Chen <chenh...@apache.org> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > > Hang
> > > >
> > > > Zhanpeng Wu <wuzhanpeng.w...@gmail.com> wrote on Fri, Jan 14, 2022 at 10:37:
> > > > >
> > > > > This is the voting thread for PIP-129. It will stay open for at least
> > > > > 48 hours. Pasted below for quoting convenience.
> > > > >
> > > > > ----
> > > > >
> > > > > https://github.com/apache/pulsar/issues/13526
> > > > >
> > > > > ----
> > > > >
> > > > > ## Motivation
> > > > >
> > > > > Under the current ledger-trimming design in
> > > > > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`,
> > > > > we first collect the ledgers that need to be deleted and then delete
> > > > > them asynchronously and concurrently, but we never check whether those
> > > > > deletions actually complete. If the metadata update succeeds but the
> > > > > asynchronous deletion fails, the ledger is not deleted even though, at
> > > > > the logical level, we consider it deleted. Its data then remains in the
> > > > > storage layer (e.g. BookKeeper) forever, and the longer a cluster runs,
> > > > > the more of this undeletable residual data accumulates.
> > > > >
> > > > > To address this, we can separate the metadata update from the ledger
> > > > > deletion. During trimming, we first mark which ledgers are deletable
> > > > > and persist those markers to the metadata store.
> > > > > The marked ledgers are then deleted asynchronously in the callback of
> > > > > the metadata update, so the original deletion logic is preserved. As a
> > > > > result, during a rolling upgrade or a rollback, the only difference is
> > > > > whether deleted ledgers carry a deletion marker.
> > > > >
> > > > > To be more specific:
> > > > > 1. For an upgrade, only the ledger marker information is added; the
> > > > >    logical order of deletion is unchanged.
> > > > > 2. For a rollback, some ledgers that were marked for deletion may not
> > > > >    be deleted because the broker is restarted. This is consistent with
> > > > >    the behavior of the original version.
> > > > >
> > > > > In addition, if deleting a marked ledger fails, its marker is not
> > > > > removed. Every subsequent trim therefore tries to delete such ledgers
> > > > > again, which is equivalent to a check-and-retry mechanism.
> > > > >
> > > > > ## Goal
> > > > >
> > > > > We need to modify some logic in
> > > > > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
> > > > > so that ledger deletion during trimming is split into two stages:
> > > > > marking and deleting. Once the markers are persisted to the metadata
> > > > > store, every trim will try to delete the marked ledgers until all
> > > > > deletable ledgers have been deleted successfully.
> > > > >
> > > > > ## Implementation
> > > > >
> > > > > This proposal separates the deletion logic in ledger trimming so that
> > > > > `ManagedLedgerImpl#internalTrimLedgers` is responsible for marking the
> > > > > deletable ledgers and then performing the actual ledger deletion based
> > > > > on the markers in the metadata store.
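A minimal sketch of the mark-then-delete flow with retry on later trims, to make the ordering concrete. Everything here is hypothetical: `TwoPhaseTrimSketch`, the in-memory sets standing in for the metadata store, and the `storageDelete` predicate standing in for the storage-layer delete call. It is not `ManagedLedgerImpl`; it only shows that markers are persisted first, deletion runs in the metadata-update callback, and a marker is cleared only after its deletion succeeds.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.function.LongPredicate;

// Hypothetical sketch of two-stage trimming; not the real ManagedLedgerImpl.
class TwoPhaseTrimSketch {
    // Ledgers that are still part of the managed ledger.
    final Set<Long> liveLedgers = new TreeSet<>();
    // Deletion markers, standing in for what would be persisted in the metadata store.
    final Set<Long> markedForDeletion = new TreeSet<>();
    // Hypothetical storage-layer delete; returns true on success.
    private final LongPredicate storageDelete;

    TwoPhaseTrimSketch(LongPredicate storageDelete) {
        this.storageDelete = storageDelete;
    }

    // Stage 1: mark trimmable ledgers and persist the metadata.
    // Stage 2 runs only in the callback of that metadata update.
    CompletableFuture<Void> trim(Set<Long> trimmable) {
        liveLedgers.removeAll(trimmable);
        markedForDeletion.addAll(trimmable);
        return persistMetadata().thenRun(this::deleteMarkedLedgers);
    }

    // Stand-in for an asynchronous metadata-store write; always succeeds here.
    private CompletableFuture<Void> persistMetadata() {
        return CompletableFuture.completedFuture(null);
    }

    // Stage 2: try every marked ledger, including ones left over from earlier trims.
    // A marker is dropped only when its delete succeeds, so failures are retried
    // on the next trim. (A real implementation would also persist the marker removal.)
    private void deleteMarkedLedgers() {
        markedForDeletion.removeIf(id -> storageDelete.test(id));
    }

    public static void main(String[] args) {
        // Ledger 2 fails to delete the first time, then succeeds.
        Set<Long> failOnce = new HashSet<>(List.of(2L));
        TwoPhaseTrimSketch ml = new TwoPhaseTrimSketch(id -> !failOnce.remove(id));
        ml.liveLedgers.addAll(List.of(1L, 2L, 3L));

        ml.trim(Set.of(1L, 2L)).join();
        System.out.println(ml.markedForDeletion); // [2]

        ml.trim(Set.of()).join(); // nothing new to trim, but ledger 2 is retried
        System.out.println(ml.markedForDeletion); // []
    }
}
```

Because the marker outlives a failed delete, the second `trim` call retries ledger 2 even though it trims nothing new, which is the check-and-retry behavior the proposal relies on.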
> > > > > Therefore, the entire trimming process breaks down into the following
> > > > > steps:
> > > > >
> > > > > 1. Mark deletable ledgers and update the ledger metadata.
> > > > > 2. Perform the actual ledger deletion after the metadata is updated.
> > > > >
> > > > > For step 1, we can store the deletion markers in
> > > > > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap`.
> > > > > To retrieve the ledgers pending deletion, we can simply iterate over
> > > > > `propertiesMap`. If this solution is not accepted, we could instead
> > > > > create a new znode to store this information, but that approach would
> > > > > not be able to reuse the current design.
> > > > >
> > > > > For step 2, we delete the marked ledgers asynchronously in the
> > > > > callback of the metadata update, and every trim also checks for and
> > > > > deletes any remaining deletable ledgers.
> > > > >
> > > > > Related PR: https://github.com/apache/pulsar/pull/13575
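The marker storage from step 1 and the per-trim retry pass from step 2 could look roughly like the sketch below, assuming markers are encoded as entries in a `Map<String, String>` similar to `propertiesMap`. The `DeletionMarkers` helper, the `"example.delete.ledger."` key prefix, and the `"true"` marker value are hypothetical illustrations; the actual key format is whatever the related PR defines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical helpers that encode deletion markers as entries in a
// String -> String properties map (in the spirit of ManagedLedgerImpl#propertiesMap).
final class DeletionMarkers {
    // Hypothetical key prefix used only for this illustration.
    static final String PREFIX = "example.delete.ledger.";

    private DeletionMarkers() {}

    // Step 1: record that a ledger is deletable.
    static void mark(Map<String, String> props, long ledgerId) {
        props.put(PREFIX + ledgerId, "true");
    }

    static void unmark(Map<String, String> props, long ledgerId) {
        props.remove(PREFIX + ledgerId);
    }

    // Find pending deletions by iterating over the map; unrelated properties are ignored.
    static List<Long> markedLedgers(Map<String, String> props) {
        List<Long> ids = new ArrayList<>();
        for (String key : props.keySet()) {
            if (key.startsWith(PREFIX)) {
                ids.add(Long.parseLong(key.substring(PREFIX.length())));
            }
        }
        Collections.sort(ids); // map iteration order is unspecified
        return ids;
    }

    // Step 2, run on every trim: try to delete each marked ledger and clear its
    // marker only on success. Returns the number of ledgers deleted in this pass.
    static int deleteMarked(Map<String, String> props, LongPredicate deleteLedger) {
        int deleted = 0;
        for (long id : markedLedgers(props)) { // iterates a copy, so unmark() is safe
            if (deleteLedger.test(id)) {
                unmark(props, id);
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("owner", "app"); // an unrelated property that must survive
        mark(props, 10L);
        mark(props, 9L);
        System.out.println(markedLedgers(props)); // [9, 10]

        // Pretend ledger 10 fails to delete in this pass.
        int n = deleteMarked(props, id -> id != 10L);
        System.out.println(n + " " + markedLedgers(props)); // 1 [10]
    }
}
```

Keeping markers under a dedicated prefix lets them share the existing properties map with other entries, which is the reuse argument made for step 1; a separate znode would avoid the prefix scan but, as noted above, could not reuse the current design.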