Re: [VOTE] PIP-129: Introduce intermediate state for ledger deletion
+1 (non)

On 2022/01/14 03:23:37, mattison chao wrote:
> +1 (non-binding)
>
> Best,
> Mattison
>
> On Fri, 14 Jan 2022 at 11:19, Hang Chen wrote:
> > +1 (binding)
> >
> > Best,
> > Hang
> >
> > Zhanpeng Wu wrote on Fri, Jan 14, 2022 at 10:37:
> > > This is the voting thread for PIP-129. It will stay open for at least 48
> > > hours. Pasted below for quoting convenience.
> > >
> > > https://github.com/apache/pulsar/issues/13526
> > >
> > > ## Motivation
> > >
> > > Under the current ledger-trimming design in
> > > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`,
> > > we first collect the ledgers that need to be deleted and then perform the
> > > asynchronous deletions concurrently, but we never check whether those
> > > deletions actually complete. If the metadata update succeeds but an error
> > > occurs during the asynchronous deletion, the ledger may not be deleted,
> > > even though at the logical level we consider the deletion complete. That
> > > data then remains in the storage layer (e.g., BookKeeper) forever, and the
> > > longer a cluster runs, the more such undeletable residual data accumulates.
> > >
> > > To address this, we can separate the metadata update from the ledger
> > > deletion. During trimming, we first mark which ledgers are deletable and
> > > persist that result to the metadata store. We can then delete the marked
> > > ledgers asynchronously in the callback of the metadata update, so the
> > > original logic is preserved seamlessly. During a rolling upgrade or
> > > rollback, the only difference is whether a deleted ledger was first marked
> > > for deletion.
> > >
> > > To be more specific:
> > > 1. For upgrade, only the ledger marker information is added; the logical
> > > sequence of deletion is unchanged.
> > > 2. For rollback, some ledgers already marked for deletion may remain
> > > undeleted because of the broker restart; this is consistent with the
> > > original behavior.
> > >
> > > In addition, if a marked ledger is not deleted successfully, its marker is
> > > not removed, so every subsequent trimming attempts the deletion again,
> > > which amounts to a check-and-retry mechanism.
> > >
> > > ## Goal
> > >
> > > We need to modify the logic in
> > > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers`
> > > so that the ledger-deletion logic in ledger-trimming is split into two
> > > stages: marking and deleting. Once the marker information has been written
> > > to the metadata store, every trimming will try to trigger the ledger
> > > deletion until all the deletable ledgers are successfully deleted.
> > >
> > > ## Implementation
> > >
> > > This proposal aims to separate the deletion logic in ledger-trimming, so
> > > that `ManagedLedgerImpl#internalTrimLedgers` is responsible for marking
> > > the deletable ledgers and then performing the actual ledger deletion
> > > according to the metadata store.
> > >
> > > Therefore, the entire trimming process breaks down into the following
> > > steps:
> > >
> > > 1. Mark deletable ledgers and update the ledger metadata.
> > > 2. Do the actual ledger deletion after the metadata is updated.
> > >
> > > For step 1, we can store the deletable markers in
> > > `org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap`. To
> > > retrieve the ledgers pending deletion, we can simply iterate over
> > > `propertiesMap`. If this solution is not accepted, we could instead create
> > > a new znode to store this information, but that approach cannot reuse the
> > > current design.
> > >
> > > For step 2, we can delete the marked ledgers asynchronously in the
> > > callback of the metadata update, and every trimming will re-check and
> > > delete any remaining deletable ledgers.
> > >
> > > Related PR: https://github.com/apache/pulsar/pull/13575
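To make the two-phase flow above concrete, here is a minimal, self-contained sketch of the mark-then-delete idea. It is an illustration only, not the actual `ManagedLedgerImpl` code: the class name, the marker-key format, and the stand-in metadata/storage operations are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified model of the proposed two-phase trimming.
public class TwoPhaseTrimSketch {
    // Stand-in for ManagedLedgerImpl#propertiesMap, where the PIP proposes
    // keeping the "deletable" markers.
    private final Map<String, String> propertiesMap = new ConcurrentHashMap<>();
    // Stand-in for the ledgers actually held in BookKeeper.
    private final List<Long> ledgers = new CopyOnWriteArrayList<>(List.of(1L, 2L, 3L));

    /** Phase 1: mark and persist; phase 2 runs only in the metadata callback. */
    public void markAndTrim(List<Long> deletable) {
        deletable.forEach(id -> propertiesMap.put("deletable-" + id, "true"));
        persistMetadata().thenRun(() ->
                // Phase 2: storage is touched only after the marker is durable.
                deletable.forEach(this::asyncDeleteLedger));
    }

    /** Every trim re-checks the markers, which yields the retry semantics. */
    public void retryPendingDeletes() {
        propertiesMap.keySet().stream()
                .filter(k -> k.startsWith("deletable-"))
                .map(k -> Long.parseLong(k.substring("deletable-".length())))
                .forEach(this::asyncDeleteLedger);
    }

    private CompletableFuture<Void> persistMetadata() {
        // Pretend metadata-store write; the real code persists managed-ledger metadata.
        return CompletableFuture.completedFuture(null);
    }

    private void asyncDeleteLedger(long ledgerId) {
        CompletableFuture.runAsync(() -> {
            ledgers.remove(ledgerId);                       // pretend BookKeeper delete
            propertiesMap.remove("deletable-" + ledgerId);  // clear marker only on success
        });
    }
}
```

Because the marker is only cleared after a successful delete, re-running the trim naturally retries any deletion that failed, which is exactly the check-and-retry behavior the PIP describes.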
Re: [VOTE] PIP-129: Introduce intermediate state for ledger deletion
+1 (non-binding)

Haiting Jiang wrote on Fri, Jan 14, 2022 at 16:12:
> +1 (non)
>
> On 2022/01/14 03:23:37, mattison chao wrote:
> > +1 (non-binding)
> >
> > Best,
> > Mattison
> >
> > [snip: quoted PIP-129 text, identical to the first message in this thread]
Re: [VOTE] PIP-129: Introduce intermediate state for ledger deletion
+1 (binding)

On Fri, Jan 14, 2022 at 4:32 PM Aloys Zhang wrote:
> +1 (non-binding)
>
> Haiting Jiang wrote on Fri, Jan 14, 2022 at 16:12:
> > +1 (non)
> >
> > [snip: quoted PIP-129 text, identical to the first message in this thread]
[VOTE] PIP-135: Include MetadataStore backend for Etcd
https://github.com/apache/pulsar/issues/13717

-

## Motivation

Since all the pieces that composed the proposal in PIP-45 were finally merged and are currently ready for the 2.10 release, it is now possible to add other metadata backends that can be used to support a BookKeeper + Pulsar cluster.

One of the systems most commonly used as an alternative to ZooKeeper is Etcd, so it makes sense to have it as the first non-ZooKeeper implementation.

## Goal

Provide an Etcd implementation of the `MetadataStore` API. This will allow users to deploy Pulsar clusters that use an Etcd service for metadata, without requiring the presence of ZooKeeper.

## Implementation

* Use the existing jetcd Java client library for Etcd.
* Extend the `AbstractBatchedMetadataStore` class, in order to reuse the transparent batching logic that will be shared with the ZooKeeper implementation.

Work in progress: https://github.com/apache/pulsar/pull/13225

--
Matteo Merli
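For readers unfamiliar with jetcd, here is a minimal sketch of what the get/put path of such a backend rests on. It is an illustration under stated assumptions, not the proposed implementation: the real backend extends `AbstractBatchedMetadataStore` and must handle versions, watches, sessions, and batching, none of which appear here, and the class name and method shapes below are invented for the example.

```java
import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.KV;

import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Hypothetical skeleton of an Etcd-backed key-value metadata accessor.
public class EtcdMetadataStoreSketch implements AutoCloseable {
    private final Client client;
    private final KV kv;

    public EtcdMetadataStoreSketch(String endpoints) {
        // e.g. "http://localhost:2379" (placeholder endpoint)
        this.client = Client.builder().endpoints(endpoints.split(",")).build();
        this.kv = client.getKVClient();
    }

    /** Async read of a single metadata path, empty if the key is absent. */
    public CompletableFuture<Optional<byte[]>> get(String path) {
        return kv.get(key(path))
                .thenApply(resp -> resp.getKvs().isEmpty()
                        ? Optional.empty()
                        : Optional.of(resp.getKvs().get(0).getValue().getBytes()));
    }

    /** Async write; the real store would also check expected versions here. */
    public CompletableFuture<Void> put(String path, byte[] value) {
        return kv.put(key(path), ByteSequence.from(value)).thenApply(r -> null);
    }

    private static ByteSequence key(String path) {
        return ByteSequence.from(path, StandardCharsets.UTF_8);
    }

    @Override
    public void close() {
        client.close();
    }
}
```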
How to improve the official website Chinese documentation (https://pulsar.apache.org/docs/zh-CN)
When I was browsing the official Chinese documentation (https://pulsar.apache.org/docs/zh-CN/next/concepts-messaging/), I found that some of the content was not translated into Chinese but was still in English. I would like to ask how I can help translate it: what is the method or process? Thanks!

For example:

延时消息功能允许你能够过一段时间才能消费到这条消息，而不是消息发布后，就马上可以消费到。
(Translated: the delayed message feature lets you consume a message only after some time has passed, rather than immediately after it is published.)

In this mechanism, a message is stored in BookKeeper; the DelayedDeliveryTracker maintains the time index (time -> messageId) in memory after the message is published to a broker, and the message is delivered to a consumer once the specified delay has passed.
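As an aside for readers of that excerpt, the delayed-delivery feature it describes is driven from the producer side of the Java client. A minimal sketch follows; the service URL and topic name are placeholders, and delayed delivery takes effect for Shared-subscription consumers:

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class DelayedDeliveryExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")  // placeholder broker address
                .build();
        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("my-topic")                      // placeholder topic
                .create();
        // The broker stores the message in BookKeeper immediately; the
        // DelayedDeliveryTracker's in-memory (time -> messageId) index keeps
        // it invisible to consumers until the delay elapses.
        producer.newMessage()
                .value("hello")
                .deliverAfter(1, TimeUnit.MINUTES)
                .send();
        client.close();
    }
}
```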
Re: [VOTE] PIP-135: Include MetadataStore backend for Etcd
+1 (binding)

Thanks
Enrico

On Fri, Jan 14, 2022 at 23:52, Matteo Merli wrote:
> [snip: quoted PIP-135 text, identical to the message above]
Re: [DISCUSS] PIP-124: Pulsar Client Shared State API
@matteo ping

Enrico

On Wed, Dec 29, 2021 at 08:35, Enrico Olivelli wrote:
> Matteo,
>
> On Wed, Dec 29, 2021 at 02:57, Matteo Merli wrote:
>> > * Add an API to the Java client that makes it easier to maintain a
>> > consistent shared state between instances of an application.
>> > * Provide some ready-to-use recipes, like a simple key-value store.
>> >
>> > It is not a goal to implement a Pulsar-backed database system.
>>
>> While the first use case for Pulsar was indeed to be the
>> messaging/replication platform for a distributed database, and it has
>> been working in production for many years, I'm not convinced we should
>> add this level of API as part of the Pulsar client API.
>>
>> The Pulsar API has been designed to be high-level and easy to use (and
>> reason about), with the use cases of application developers in mind. I
>> don't think that a "storage"-level API fits well with the rest of the
>> abstractions.
>>
>> > public interface PulsarMap<K, V> extends AutoCloseable {
>> >     ...
>> >     CompletableFuture put(K key, V value);
>> > }
>>
>> If all the logic is implemented on the client side, when there are
>> multiple clients sharing the same state, how can any of them mutate
>> that state, since we actually enforce that there is a single exclusive
>> producer? Would a user get an error if there's already a different
>> client writing?
>>
>> My impression is that, while it looks convenient, a shared Map
>> interface is not the best abstraction for either case:
>> * If you're actually building a DB, you will definitely need access
>>   to the log itself rather than a Map interface.
>> * If you want to share some state across multiple processes without
>>   using a DB, there are many tricky API, consistency, and semantic
>>   problems to solve, many of which are just pushed down to the
>>   application, which will need to be aware of and understand them. At
>>   that point, I would seriously recommend using a DB, or, if the
>>   question is "I don't want to use an additional external system",
>>   using the BK TableService component.
>
> This is usually not an option, because the BK TableService does not
> support multi-tenancy well, and the application would also need to
> connect to the bookies directly (think about configuration, security...).
>
>> I think this feature would be best viewed as a recipe, as it doesn't
>> depend on or benefit from any internal broker support. If there is
>> enough interest and there are concrete use cases, it can be included
>> later.
>
> My initial proposal was to push this to pulsar-adapters.
> I changed the proposal before sending the PIP because I think it is very
> useful for protocol handlers and Pulsar IO connectors.
>
> I am totally fine with adding this to pulsar-adapters, but I want to see
> it in the Pulsar repo and released as part of an official Pulsar recipe.
>
> @Matteo, does this sound like a good option to you?
>
> Otherwise we miss the chance to make it easier for Pulsar users to
> leverage this power of Pulsar.
>
> In Pravega you have State Synchronizers, which are a great foundational
> API, and we are missing something like that in Pulsar.
>
> Enrico
>
>> --
>> Matteo Merli
>>
>> On Fri, Dec 24, 2021 at 1:53 AM Enrico Olivelli wrote:
>> >
>> > Hello everyone,
>> > I want to start a discussion about PIP-124 Pulsar Client Shared State API.
>> >
>> > This is the PIP document:
>> > https://github.com/apache/pulsar/issues/13490
>> >
>> > This is a demo implementation (a proof of concept):
>> > https://github.com/eolivelli/pulsar-shared-state-manager
>> >
>> > Please take a look and share your thoughts.
>> >
>> > I believe that this will unlock the potential of the Exclusive
>> > Producer, and it will also make life easier for many developers who
>> > are using Pulsar and need an API to share configuration, metadata,
>> > or any simple key-value data structure without adding a database or
>> > other components to their library, Pulsar IO connector, or Pulsar
>> > protocol handler.
>> >
>> > Thanks
>> > Enrico
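To give a concrete picture of the recipe being debated, here is a hypothetical sketch of a shared key-value state assembled from two existing client primitives: the Exclusive Producer (PIP-68) for writes and the `TableView` reader (available in the client from 2.10) for reads. `PulsarMap` itself is only a proposal, so everything below beyond the standard Pulsar client API (the class name, constructor, and method shapes) is illustrative.

```java
import java.util.concurrent.CompletableFuture;

import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.ProducerAccessMode;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.TableView;

// Hypothetical recipe: a single exclusive writer, any number of TableView readers.
public class SharedStateSketch implements AutoCloseable {
    private final Producer<String> writer;
    private final TableView<String> view;

    public SharedStateSketch(PulsarClient client, String topic) throws Exception {
        // Exclusive access mode fails fast if another writer already exists,
        // which is the fencing concern raised in the thread.
        this.writer = client.newProducer(Schema.STRING)
                .topic(topic)
                .accessMode(ProducerAccessMode.Exclusive)
                .create();
        // TableView materializes the latest value per key from the topic.
        this.view = client.newTableViewBuilder(Schema.STRING)
                .topic(topic)
                .create();
    }

    public CompletableFuture<Void> put(String key, String value) {
        return writer.newMessage().key(key).value(value)
                .sendAsync()
                .thenApply(msgId -> null);
    }

    public String get(String key) {
        return view.get(key); // latest value per key, compacted-topic semantics
    }

    @Override
    public void close() throws Exception {
        writer.close();
        view.close();
    }
}
```

With this shape, a second process that tries to construct the writer fails to create its producer rather than silently diverging the state, which is one possible answer to the exclusivity question Matteo raises above.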