Hi Penghui, After test, we can use #12136 to stop the replicator.
Regards Jiwei Guo (Tboy) On Sat, Mar 5, 2022 at 5:31 PM PengHui Li <peng...@apache.org> wrote: > > Great point. I was focused on deleting namespaces and missed the case > where the user wants to delete a topic from a replicated namespace. I > agree that we should make it possible to delete these topics without > removing the namespace replication. > > Oh, sorry. I thought again, after > https://github.com/apache/pulsar/pull/12136 > introduced topic level geo-replication configuration, users can disable for > a topic even if the namespace enabled geo-replication. > > And topic level policy is introduced in 2.6.0. I think It's time to make it > one of > the features turned on by default. > > So, we can follow the steps that stop the replication first and then delete > the > topic from clusters. > > > Note that it is already possible > to delete these topics by deleting with force or by configuring > inactive topic deletion. > > Hmm, I think both of them do not work for now, because the producer of > the replicator will always be active. > > > Considering users can delete replicated topics by deleting with force, > are you saying we need to provide a non-force way to delete these > topics? > > Yes, if the topic doesn't have users' producers/consumers, users don't need > to force delete the topic, it's a normal deletion. > > > ---------------------------------------------------------------------------------- > @mattison @jiwei could you please check if we can use > https://github.com/apache/pulsar/pull/12136 > to stop the geo-replication first, and then delete the topic? > If it works, we don't need to change the protocol. > > Thanks, > Penghui > > > > On Thu, Mar 3, 2022 at 7:12 AM Michael Marshall <mmarsh...@apache.org> > wrote: > > > > The geo-replication's configuration can be centralized by using > > > one configuration store. But that doesn't change anything, we should > > > provide the same behavior for both centralized and decentralized > > > configuration store. > > > > Perhaps I misused the word decentralized. I meant that the challenge > > comes because geo-replication configurations for clusters are > > independent from each other. > > > > > The current challenge is > > > users usually set replicated clusters for a namespace, if remove > > > the replication configuration, the entire namespace will be affected. > > > We have supported setting the replicated configuration for a topic [1], > > > only for 2.10.0 or later. > > > > Great point. I was focused on deleting namespaces and missed the case > > where the user wants to delete a topic from a replicated namespace. I > > agree that we should make it possible to delete these topics without > > removing the namespace replication. Note that it is already possible > > to delete these topics by deleting with force or by configuring > > inactive topic deletion. > > > > > we should provided steps for how to delete an unused replicated > > > topic and what are the effects of not removing properly > > > > I agree, and I think documentation coupled with helpful error logs > > will be very helpful here. > > > > Considering users can delete replicated topics by deleting with force, > > are you saying we need to provide a non-force way to delete these > > topics? > > > > If we want to provide a path to normal deletion, we already have an > > `isRemote` flag on producers, and the inactive topic deletion code > > uses this flag to determine if there are non-replication producers > > connected to a topic. We could modify the deletion logic for a global > > topic so that it can be deleted as long as the only producers > > connected are remote producers. My main concern is that normal > > deletion could allow users to miss the nuance that they must also > > delete the topic in the other cluster. This might be an overly > > cautious concern, though. > > > > Thanks, > > Michael > > > > > > > > On Tue, Mar 1, 2022 at 1:55 AM PengHui Li <peng...@apache.org> wrote: > > > > > > > I agree with you that it'd be nice to provide the same deletion > > > behavior. However, because geo-replication's configuration is > > > decentralized, I think namespace or topic deletion is more complicated > > > than unreplicated deletion. Note that users cannot currently delete > > > namespaces that are configured with remote replication clusters. > > > > > > The geo-replication's configuration can be centralized by using > > > one configuration store. But that doesn't change anything, we should > > > provide the same behavior for both centralized and decentralized > > > configuration store. > > > > > > > Note that while a user cannot explicitly delete a replicated topic, > > > they can remove the replication configuration in cluster A and cluster > > > B, and then they are left with unreplicated namespaces and topics, > > > which can be deleted. > > > > > > Yes, stopping the replication first is a more elegant approach, > > > and it is also very easy to understand for users. The current challenge > > is > > > users usually set replicated clusters for a namespace, if remove > > > the replication configuration, the entire namespace will be affected. > > > We have supported setting the replicated configuration for a topic [1], > > > only for 2.10.0 or later. > > > > > > > In this case, the specific problem stems from a remote cluster > > > targeting the local cluster. If the user only deletes the local topic, > > > without removing the replication configuration, I don't think it > > > should be surprising when the topic gets recreated (assuming auto > > > topic creation is enabled). > > > > > > From the user's perspective, they don't have any producers/consumers, > > > If the topic only has the producer of the replicator connect to the > > remote > > > topic > > > each other, in this case, I think we should allow users to delete the > > > topic. > > > Users will be surprised in this case, because they are not able to > delete > > > an unused topic, they are not able to change the replication > > configuration > > > for > > > a namespace. > > > > > > > In my view, I think it's okay to make it harder to delete > > > geo-replicated namespaces or topics than local namespaces or topics. > > > Failing requests with a good, informative error message protects > > > users. Additionally, if users are able to delete a namespace or topic > > > by force, that gives them a work around--although we should document > > > the limitations of this workaround. > > > > > > I agree with it's okay to make it harder to delete a replicated topic, > so > > > for > > > a replicated topic, users should delete the topic in each cluster, we > > should > > > provided steps for how to delete an unused replicated topic and what > are > > > the effects of not removing properly, for example, if a topic has 2 > > > clusters, cluster-a > > > and cluster-b, but only delete the topic from cluster-a, the topic will > > be > > > created > > > again if cluster-b write some new data into this topic, depends on the > > topic > > > auto-creation. > > > > > > If we only support set replicated cluster for a topic, not a namespace, > > > I like the approach that stops the replication first and then deletes > the > > > topic. > > > > > > [1] https://github.com/apache/pulsar/pull/12136 > > > > > > Thanks, > > > Penghui > > > > > > On Tue, Mar 1, 2022 at 2:27 PM Michael Marshall <mmarsh...@apache.org> > > > wrote: > > > > > > > > Either replicated topic or non-replicated topic, we should provide > > the > > > > > same topic deletion behavior, the topic can be deleted if no active > > > > > user's producers/consumers, should not be affected by the Pulsar > > > > > internal producers/consumers. > > > > > > > > I agree with you that it'd be nice to provide the same deletion > > > > behavior. However, because geo-replication's configuration is > > > > decentralized, I think namespace or topic deletion is more > complicated > > > > than unreplicated deletion. Note that users cannot currently delete > > > > namespaces that are configured with remote replication clusters. > > > > > > > > One challenge I see is the configuration in the remote cluster. If > > > > cluster A is replicated to cluster B for namespace C, deleting a > topic > > > > in cluster B does not affect the configuration in cluster A. If we > > > > make it easy to delete the topic in cluster B, we need to decide how > > > > to handle failed replication in cluster A. In the same way, deleting > > > > namespace C in cluster B doesn't delete it from cluster A. For > > > > example, when should cluster A stop attempting to create producers to > > > > cluster B for the deleted namespace/topic and how do we inform > > > > operators that the replication is no longer working as configured. We > > > > avoid several problems by failing the initial user request. > > > > > > > > Note that while a user cannot explicitly delete a replicated topic, > > > > they can remove the replication configuration in cluster A and > cluster > > > > B, and then they are left with unreplicated namespaces and topics, > > > > which can be deleted. > > > > > > > > > But for now, for a replicated topic, users are not able to delete > it > > > > because > > > > > of the internal active producer, adding force delete option can > > delete > > > > > the topic, but it will create again if users enabled topic > > auto-creation. > > > > > > > > In this case, the specific problem stems from a remote cluster > > > > targeting the local cluster. If the user only deletes the local > topic, > > > > without removing the replication configuration, I don't think it > > > > should be surprising when the topic gets recreated (assuming auto > > > > topic creation is enabled). > > > > > > > > In my view, I think it's okay to make it harder to delete > > > > geo-replicated namespaces or topics than local namespaces or topics. > > > > Failing requests with a good, informative error message protects > > > > users. Additionally, if users are able to delete a namespace or topic > > > > by force, that gives them a work around--although we should document > > > > the limitations of this workaround. > > > > > > > > I am not experienced with geo-replication, so please let me know if > > > > any of my analysis doesn't align with the actual design. > > > > > > > > Thanks, > > > > Michael > > > > > > > > > > > > On Sun, Feb 27, 2022 at 8:42 PM guo jiwei <techno...@apache.org> > > wrote: > > > > > > > > > > I have added the geo-replicator topic case and updated the PIP. > > > > > > > > > > > > > > > Regards > > > > > Jiwei Guo (Tboy) > > > > > > > > > > > > > > > On Sun, Feb 27, 2022 at 1:00 PM PengHui Li <peng...@apache.org> > > wrote: > > > > > > > > > > > > To me, the main question is whether we create a custom error or > > > > expand > > > > > > the `CommandCloseProducer` command. I lean towards adding an > error > > > > > > because it will automatically be backward compatible based on the > > way > > > > > > the client determines which errors are retriable. Although, I > don't > > > > > > have a strong opinion. > > > > > > > > > > > > I have the same opinion as you. But it looks like not able to fix > > the > > > > > > geo-replication topic deletion. > > > > > > > > > > > > Either replicated topic or non-replicated topic, we should > provide > > the > > > > > > same topic deletion behavior, the topic can be deleted if no > active > > > > > > user's producers/consumers, should not be affected by the Pulsar > > > > > > internal producers/consumers. > > > > > > > > > > > > But for now, for a replicated topic, users are not able to delete > > it > > > > > > because > > > > > > of the internal active producer, adding force delete option can > > delete > > > > > > the topic, but it will create again if users enabled topic > > > > auto-creation. > > > > > > > > > > > > I think in this case, it seems we don't have a chance to give the > > > > producer > > > > > > an exception, because the reconnection will connect to a new > topic > > only > > > > > > with > > > > > > the same topic name as before. > > > > > > > > > > > > Thanks, > > > > > > Penghui > > > > > > > > > > > > On Sat, Feb 26, 2022 at 2:24 PM Michael Marshall < > > mmarsh...@apache.org > > > > > > > > > > > wrote: > > > > > > > > > > > > > > We need to determine the overall plan, not the implementation > > > > > > > > at this moment. > > > > > > > > > > > > > > Great point, I agree. > > > > > > > > > > > > > > > Looks like we need an option in the > > > > > > > > `CommandCloseProducer` to avoid the replication producer > > > > > > > > reconnecting again. > > > > > > > > > > > > > > This is an important point. One of the primary requirements for > > > > > > > geo-replication must also be that a replicated namespace or > > topic can > > > > > > > be deleted and recreated without any problems or leaks in the > > broker. > > > > > > > That requires that the producer is cleaned up properly when the > > > > > > > namespace is deleted. I think we could achieve that with a new > > server > > > > > > > error or an expansion of the `CommandCloseProducer` command. > > > > > > > > > > > > > > To me, the main question is whether we create a custom error or > > > > expand > > > > > > > the `CommandCloseProducer` command. I lean towards adding an > > error > > > > > > > because it will automatically be backward compatible based on > > the way > > > > > > > the client determines which errors are retriable. Although, I > > don't > > > > > > > have a strong opinion. > > > > > > > > > > > > > > The key code paths that we need to consider are topic lookup > and > > > > topic > > > > > > > creation. Both the http admin endpoint and the binary protocol > > > > > > > endpoint interact with these code paths, and we should fail > > requests > > > > > > > for namespaces which have the "deleted" field set to true. > > > > > > > > > > > > > > In looking at the code, we could modify the logic in > > > > > > > `PulsarWebResource#checkLocalOrGetPeerReplicationCluster`. That > > > > method > > > > > > > is always called for topic lookups via HTTP or via the Pulsar > > > > > > > binary protocol. If a namespace doesn't exist, that method > > currently > > > > > > > returns a NOT_FOUND exception. In the binary protocol case, > that > > > > > > > exception is converted to a `ServerError.MetadataError`. We > could > > > > > > > modify the method like I do here [0], or we could add another > > case > > > > and > > > > > > > return a custom error message. > > > > > > > > > > > > > > We'd also need to prevent topic creation via the admin endpoint > > and > > > > > > > the auto topic creation via the binary protocol. I believe this > > will > > > > > > > be just as easy as the lookup case, but I haven't confirmed. > > > > > > > > > > > > > > Thanks, > > > > > > > Michael > > > > > > > > > > > > > > [0] > > > > > > > > > > > > > > > > > > > > https://github.com/michaeljmarshall/pulsar/commit/93611a741af163587f79b88bc1c9f1acc7953512 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 25, 2022 at 3:45 AM PengHui Li <peng...@apache.org > > > > > > wrote: > > > > > > > > > > > > > > > > The replicated topic deletion also has a similar problem. > > > > > > > > If the topic auto-creation is enabled, we are not able to > > delete > > > > > > > > a replicated topic, because the producer used for replication > > > > > > > > will try to reconnect. Looks like we need an option in the > > > > > > > > `CommandCloseProducer` to avoid the replication producer > > > > > > > > reconnecting again. > > > > > > > > > > > > > > > > Penghui > > > > > > > > > > > > > > > > On Fri, Feb 25, 2022 at 5:26 PM PengHui Li < > peng...@apache.org > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Penghui, are you suggesting that we implement the > > > > namespace/tenant > > > > > > > > > terminated logic after completing this PIP? > > > > > > > > > > > > > > > > > > I'm ok with both being implemented together or separated. > > > > > > > > > We need to determine the overall plan, not the > implementation > > > > > > > > > at this moment. > > > > > > > > > > > > > > > > > > > For the sake of discussion, if we implement the namespace > > > > > > terminated > > > > > > > > > logic first, we could fulfill the underlying requirements > for > > > > this > > > > > > PIP > > > > > > > > > by returning a new non-retriable error response when a > client > > > > tries > > > > > > to > > > > > > > > > connect a producer or a consumer to a topic in a namespace > > that > > > > is > > > > > > > > > "terminated". If we do add the "namespace terminated" > > feature, > > > > we'll > > > > > > > > > need to add a non-retriable exception for this case, > anyway. > > The > > > > main > > > > > > > > > advantage here is that we'd only need one expansion of the > > > > protobuf > > > > > > > > > instead of two. The downside is that the protocol for > > connected > > > > > > > > > clients has a couple more roundtrips. The broker would > > disconnect > > > > > > > > > connected clients and then fail their reconnection attempt > > with a > > > > > > > > > non-retriable error. > > > > > > > > > > > > > > > > > > Regarding the namespace "terminated" concept, I just > noticed > > > > that we > > > > > > > > > already have a "deleted" field in a namespace's policies > [0]. > > > > There > > > > > > is > > > > > > > > > even a comment that says: > > > > > > > > > > > > > > > > > > Yes, if we can terminate the namespace first, we should > > return > > > > > > > “namespace > > > > > > > > > not found” > > > > > > > > > if the producers and consumers try to reconnect, the client > > > > should > > > > > > stop > > > > > > > > > reconnecting > > > > > > > > > after getting this error. Looks like following this way, we > > don't > > > > > > need > > > > > > > to > > > > > > > > > introduce any protocol changes anymore. > > > > > > > > > > > > > > > > > > > As I think about it more, I no longer think "terminated" > > is the > > > > > > right > > > > > > > > > term for what I proposed above. Our goal is to briefly > > prevent > > > > any > > > > > > > > > topic creation to ensure we can delete all sub resources > for > > a > > > > > > > > > namespace. On the other hand, a terminated topic isn't > > > > necessarily > > > > > > > > > short lived. If we want to apply the "terminated" term > > > > unequivocally > > > > > > > > > to both topics and namespaces, I think a terminated > namespace > > > > would > > > > > > > > > need to be a namespace where all topics are in terminated > > state > > > > and > > > > > > no > > > > > > > > > additional topics could be created. That's not the feature > > we're > > > > > > > > > discussing here, though. Deleted seems like the right term > > to me, > > > > > > > > > especially since we're already using it to prevent a race > > > > condition > > > > > > > > > > > > > > > > > > Yes, topic termination has different meanings, deletion > > should > > > > be the > > > > > > > right > > > > > > > > > term here. > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Penghui > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 25, 2022 at 12:19 PM Michael Marshall < > > > > > > > mmarsh...@apache.org> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > >> Regarding the namespace "terminated" concept, I just > noticed > > > > that we > > > > > > > > >> already have a "deleted" field in a namespace's policies > > [0]. > > > > There > > > > > > is > > > > > > > > >> even a comment that says: > > > > > > > > >> > > > > > > > > >> > // set the policies to deleted so that somebody else > > cannot > > > > > > acquire > > > > > > > > >> this namespace > > > > > > > > >> > > > > > > > > >> I am not familiar with this feature, but it seems like > this > > > > policy > > > > > > > > >> field could be checked before creating a topic in a > > namespace. > > > > That > > > > > > > > >> would remove certain races described above. > > > > > > > > >> > > > > > > > > >> As I think about it more, I no longer think "terminated" > is > > the > > > > > > right > > > > > > > > >> term for what I proposed above. Our goal is to briefly > > prevent > > > > any > > > > > > > > >> topic creation to ensure we can delete all sub resources > > for a > > > > > > > > >> namespace. On the other hand, a terminated topic isn't > > > > necessarily > > > > > > > > >> short lived. If we want to apply the "terminated" term > > > > unequivocally > > > > > > > > >> to both topics and namespaces, I think a terminated > > namespace > > > > would > > > > > > > > >> need to be a namespace where all topics are in terminated > > state > > > > and > > > > > > no > > > > > > > > >> additional topics could be created. That's not the feature > > we're > > > > > > > > >> discussing here, though. Deleted seems like the right term > > to > > > > me, > > > > > > > > >> especially since we're already using it to prevent a race > > > > condition > > > > > > > > >> [0]. > > > > > > > > >> > > > > > > > > >> Thanks, > > > > > > > > >> Michael > > > > > > > > >> > > > > > > > > >> [0] > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/NamespacesBase.java#L252-L262 > > > > > > > > >> > > > > > > > > >> On Thu, Feb 24, 2022 at 9:58 PM Michael Marshall < > > > > > > > mmarsh...@apache.org> > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > > >> > Hi Dave, > > > > > > > > >> > > > > > > > > > >> > > automatically delete tenants and namespaces for not > > > > containing > > > > > > > topics > > > > > > > > >> > > > > > > > > > >> > I don't think that is what we are discussing. I agree > > that the > > > > > > > initial > > > > > > > > >> > email says just that, though, which is why I asked > above: > > > > > > > > >> > > > > > > > > > >> > >> When there are no user-created topics under a > > namespace, > > > > > > > > >> > >> Namespace should be deleted. > > > > > > > > >> > > This is supposed to mean that the namespace should be > > able > > > > to be > > > > > > > > >> > > deleted, correct? > > > > > > > > >> > > > > > > > > > >> > Perhaps Mattison can clarify, too. > > > > > > > > >> > > > > > > > > > >> > My current understanding of the context for the PIP is > > that a > > > > user > > > > > > > > >> > call to delete a namespace without force can fail when a > > > > producer > > > > > > > > >> > reconnects to a deleted topic. The goal is to remove the > > race > > > > > > > > >> > condition to ensure namespace deletion can succeed. > > > > > > > > >> > > > > > > > > > >> > Thanks, > > > > > > > > >> > Michael > > > > > > > > >> > > > > > > > > > >> > On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher < > > w...@apache.org> > > > > > > > wrote: > > > > > > > > >> > > > > > > > > > > >> > > Hi - > > > > > > > > >> > > > > > > > > > > >> > > I hope I’m understanding what’s being discussed. > > > > > > > > >> > > > > > > > > > > >> > > If we are going to automatically delete tenants and > > > > namespaces > > > > > > for > > > > > > > > >> not containing topics then we need to make both of these > > > > automatic > > > > > > > actions > > > > > > > > >> configurable with a default to NOT do so. Otherwise we > break > > > > > > existing > > > > > > > use > > > > > > > > >> cases. > > > > > > > > >> > > > > > > > > > > >> > > Automatic deletion of namespaces should be > configurable > > at > > > > both > > > > > > > the > > > > > > > > >> cluster and tenant level. > > > > > > > > >> > > > > > > > > > > >> > > Regards, > > > > > > > > >> > > Dave > > > > > > > > >> > > > > > > > > > > >> > > > On Feb 24, 2022, at 2:25 PM, Michael Marshall < > > > > > > > mmarsh...@apache.org> > > > > > > > > >> wrote: > > > > > > > > >> > > > > > > > > > > > >> > > >> The old producer/consumer should be closed after > > > > applying the > > > > > > > > >> changes from > > > > > > > > >> > > >> this proposal. > > > > > > > > >> > > > > > > > > > > > >> > > > Penghui, are you suggesting that we implement the > > > > > > > namespace/tenant > > > > > > > > >> > > > terminated logic after completing this PIP? > > > > > > > > >> > > > > > > > > > > > >> > > > For the sake of discussion, if we implement the > > namespace > > > > > > > terminated > > > > > > > > >> > > > logic first, we could fulfill the underlying > > requirements > > > > for > > > > > > > this > > > > > > > > >> PIP > > > > > > > > >> > > > by returning a new non-retriable error response > when a > > > > client > > > > > > > tries > > > > > > > > >> to > > > > > > > > >> > > > connect a producer or a consumer to a topic in a > > namespace > > > > > > that > > > > > > > is > > > > > > > > >> > > > "terminated". If we do add the "namespace > terminated" > > > > feature, > > > > > > > we'll > > > > > > > > >> > > > need to add a non-retriable exception for this case, > > > > anyway. > > > > > > The > > > > > > > > >> main > > > > > > > > >> > > > advantage here is that we'd only need one expansion > > of the > > > > > > > protobuf > > > > > > > > >> > > > instead of two. The downside is that the protocol > for > > > > > > connected > > > > > > > > >> > > > clients has a couple more roundtrips. The broker > would > > > > > > > disconnect > > > > > > > > >> > > > connected clients and then fail their reconnection > > attempt > > > > > > with > > > > > > > a > > > > > > > > >> > > > non-retriable error. > > > > > > > > >> > > > > > > > > > > > >> > > > Thanks, > > > > > > > > >> > > > Michael > > > > > > > > >> > > > > > > > > > > > >> > > > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li < > > > > > > peng...@apache.org> > > > > > > > > >> wrote: > > > > > > > > >> > > >> > > > > > > > > >> > > >>> If we want to solve this problem, we need to add > > some > > > > sync > > > > > > > > >> resources like > > > > > > > > >> > > >> lock/state, I think it’s a harm for us, we don’t > > need to > > > > do > > > > > > > that. > > > > > > > > >> > > >> > > > > > > > > >> > > >> I think we can make the namespace/tenants to the > > inactive > > > > > > state > > > > > > > > >> first so > > > > > > > > >> > > >> that we can avoid any new > > > > > > > > >> > > >> producer/consumer connect to the topic under the > > > > > > > namespace/tenant. > > > > > > > > >> > > >> > > > > > > > > >> > > >> The old producer/consumer should be closed after > > > > applying the > > > > > > > > >> changes from > > > > > > > > >> > > >> this proposal. > > > > > > > > >> > > >> > > > > > > > > >> > > >> Thanks, > > > > > > > > >> > > >> Penghui > > > > > > > > >> > > >> > > > > > > > > >> > > >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao < > > > > > > > > >> mattisonc...@gmail.com> wrote: > > > > > > > > >> > > >> > > > > > > > > >> > > >>>> This is supposed to mean that the namespace > should > > be > > > > able > > > > > > > to be > > > > > > > > >> > > >>>> deleted, correct? > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> Yes, the main background is the user doesn’t have > an > > > > active > > > > > > > > >> topic. so, > > > > > > > > >> > > >>> they want to delete the namespace. > > > > > > > > >> > > >>> > > > > > > > > >> > > >>>> However, I think > > > > > > > > >> > > >>>> we might still have a race condition that could > > make > > > > tenant > > > > > > > or > > > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new > > > > producer or > > > > > > > > >> consumer > > > > > > > > >> > > >>>> creates a topic after the namespace deletion has > > > > started > > > > > > but > > > > > > > > >> > > >>>> before it is complete. Do you agree that the > > underlying > > > > > > race > > > > > > > > >> still > > > > > > > > >> > > >>> exists? > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> Yes, this condition exists. I think it’s not a big > > > > problem > > > > > > > > >> because the > > > > > > > > >> > > >>> user doesn’t want to use this namespace anymore. > > > > > > > > >> > > >>> If this scenario appears, they will get an error > and > > > > need to > > > > > > > > >> delete it > > > > > > > > >> > > >>> again. > > > > > > > > >> > > >>> > > > > > > > > >> > > >>>> What if we expand our usage of the "terminated" > > > > feature to > > > > > > > apply > > > > > > > > >> to > > > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated > > namespace > > > > can > > > > > > > have > > > > > > > > >> > > >>>> bundles and topics can be deleted but not created > > > > (just as > > > > > > a > > > > > > > > >> terminated > > > > > > > > >> > > >>>> topic cannot have any new messages published to > > it). > > > > This > > > > > > > would > > > > > > > > >> take > > > > > > > > >> > > >>>> care of all topic creation race conditions. We'd > > > > probably > > > > > > > need > > > > > > > > >> to add > > > > > > > > >> > > >>>> new protobuf exceptions for this feature. > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> If we want to solve this problem, we need to add > > some > > > > sync > > > > > > > > >> resources like > > > > > > > > >> > > >>> lock/state, I think it’s a harm for us, we don’t > > need > > > > to do > > > > > > > that. > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> Thanks for your suggestions, let me know what you > > think. > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> Best, > > > > > > > > >> > > >>> Mattison > > > > > > > > >> > > >>> > > > > > > > > >> > > >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall < > > > > > > > > >> mmarsh...@apache.org> > > > > > > > > >> > > >>> wrote: > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> This proposal identifies an important issue that > we > > > > should > > > > > > > > >> definitely > > > > > > > > >> > > >>>> solve. I have some questions. > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>>> When there are no user-created topics under a > > > > namespace, > > > > > > > > >> > > >>>>> Namespace should be deleted. > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> This is supposed to mean that the namespace > should > > be > > > > able > > > > > > > to be > > > > > > > > >> > > >>>> deleted, correct? > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>>> For this reason, we need to close the system > topic > > > > > > > > >> reader/producer > > > > > > > > >> > > >>>>> first, then remove the system topic. finally, > > remove > > > > the > > > > > > > > >> namespace. > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> I agree that expanding the protobuf CloseProducer > > and > > > > > > > > >> CloseConsumer > > > > > > > > >> > > >>>> commands could be valuable here. The expansion > > would > > > > ensure > > > > > > > that > > > > > > > > >> > > >>>> producers and consumers don't attempt to > reconnect. > > > > > > However, > > > > > > > I > > > > > > > > >> think > > > > > > > > >> > > >>>> we might still have a race condition that could > > make > > > > tenant > > > > > > > or > > > > > > > > >> > > >>>> namespace deletion fail. Specifically, if a new > > > > producer or > > > > > > > > >> consumer > > > > > > > > >> > > >>>> creates a topic after the namespace deletion has > > > > started > > > > > > but > > > > > > > > >> > > >>>> before it is complete. Do you agree that the > > underlying > > > > > > race > > > > > > > > >> still > > > > > > > > >> > > >>> exists? > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> In my view, the fundamental problem here is that > > > > deleting > > > > > > > > >> certain Pulsar > > > > > > > > >> > > >>>> resources takes time and, in a distributed > system, > > that > > > > > > means > > > > > > > > >> race > > > > > > > > >> > > >>>> conditions. > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> What if we expand our usage of the "terminated" > > > > feature to > > > > > > > apply > > > > > > > > >> to > > > > > > > > >> > > >>>> namespaces (and tenants)? Then, a terminated > > namespace > > > > can > > > > > > > have > > > > > > > > >> > > >>>> bundles and topics can be deleted but not created > > > > (just as > > > > > > a > > > > > > > > >> terminated > > > > > > > > >> > > >>>> topic cannot have any new messages published to > > it). > > > > This > > > > > > > would > > > > > > > > >> take > > > > > > > > >> > > >>>> care of all topic creation race conditions. We'd > > > > probably > > > > > > > need > > > > > > > > >> to add > > > > > > > > >> > > >>>> new protobuf exceptions for this feature. > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> Thanks, > > > > > > > > >> > > >>>> Michael > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang > > > > > > > > >> > > >>>> <zky...@streamnative.io.invalid> wrote: > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> +1 > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> Thanks, > > > > > > > > >> > > >>>>> Zike > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei < > > > > > > > > >> techno...@apache.org> > > > > > > > > >> > > >>> wrote: > > > > > > > > >> > > >>>>>> > > > > > > > > >> > > >>>>>> Hi > > > > > > > > >> > > >>>>>> The PIP link : > > > > > > > https://github.com/apache/pulsar/issues/13989 > > > > > > > > >> > > >>>>>> > > > > > > > > >> > > >>>>>> Regards > > > > > > > > >> > > >>>>>> Jiwei Guo (Tboy) > > > > > > > > >> > > >>>>>> > > > > > > > > >> > > >>>>>> > > > > > > > > >> > > >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao > < > > > > > > > > >> mattisonc...@gmail.com > > > > > > > > >> > > >>>> > > > > > > > > >> > > >>>>>> wrote: > > > > > > > > >> > > >>>>>> > > > > > > > > >> > > >>>>>>> Hello everyone, > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> I want to start a discussion about PIP-139 : > > Support > > > > > > > Broker > > > > > > > > >> send > > > > > > > > >> > > >>> command > > > > > > > > >> > > >>>>>>> to real close producer/consumer. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> This is the PIP document > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13989 > < > > > > > > > > >> > > >>>>>>> https://github.com/apache/pulsar/issues/13979 > > > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> Please check it out and feel free to share > your > > > > > > thoughts. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> Best, > > > > > > > > >> > > >>>>>>> Mattison > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ———————— Pasted below for quoting convenience. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> Relation pull request: #13337 > > > > > > > > >> > > >>>>>>> Authors: @Technoboy- @mattisonchao > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ## Motivation > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> Before we discuss this pip, I'd like to > > supplement > > > > some > > > > > > > > >> context to > > > > > > > > >> > > >>> help > > > > > > > > >> > > >>>>>>> contributors who don't want to read the > original > > > > pull > > > > > > > request. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>>> When there are no user-created topics under a > > > > > > namespace, > > > > > > > > >> Namespace > > > > > > > > >> > > >>>>>>> should be deleted. But currently, the system > > topic > > > > > > existed > > > > > > > > >> and the > > > > > > > > >> > > >>>>>>> reader/producer could auto-create the system > > which > > > > may > > > > > > > cause > > > > > > > > >> the > > > > > > > > >> > > >>> namespace > > > > > > > > >> > > >>>>>>> deletion to fail. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> For this reason, we need to close the system > > topic > > > > > > > > >> reader/producer > > > > > > > > >> > > >>> first, > > > > > > > > >> > > >>>>>>> then remove the system topic. finally, remove > > the > > > > > > > namespace. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> Following this way, we first want to use > > > > ``terminate`` > > > > > > to > > > > > > > > >> solve this > > > > > > > > >> > > >>>>>>> problem. then we found producers can > > disconnect, but > > > > > > > > >> consumers are > > > > > > > > >> > > >>> still > > > > > > > > >> > > >>>>>>> alive. So, another PR #13960 wants to add > > consumers' > > > > > > > closing > > > > > > > > >> logic. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> After #13960, all things look good, but > another > > > > problem > > > > > > > > >> appears. that > > > > > > > > >> > > >>> is > > > > > > > > >> > > >>>>>>> we need to wait until consumers completely > > consume > > > > all > > > > > > > > >> messages (this > > > > > > > > >> > > >>> may > > > > > > > > >> > > >>>>>>> make terminate topic so long and the operation > > > > > > > timeout)then > > > > > > > > >> get > > > > > > > > >> > > >>>>>>> ``reachedEndOfTopic``. the relative code here > : > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106 > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> In the #13337 case, we need to force close > > consumers > > > > > > > > >> immediately. So > > > > > > > > >> > > >>> we > > > > > > > > >> > > >>>>>>> write this PIP to discuss another way to > resolve > > > > this > > > > > > > problem. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ## Goal > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> We can add a new field(`allow_reconnect`) in > > command > > > > > > > > >> > > >>>>>>> ``CommandCloseProducer``/ > > ``CommandCloseConsumer`` > > > > to > > > > > > > close > > > > > > > > >> > > >>>>>>> producer/consumers immediately. > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ## API Changes > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to > > > > ``CommandCloseProducer``; > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ```java > > > > > > > > >> > > >>>>>>> **Before** > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> message CommandCloseProducer { > > > > > > > > >> > > >>>>>>> required uint64 producer_id = 1; > > > > > > > > >> > > >>>>>>> required uint64 request_id = 2; > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> **After** > > > > > > > > >> > > >>>>>>> message CommandCloseProducer { > > > > > > > > >> > > >>>>>>> required uint64 producer_id = 1; > > > > > > > > >> > > >>>>>>> required uint64 request_id = 2; > > > > > > > > >> > > >>>>>>> optional bool allow_reconnect = 3 [default = > > > > true]; > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> ``` > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> - Add ``allow_reconnect`` to > > > > ``CommandCloseConsumer`` > > > > > > > > >> > > >>>>>>> ```java > > > > > > > > >> > > >>>>>>> **Before** > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> message CommandCloseConsumer { > > > > > > > > >> > > >>>>>>> required uint64 consumer_id = 1; > > > > > > > > >> > > >>>>>>> required uint64 request_id = 2; > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> **After** > > > > > > > > >> > > >>>>>>> message CommandCloseConsumer { > > > > > > > > >> > > >>>>>>> required uint64 consumer_id = 1; > > > > > > > > >> > > >>>>>>> required uint64 request_id = 2; > > > > > > > > >> > > >>>>>>> optional bool allow_reconnect = 3 [default = > > > > true]; > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> ``` > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ## Implementation > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ### ClientCnx - Producer: > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> **Before** > > > > > > > > >> > > >>>>>>> ```java > > > > > > > > >> > > >>>>>>> @Override > > > > > > > > >> > > >>>>>>> protected void > > > > > > handleCloseProducer(CommandCloseProducer > > > > > > > > >> > > >>> closeProducer) > > > > > > > > >> > > >>>>>>> { > > > > > > > > >> > > >>>>>>> log.info("[{}] Broker notification of > > Closed > > > > > > > producer: > > > > > > > > >> {}", > > > > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId()); > > > > > > > > >> > > >>>>>>> final long producerId = > > > > > > > closeProducer.getProducerId(); > > > > > > > > >> > > >>>>>>> ProducerImpl<?> producer = > > > > > > > producers.get(producerId); > > > > > > > > >> > > >>>>>>> if (producer != null) { > > > > > > > > >> > > >>>>>>> producer.connectionClosed(this); > > > > > > > > >> > > >>>>>>> } else { > > > > > > > > >> > > >>>>>>> log.warn("Producer with id {} not > > found > > > > while > > > > > > > > >> closing > > > > > > > > >> > > >>> producer > > > > > > > > >> > > >>>>>>> ", producerId); > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> ``` > > > > > > > > >> > > >>>>>>> After: > > > > > > > > >> > > >>>>>>> ```java > > > > > > > > >> > > >>>>>>> @Override > > > > > > > > >> > > >>>>>>> protected void > > > > > > handleCloseProducer(CommandCloseProducer > > > > > > > > >> > > >>> closeProducer) > > > > > > > > >> > > >>>>>>> { > > > > > > > > >> > > >>>>>>> log.info("[{}] Broker notification of > > Closed > > > > > > > producer: > > > > > > > > >> {}", > > > > > > > > >> > > >>>>>>> remoteAddress, closeProducer.getProducerId()); > > > > > > > > >> > > >>>>>>> final long producerId = > > > > > > > closeProducer.getProducerId(); > > > > > > > > >> > > >>>>>>> ProducerImpl<?> producer = > > > > > > > producers.get(producerId); > > > > > > > > >> > > >>>>>>> if (producer != null) { > > > > > > > > >> > > >>>>>>> if (closeProducer.isAllowReconnect) > { > > > > > > > > >> > > >>>>>>> producer.connectionClosed(this); > > > > > > > > >> > > >>>>>>> } else { > > > > > > > > >> > > >>>>>>> producer.closeAsync(); > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> } else { > > > > > > > > >> > > >>>>>>> log.warn("Producer with id {} not > > found > > > > while > > > > > > > > >> closing > > > > > > > > >> > > >>> producer > > > > > > > > >> > > >>>>>>> ", producerId); > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> ``` > > > > > > > > >> > > >>>>>>> ### ClientCnx - Consumer: > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> **Before** > > > > > > > > >> > > >>>>>>> ```java > > > > > > > > >> > > >>>>>>> @Override > > > > > > > > >> > > >>>>>>> protected void > > > > > > handleCloseConsumer(CommandCloseConsumer > > > > > > > > >> > > >>> closeConsumer) > > > > > > > > >> > > >>>>>>> { > > > > > > > > >> > > >>>>>>> log.info("[{}] Broker notification of > > Closed > > > > > > > consumer: > > > > > > > > >> {}", > > > > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId()); > > > > > > > > >> > > >>>>>>> final long consumerId = > > > > > > > closeConsumer.getConsumerId(); > > > > > > > > >> > > >>>>>>> ConsumerImpl<?> consumer = > > > > > > > consumers.get(consumerId); > > > > > > > > >> > > >>>>>>> if (consumer != null) { > > > > > > > > >> > > >>>>>>> consumer.connectionClosed(this); > > > > > > > > >> > > >>>>>>> } else { > > > > > > > > >> > > >>>>>>> log.warn("Consumer with id {} not > > found > > > > while > > > > > > > > >> closing > > > > > > > > >> > > >>> consumer > > > > > > > > >> > > >>>>>>> ", consumerId); > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> ``` > > > > > > > > >> > > >>>>>>> **After** > > > > > > > > >> > > >>>>>>> ```java > > > > > > > > >> > > >>>>>>> @Override > > > > > > > > >> > > >>>>>>> protected void > > > > > > handleCloseConsumer(CommandCloseConsumer > > > > > > > > >> > > >>> closeConsumer) > > > > > > > > >> > > >>>>>>> { > > > > > > > > >> > > >>>>>>> log.info("[{}] Broker notification of > > Closed > > > > > > > consumer: > > > > > > > > >> {}", > > > > > > > > >> > > >>>>>>> remoteAddress, closeConsumer.getConsumerId()); > > > > > > > > >> > > >>>>>>> final long consumerId = > > > > > > > closeConsumer.getConsumerId(); > > > > > > > > >> > > >>>>>>> ConsumerImpl<?> consumer = > > > > > > > consumers.get(consumerId); > > > > > > > > >> > > >>>>>>> if (consumer != null) { > > > > > > > > >> > > >>>>>>> if (closeConsumer.isAllowReconnect) > { > > > > > > > > >> > > >>>>>>> consumer.connectionClosed(this); > > > > > > > > >> > > >>>>>>> } else { > > > > > > > > >> > > >>>>>>> consumer.closeAsync(); > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> } else { > > > > > > > > >> > > >>>>>>> log.warn("Consumer with id {} not > > found > > > > while > > > > > > > > >> closing > > > > > > > > >> > > >>> consumer > > > > > > > > >> > > >>>>>>> ", consumerId); > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> } > > > > > > > > >> > > >>>>>>> ``` > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> ## Reject Alternatives > > > > > > > > >> > > >>>>>>> > > > > > > > > >> > > >>>>>>> none. > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> > > > > > > > > >> > > >>>>> -- > > > > > > > > >> > > >>>>> Zike Yang > > > > > > > > >> > > >>> > > > > > > > > >> > > >>> > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > >