Hi Dave,

> automatically delete tenants and namespaces for not containing topics

I don't think that is what we are discussing. I agree that the initial
email says just that, though, which is why I asked above:

>> When there are no user-created topics under a namespace,
>> Namespace should be deleted.
> This is supposed to mean that the namespace should be able to be
> deleted, correct?

Perhaps Mattison can clarify, too.

My current understanding of the context for the PIP is that a user
call to delete a namespace without force can fail when a producer
reconnects to a deleted topic. The goal is to remove the race
condition to ensure namespace deletion can succeed.

Thanks,
Michael

On Thu, Feb 24, 2022 at 5:21 PM Dave Fisher <w...@apache.org> wrote:
>
> Hi -
>
> I hope I’m understanding what’s being discussed.
>
> If we are going to automatically delete tenants and namespaces for not 
> containing topics then we need to make both of these automatic actions 
> configurable with a default to NOT do so. Otherwise we break existing use 
> cases.
>
> Automatic deletion of namespaces should be configurable at both the cluster 
> and tenant level.
>
> Regards,
> Dave
>
> > On Feb 24, 2022, at 2:25 PM, Michael Marshall <mmarsh...@apache.org> wrote:
> >
> >> The old producer/consumer should be closed after applying the changes from
> >> this proposal.
> >
> > Penghui, are you suggesting that we implement the namespace/tenant
> > terminated logic after completing this PIP?
> >
> > For the sake of discussion, if we implement the namespace terminated
> > logic first, we could fulfill the underlying requirements for this PIP
> > by returning a new non-retriable error response when a client tries to
> > connect a producer or a consumer to a topic in a namespace that is
> > "terminated". If we do add the "namespace terminated" feature, we'll
> > need to add a non-retriable exception for this case, anyway. The main
> > advantage here is that we'd only need one expansion of the protobuf
> > instead of two. The downside is that the protocol for connected
> > clients has a couple more roundtrips. The broker would disconnect
> > connected clients and then fail their reconnection attempt with a
> > non-retriable error.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Feb 24, 2022 at 7:11 AM PengHui Li <peng...@apache.org> wrote:
> >>
> >>> If we want to solve this problem, we need to add some sync resources like
> >> lock/state, I think it’s a harm for us, we don’t need to do that.
> >>
> >> I think we can make the namespace/tenants to the inactive state first so
> >> that we can avoid any new
> >> producer/consumer connect to the topic under the namespace/tenant.
> >>
> >> The old producer/consumer should be closed after applying the changes from
> >> this proposal.
> >>
> >> Thanks,
> >> Penghui
> >>
> >> On Tue, Feb 8, 2022 at 5:47 PM mattison chao <mattisonc...@gmail.com> 
> >> wrote:
> >>
> >>>> This is supposed to mean that the namespace should be able to be
> >>>> deleted, correct?
> >>>
> >>> Yes, the main background is the user doesn’t have an active topic. so,
> >>> they want to delete the namespace.
> >>>
> >>>> However, I think
> >>>> we might still have a race condition that could make tenant or
> >>>> namespace deletion fail. Specifically, if a new producer or consumer
> >>>> creates a topic after the namespace deletion has started but
> >>>> before it is complete. Do you agree that the underlying race still
> >>> exists?
> >>>
> >>> Yes, this condition exists. I think it’s not a big problem because the
> >>> user doesn’t want to use this namespace anymore.
> >>> If this scenario appears, they will get an error and need to delete it
> >>> again.
> >>>
> >>>> What if we expand our usage of the "terminated" feature to apply to
> >>>> namespaces (and tenants)? Then, a terminated namespace can have
> >>>> bundles and topics can be deleted but not created (just as a terminated
> >>>> topic cannot have any new messages published to it). This would take
> >>>> care of all topic creation race conditions. We'd probably need to add
> >>>> new protobuf exceptions for this feature.
> >>>
> >>>
> >>> If we want to solve this problem, we need to add some sync resources like
> >>> lock/state, I think it’s a harm for us, we don’t need to do that.
> >>>
> >>> Thanks for your suggestions, let me know what you think.
> >>>
> >>> Best,
> >>> Mattison
> >>>
> >>>> On Feb 1, 2022, at 2:26 PM, Michael Marshall <mmarsh...@apache.org>
> >>> wrote:
> >>>>
> >>>> This proposal identifies an important issue that we should definitely
> >>>> solve. I have some questions.
> >>>>
> >>>>> When there are no user-created topics under a namespace,
> >>>>> Namespace should be deleted.
> >>>>
> >>>> This is supposed to mean that the namespace should be able to be
> >>>> deleted, correct?
> >>>>
> >>>>> For this reason, we need to close the system topic reader/producer
> >>>>> first, then remove the system topic. finally, remove the namespace.
> >>>>
> >>>> I agree that expanding the protobuf CloseProducer and CloseConsumer
> >>>> commands could be valuable here. The expansion would ensure that
> >>>> producers and consumers don't attempt to reconnect. However, I think
> >>>> we might still have a race condition that could make tenant or
> >>>> namespace deletion fail. Specifically, if a new producer or consumer
> >>>> creates a topic after the namespace deletion has started but
> >>>> before it is complete. Do you agree that the underlying race still
> >>> exists?
> >>>>
> >>>> In my view, the fundamental problem here is that deleting certain Pulsar
> >>>> resources takes time and, in a distributed system, that means race
> >>>> conditions.
> >>>>
> >>>> What if we expand our usage of the "terminated" feature to apply to
> >>>> namespaces (and tenants)? Then, a terminated namespace can have
> >>>> bundles and topics can be deleted but not created (just as a terminated
> >>>> topic cannot have any new messages published to it). This would take
> >>>> care of all topic creation race conditions. We'd probably need to add
> >>>> new protobuf exceptions for this feature.
> >>>>
> >>>> Thanks,
> >>>> Michael
> >>>>
> >>>>
> >>>> On Sat, Jan 29, 2022 at 7:25 PM Zike Yang
> >>>> <zky...@streamnative.io.invalid> wrote:
> >>>>>
> >>>>> +1
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Zike
> >>>>>
> >>>>> On Sat, Jan 29, 2022 at 12:30 PM guo jiwei <techno...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>> Hi
> >>>>>> The PIP link : https://github.com/apache/pulsar/issues/13989
> >>>>>>
> >>>>>> Regards
> >>>>>> Jiwei Guo (Tboy)
> >>>>>>
> >>>>>>
> >>>>>> On Sat, Jan 29, 2022 at 11:46 AM mattison chao <mattisonc...@gmail.com
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hello everyone,
> >>>>>>>
> >>>>>>> I want to start a discussion about PIP-139 : Support Broker send
> >>> command
> >>>>>>> to real close producer/consumer.
> >>>>>>>
> >>>>>>> This is the PIP document
> >>>>>>>
> >>>>>>> https://github.com/apache/pulsar/issues/13989 <
> >>>>>>> https://github.com/apache/pulsar/issues/13979>
> >>>>>>>
> >>>>>>> Please check it out and feel free to share your thoughts.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Mattison
> >>>>>>>
> >>>>>>>
> >>>>>>> ———————— Pasted below for quoting convenience.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Relation pull request:  #13337
> >>>>>>> Authors: @Technoboy-  @mattisonchao
> >>>>>>>
> >>>>>>> ## Motivation
> >>>>>>>
> >>>>>>> Before we discuss this pip, I'd like to supplement some context to
> >>> help
> >>>>>>> contributors who don't want to read the original pull request.
> >>>>>>>
> >>>>>>>> When there are no user-created topics under a namespace, Namespace
> >>>>>>> should be deleted. But currently, the system topic existed and the
> >>>>>>> reader/producer could auto-create the system which may cause the
> >>> namespace
> >>>>>>> deletion to fail.
> >>>>>>>
> >>>>>>> For this reason, we need to close the system topic reader/producer
> >>> first,
> >>>>>>> then remove the system topic. finally, remove the namespace.
> >>>>>>>
> >>>>>>> Following this way, we first want to use ``terminate`` to solve this
> >>>>>>> problem. then we found producers can disconnect, but consumers are
> >>> still
> >>>>>>> alive. So, another PR #13960 wants to add consumers' closing logic.
> >>>>>>>
> >>>>>>> After #13960, all things look good, but another problem appears. that
> >>> is
> >>>>>>> we need to wait until consumers completely consume all messages (this
> >>> may
> >>>>>>> make terminate topic so long and the operation timeout)then get
> >>>>>>> ``reachedEndOfTopic``. the relative code here :
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>> https://github.com/apache/pulsar/blob/07ef9231db8b844586b9217ee2d59237eb9c54b7/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/CompactorSubscription.java#L102-L106
> >>>>>>>
> >>>>>>> In the #13337 case, we need to force close consumers immediately. So
> >>> we
> >>>>>>> write this PIP to discuss another way to resolve this problem.
> >>>>>>>
> >>>>>>> ## Goal
> >>>>>>>
> >>>>>>> We can add a new field(`allow_reconnect`) in command
> >>>>>>> ``CommandCloseProducer``/ ``CommandCloseConsumer`` to close
> >>>>>>> producer/consumers immediately.
> >>>>>>>
> >>>>>>> ## API Changes
> >>>>>>>
> >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseProducer``;
> >>>>>>>
> >>>>>>> ```java
> >>>>>>> **Before**
> >>>>>>>
> >>>>>>> message CommandCloseProducer {
> >>>>>>>   required uint64 producer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>> }
> >>>>>>>
> >>>>>>> **After**
> >>>>>>> message CommandCloseProducer {
> >>>>>>>   required uint64 producer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> >>>>>>> }
> >>>>>>> ```
> >>>>>>>
> >>>>>>> - Add ``allow_reconnect`` to ``CommandCloseConsumer``
> >>>>>>> ```java
> >>>>>>> **Before**
> >>>>>>>
> >>>>>>> message CommandCloseConsumer {
> >>>>>>>   required uint64 consumer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>> }
> >>>>>>>
> >>>>>>> **After**
> >>>>>>> message CommandCloseConsumer {
> >>>>>>>   required uint64 consumer_id = 1;
> >>>>>>>   required uint64 request_id = 2;
> >>>>>>>   optional bool allow_reconnect = 3 [default = true];
> >>>>>>> }
> >>>>>>> ```
> >>>>>>>
> >>>>>>> ## Implementation
> >>>>>>>
> >>>>>>> ### ClientCnx - Producer:
> >>>>>>>
> >>>>>>> **Before**
> >>>>>>> ```java
> >>>>>>>   @Override
> >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> >>> closeProducer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> >>>>>>> remoteAddress, closeProducer.getProducerId());
> >>>>>>>       final long producerId = closeProducer.getProducerId();
> >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> >>>>>>>       if (producer != null) {
> >>>>>>>           producer.connectionClosed(this);
> >>>>>>>       } else {
> >>>>>>>           log.warn("Producer with id {} not found while closing
> >>> producer
> >>>>>>> ", producerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>> After:
> >>>>>>> ```java
> >>>>>>>  @Override
> >>>>>>>   protected void handleCloseProducer(CommandCloseProducer
> >>> closeProducer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed producer: {}",
> >>>>>>> remoteAddress, closeProducer.getProducerId());
> >>>>>>>       final long producerId = closeProducer.getProducerId();
> >>>>>>>       ProducerImpl<?> producer = producers.get(producerId);
> >>>>>>>       if (producer != null) {
> >>>>>>>           if (closeProducer.isAllowReconnect) {
> >>>>>>>               producer.connectionClosed(this);
> >>>>>>>           } else {
> >>>>>>>               producer.closeAsync();
> >>>>>>>           }
> >>>>>>>       } else {
> >>>>>>>           log.warn("Producer with id {} not found while closing
> >>> producer
> >>>>>>> ", producerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>> ### ClientCnx - Consumer:
> >>>>>>>
> >>>>>>> **Before**
> >>>>>>> ```java
> >>>>>>>   @Override
> >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> >>> closeConsumer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> >>>>>>>       if (consumer != null) {
> >>>>>>>           consumer.connectionClosed(this);
> >>>>>>>       } else {
> >>>>>>>           log.warn("Consumer with id {} not found while closing
> >>> consumer
> >>>>>>> ", consumerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>> **After**
> >>>>>>> ```java
> >>>>>>>   @Override
> >>>>>>>   protected void handleCloseConsumer(CommandCloseConsumer
> >>> closeConsumer)
> >>>>>>> {
> >>>>>>>       log.info("[{}] Broker notification of Closed consumer: {}",
> >>>>>>> remoteAddress, closeConsumer.getConsumerId());
> >>>>>>>       final long consumerId = closeConsumer.getConsumerId();
> >>>>>>>       ConsumerImpl<?> consumer = consumers.get(consumerId);
> >>>>>>>       if (consumer != null) {
> >>>>>>>           if (closeConsumer.isAllowReconnect) {
> >>>>>>>               consumer.connectionClosed(this);
> >>>>>>>           } else {
> >>>>>>>               consumer.closeAsync();
> >>>>>>>           }
> >>>>>>>       } else {
> >>>>>>>           log.warn("Consumer with id {} not found while closing
> >>> consumer
> >>>>>>> ", consumerId);
> >>>>>>>       }
> >>>>>>>   }
> >>>>>>> ```
> >>>>>>>
> >>>>>>>
> >>>>>>> ## Reject Alternatives
> >>>>>>>
> >>>>>>> none.
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Zike Yang
> >>>
> >>>
>

Reply via email to