> It looks like a feature that supports retaining data while no subscriptions.
Yes, that is my proposed feature. How we handle messages on a topic
with an empty set of subscriptions is a design decision.

Note that when there are no subscriptions for a topic, the following
two statements are both true (in a set theoretic sense):

1. All messages are acknowledged for all subscriptions.
2. No messages are acknowledged for all subscriptions.

Pulsar's current design only uses option 1. I propose that we make it
possible to use option 2. (Option 2 would solve the DLQ concerns here.)

> so it looks like only guarantee the first subscription can
> retrieve all the data

Yes, that is true. However, it is also true in this DLQ PIP, since the
current design only creates a single subscription. I think the
important nuance is that we're deciding how to handle a topic with no
subscriptions.

> they should create the topic and subscription manually or use the
> consumer to trigger the topic auto-creation, not the producer.

When producers create arbitrary topics, this design forces the
producer to create subscriptions, which is the same design for this
PIP. I think we should avoid producers creating subscriptions.

> It is not easy to determine consumer behavior on the producer side. But for
> DLQ, it's not a normal topic from the user's point of view

If we want to hold that the DLQ is not a normal topic, then I can see
why we would have a DLQ specific feature here.

Thanks,
Michael

On Sun, Jan 9, 2022 at 10:20 PM PengHui Li <peng...@apache.org> wrote:
>
> > I think we should consider adding a new policy for Pulsar topics: a
> > namespace (or topic) policy that makes it possible to retain messages
> > indefinitely when a topic has no subscriptions.
>
> It looks like a feature that supports retaining data while no subscriptions.
> With infinite data retention, the data will not be removed after all the
> subscriptions acked the message. But with “retain_data_no_subscriptions”,
> the data will be removed after all the subscriptions acked messages.
> But for the subsequent subscriptions, still can't retrieve all the data,
> so it looks like only guarantee the first subscription can retrieve all
> the data. If users want to guarantee all the subscriptions (all the
> existing and will create subscriptions), that is equivalent to infinite
> data retention.
>
> For the auto-created topic, the subscription can only be determined at the
> time of creation. It may or may not create. If users are able to determine
> which consumers are, and these consumers need to receive any message sent
> by the producer, they should create the topic and subscription manually or
> use the consumer to trigger the topic auto-creation, not the producer.
>
> It is not easy to determine consumer behavior on the producer side. But for
> DLQ, it's not a normal topic from the user's point of view, it's a local
> container for a subscription to store the messages that the consumer can't
> process. It's a "consumer determine consumer behavior", I think this is the
> most essential difference.
>
> Regards,
> Penghui
>
> On Sat, Jan 8, 2022 at 12:34 PM Michael Marshall <mikemars...@gmail.com>
> wrote:
>
> > Thanks for your response, Penghui.
> >
> > I support simplifying message loss prevention for DLQ topics. However,
> > it's not clear to me why we should only simplify it for DLQ topics.
> >
> > As a Pulsar user, I encountered many of the challenges you mention
> > when producing to auto created topics. In my architecture, I had
> > consumers reading from an input topic, transforming the data, and then
> > producing to an arbitrary number of output topics. My business logic
> > required that I not lose any messages, which is essentially the same
> > expectation from DLQ users here. I ended up increasing the retention
> > policy to about 4 hours on the output topics to minimize the possibility
> > of losing data. I had to scale up my bookkeeper cluster because of the
> > extra retention.
> > If I had been able to ensure my auto created topic
> > would not delete messages before I created my subscriptions, I would
> > have had no retention policy and a smaller bookie cluster.
> >
> > > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > > But the issue that the proposal wants to resolve is not to introduce a
> > > specific behavior for the DLQ topic or something
> >
> > I'm not sure this statement aligns with the PIP. It seems to me that
> > the PIP proposes solving the message loss issues by adding a DLQ
> > specific feature to the Pulsar client.
> >
> > Earlier, I proposed expanding the CreateProducer command to be able to
> > create a subscription. This solution is not right: it tightly couples
> > producers and consumers, which we want to avoid.
> >
> > I think we should consider adding a new policy for Pulsar topics: a
> > namespace (or topic) policy that makes it possible to retain messages
> > indefinitely when a topic has no subscriptions.
> >
> > Our message retention feature is very valuable. However,
> > message retention doesn't solve the "slow to subscribe" consumer
> > problem. In the event of long network partitions, a consumer might not be
> > able to subscribe before messages are deleted. This feature
> > mitigates that risk and allows users to set message retention time
> > based on other needs, not based on calculations about how long it
> > could take to subscribe to a topic.
> >
> > This feature solves the DLQ message loss issue because the DLQ
> > producer can produce to any namespace, which is important for clusters
> > that do not have topic level policies enabled.
> >
> > Let me know what you think.
> >
> > Thanks,
> > Michael
> >
> > On Tue, Jan 4, 2022 at 10:33 PM PengHui Li <peng...@apache.org> wrote:
> >
> > > Thanks for the great comments, Michael.
> > >
> > > Let me try to clarify some context about the issue that users encountered
> > > and the improvement that the proposal wants to introduce.
> > >
> > > > Before we get further into the implementation, I'd like to discuss
> > > > whether the current behavior is the expected behavior, as this is
> > > > the key motivation for this feature.
> > >
> > > The DLQ can generate dynamically and users might have short
> > > data retention for a namespace by time or by size. But the messages
> > > in the DLQ usually compensate afterward, and we should allow users
> > > to keep the data in the DLQ only if they want to delete them manually.
> > >
> > > The DLQ is always for a subscriber, so a subscriber can use an init name
> > > to achieve the purpose of not being cleaned up from the DLQ.
> > >
> > > So the key point for this proposal is to keep data in the lazy created
> > > DLQ topic until users want to delete them manually.
> > >
> > > > I think the DLQ's current behavior is the expected behavior because
> > > > the DLQ is only a topic and topics lose messages unless they have a
> > > > subscription or a retention policy.
> > >
> > > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > > But the issue that the proposal wants to resolve is not to introduce a
> > > specific behavior for the DLQ topic or something. It is just from the
> > > perspective of the DLQ use case, convenient for users to keep data in
> > > DLQ.
> > >
> > > Without this option, we are not easy to support setting a subscription or
> > > data retention policy for a lazy created DLQ topic.
> > >
> > > > I admit that it is not necessarily a nice default behavior to
> > > > potentially lose messages, but this is the design for all topics.
> > > > Based on the current design, an admin can create a retention policy
> > > > for the topic or namespace. Then, consumers of the
> > > > topic have the duration of the retention policy to discover the topic
> > > > and create a subscription before messages are lost. Is there a reason
> > > > this solution doesn't work for the DLQ topic?
> > >
> > > The difference here is when the subscriber subscribes to the topic.
> > > For a normal topic, the expected behavior is the subscriber able to read
> > > all messages of the topic. It can start consuming for the earliest or
> > > latest or any other valid positions. But for the DLQ, contains part of
> > > the original data for a subscription.
> > > Users always don't expect to miss some head messages in the DLQ.
> > > Otherwise, you will get 1,2,3 first, and 4,5 to DLQ and continue to
> > > receive 6,7, but 4,5 might be removed automatically by Pulsar.
> > >
> > > The current solution does not work well for DLQ topic is users not easy
> > > to set a different data retention policy or create a new subscription
> > > for a lazy created DLQ topic.
> > >
> > > > As an aside, I wonder if topic discoverability is part of the problem
> > > > here. It would be extremely valuable to get notifications any
> > > > time a topic is created. That would allow users to move away from
> > > > polling for current topic names towards a more reactive design.
> > >
> > > The notification is a good idea, for this case, the notification will
> > > have some drawbacks:
> > >
> > > 1. The delayed notification might not allow us to achieve the purpose
> > > 2. The complexity will increase, auth for the notifications, users
> > >    need to handle the events
> > >
> > > But the notifications can help in lots of parts such as improving
> > > observability, etc.
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Tue, Jan 4, 2022 at 2:41 PM Michael Marshall <mmarsh...@apache.org>
> > > wrote:
> > >
> > > > Before we get further into the implementation, I'd like to discuss
> > > > whether the current behavior is the expected behavior, as this is
> > > > the key motivation for this feature.
> > > >
> > > > I think the DLQ's current behavior is the expected behavior because
> > > > the DLQ is only a topic and topics lose messages unless they have a
> > > > subscription or a retention policy.
> > > >
> > > > I admit that it is not necessarily a nice default behavior to
> > > > potentially lose messages, but this is the design for all topics.
> > > > Based on the current design, an admin can create a retention policy
> > > > for the topic or namespace. Then, consumers of the
> > > > topic have the duration of the retention policy to discover the topic
> > > > and create a subscription before messages are lost. Is there a reason
> > > > this solution doesn't work for the DLQ topic?
> > > >
> > > > Perhaps the disconnect here is that users of the DLQ feature do not
> > > > view the DLQ as only a Pulsar topic. I look forward to your thoughts.
> > > >
> > > > As an aside, I wonder if topic discoverability is part of the problem
> > > > here. It would be extremely valuable to get notifications any
> > > > time a topic is created. That would allow users to move away from
> > > > polling for current topic names towards a more reactive design.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > >
> > > > On Tue, Dec 28, 2021 at 7:59 PM Zike Yang
> > > > <zky...@streamnative.io.invalid> wrote:
> > > >
> > > > > > Oh, that's a very interesting point. I think it'd be easy to add
> > > > > > that as "internal" feature, though I'm a bit puzzled on how to
> > > > > > add that to the producer API
> > > > >
> > > > > I think we can add a field `String initialSubscriptionName` to the
> > > > > Producer Configuration. And add a new field `optional string
> > > > > initial_subscription_name` to the `CommandProducer`.
> > > > > When the Broker handles the CommandProducer, if it checks that the
> > > > > initialSubscriptionName is not empty or null, it will use
> > > > > initialSubscriptionName to create a subscription on that topic.
> > > > > When creating the deadLetterProducer or retryLetterProducer, we can
> > > > > specify and create the initial subscription directly through the
> > > > > Producer.
> > > > > What do you think?
> > > > >
> > > > > On Thu, Dec 23, 2021 at 7:42 AM Matteo Merli <matteo.me...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > What if we extended the `CommandProducer` command to add a
> > > > > > > `create_subscription` field? Then, any time a topic is auto
> > > > > > > created and this field is true, the broker would auto create a
> > > > > > > subscription. There are some details to work out, but I think
> > > > > > > this feature would fulfill the needs of this PIP and would also
> > > > > > > be broadly useful for many client applications that dynamically
> > > > > > > create topics.
> > > > > >
> > > > > > Oh, that's a very interesting point. I think it'd be easy to add
> > > > > > that as "internal" feature, though I'm a bit puzzled on how to
> > > > > > add that to the producer API
> > > > >
> > > > > --
> > > > > Zike Yang
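
The empty-subscription point from the top of this thread (options 1 and
2 are both vacuously true when a topic has no subscriptions) can be
sketched as a toy Python model. This is only an illustration, not Pulsar
code: the `Topic` class, its methods, and the `retain_when_no_subscriptions`
flag are hypothetical names standing in for the proposed
"retain_data_no_subscriptions" policy, and time/size retention is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy model: maps each subscription name to the set of message ids it acked."""
    acks: dict[str, set[int]] = field(default_factory=dict)
    # Hypothetical flag modelling the proposed namespace/topic policy.
    retain_when_no_subscriptions: bool = False

    def acked_by_all(self, msg: int) -> bool:
        # Option 1: vacuously True when there are no subscriptions.
        return all(msg in acked for acked in self.acks.values())

    def acked_by_none(self, msg: int) -> bool:
        # Option 2: also vacuously True when there are no subscriptions.
        return all(msg not in acked for acked in self.acks.values())

    def deletable(self, msg: int) -> bool:
        # With the policy on, an empty subscription set is read as option 2,
        # so nothing is deleted until a subscription exists and acks it.
        if not self.acks and self.retain_when_no_subscriptions:
            return False
        return self.acked_by_all(msg)
```

With no subscriptions, `Topic().deletable(4)` follows option 1 and allows
deletion, while `Topic(retain_when_no_subscriptions=True).deletable(4)`
follows option 2 and keeps the message. Once a subscription exists, both
settings behave the same: the message becomes deletable only after every
subscription has acked it.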