> I think we should consider adding a new policy for Pulsar topics: a namespace (or topic) policy that makes it possible to retain messages indefinitely when a topic has no subscriptions.
It looks like a feature that supports retaining data while a topic has no subscriptions.

With infinite data retention, the data will not be removed after all the subscriptions have acked the messages. But with “retain_data_no_subscriptions”, the data will be removed after all the subscriptions have acked the messages. Subsequent subscriptions still can't retrieve all the data, so it looks like this only guarantees that the first subscription can retrieve all the data. If users want that guarantee for all subscriptions (the existing ones and those created later), that is equivalent to infinite data retention.

For an auto-created topic, the subscription can only be determined at the time of creation. It may or may not be created. If users are able to determine who the consumers are, and these consumers need to receive every message sent by the producer, they should create the topic and subscription manually, or use the consumer, not the producer, to trigger the topic auto-creation. It is not easy to determine consumer behavior on the producer side.

But the DLQ is not a normal topic from the user's point of view. It's a local container for a subscription to store the messages that the consumer can't process. It's a case of "the consumer determines consumer behavior". I think this is the most essential difference.

Regards,
Penghui

On Sat, Jan 8, 2022 at 12:34 PM Michael Marshall <mikemars...@gmail.com> wrote:

> Thanks for your response, Penghui.
>
> I support simplifying message loss prevention for DLQ topics. However,
> it's not clear to me why we should only simplify it for DLQ topics.
>
> As a Pulsar user, I encountered many of the challenges you mention
> when producing to auto created topics. In my architecture, I had
> consumers reading from an input topic, transforming the data, and then
> producing to an arbitrary number of output topics. My business logic
> required that I not lose any messages, which is essentially the same
> expectation from DLQ users here.
> I ended up increasing the retention
> policy to about 4 hours on the output topics to minimize the possibility
> of losing data. I had to scale up my BookKeeper cluster because of the
> extra retention. If I had been able to ensure my auto created topic
> would not delete messages before I created my subscriptions, I would
> have had no retention policy and a smaller bookie cluster.
>
> > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > But the issue that the proposal wants to resolve is not to introduce a
> > specific behavior for the DLQ topic or something
>
> I'm not sure this statement aligns with the PIP. It seems to me that
> the PIP proposes solving the message loss issues by adding a
> DLQ-specific feature to the Pulsar client.
>
> Earlier, I proposed expanding the CreateProducer command to be able to
> create a subscription. This solution is not right: it tightly couples
> producers and consumers, which we want to avoid.
>
> I think we should consider adding a new policy for Pulsar topics: a
> namespace (or topic) policy that makes it possible to retain messages
> indefinitely when a topic has no subscriptions.
>
> Our message retention feature is very valuable. However,
> message retention doesn't solve the "slow to subscribe" consumer
> problem. In the event of long network partitions, a consumer might not be
> able to subscribe before messages are deleted. This feature
> mitigates that risk and allows users to set message retention time
> based on other needs, not based on calculations about how long it
> could take to subscribe to a topic.
>
> This feature solves the DLQ message loss issue because the DLQ
> producer can produce to any namespace, which is important for clusters
> that do not have topic level policies enabled.
>
> Let me know what you think.
>
> Thanks,
> Michael
>
> On Tue, Jan 4, 2022 at 10:33 PM PengHui Li <peng...@apache.org> wrote:
> >
> > Thanks for the great comments, Michael.
> >
> > Let me try to clarify some context about the issue that users encountered
> > and the improvement that the proposal wants to introduce.
> >
> > > Before we get further into the implementation, I'd like to discuss
> > > whether the current behavior is the expected behavior, as this is
> > > the key motivation for this feature.
> >
> > The DLQ can be generated dynamically, and users might have short
> > data retention for a namespace by time or by size. But the messages
> > in the DLQ are usually compensated afterward, and we should allow users
> > to keep the data in the DLQ until they want to delete it manually.
> >
> > The DLQ is always for a subscriber, so a subscriber can use an init name
> > to keep its data in the DLQ from being cleaned up.
> >
> > So the key point of this proposal is to keep data in the lazily created
> > DLQ topic until users want to delete it manually.
> >
> > > I think the DLQ's current behavior is the expected behavior because
> > > the DLQ is only a topic and topics lose messages unless they have a
> > > subscription or a retention policy.
> >
> > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > But the issue that the proposal wants to resolve is not to introduce a
> > specific behavior for the DLQ topic or something. It is just that, from
> > the perspective of the DLQ use case, it should be convenient for users
> > to keep data in the DLQ.
> >
> > Without this option, it is not easy for us to support setting a
> > subscription or data retention policy for a lazily created DLQ topic.
> >
> > > I admit that it is not necessarily a nice default behavior to
> > > potentially lose messages, but this is the design for all topics.
> > > Based on the current design, an admin can create a retention policy
> > > for the topic or namespace. Then, consumers of the
> > > topic have the duration of the retention policy to discover the topic
> > > and create a subscription before messages are lost.
> > > Is there a reason
> > > this solution doesn't work for the DLQ topic?
> >
> > The difference here is when the subscriber subscribes to the topic.
> > For a normal topic, the expected behavior is that the subscriber is able
> > to read all messages of the topic. It can start consuming from the
> > earliest, the latest, or any other valid position. But the DLQ contains
> > only part of the original data for a subscription.
> > Users never expect to miss some head messages in the DLQ. Otherwise,
> > you will get 1,2,3 first, then 4,5 go to the DLQ and you continue to
> > receive 6,7, but 4,5 might be removed automatically by Pulsar.
> >
> > The current solution does not work well for the DLQ topic because it is
> > not easy for users to set a different data retention policy or create a
> > new subscription for a lazily created DLQ topic.
> >
> > > As an aside, I wonder if topic discoverability is part of the problem
> > > here. It would be extremely valuable to get notifications any
> > > time a topic is created. That would allow users to move away from
> > > polling for current topic names towards a more reactive design.
> >
> > The notification is a good idea, but for this case the notification will
> > have some drawbacks:
> >
> > 1. The delayed notification might not allow us to achieve the purpose
> > 2. The complexity will increase: auth for the notifications, and users
> >    need to handle the events
> >
> > But the notifications can help in lots of areas, such as improving
> > observability, etc.
> >
> > Regards,
> > Penghui
> >
> > On Tue, Jan 4, 2022 at 2:41 PM Michael Marshall <mmarsh...@apache.org>
> > wrote:
> >
> > > Before we get further into the implementation, I'd like to discuss
> > > whether the current behavior is the expected behavior, as this is
> > > the key motivation for this feature.
> > >
> > > I think the DLQ's current behavior is the expected behavior because
> > > the DLQ is only a topic and topics lose messages unless they have a
> > > subscription or a retention policy.
> > >
> > > I admit that it is not necessarily a nice default behavior to
> > > potentially lose messages, but this is the design for all topics.
> > > Based on the current design, an admin can create a retention policy
> > > for the topic or namespace. Then, consumers of the
> > > topic have the duration of the retention policy to discover the topic
> > > and create a subscription before messages are lost. Is there a reason
> > > this solution doesn't work for the DLQ topic?
> > >
> > > Perhaps the disconnect here is that users of the DLQ feature do not
> > > view the DLQ as only a Pulsar topic. I look forward to your thoughts.
> > >
> > > As an aside, I wonder if topic discoverability is part of the problem
> > > here. It would be extremely valuable to get notifications any
> > > time a topic is created. That would allow users to move away from
> > > polling for current topic names towards a more reactive design.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > > On Tue, Dec 28, 2021 at 7:59 PM Zike Yang
> > > <zky...@streamnative.io.invalid> wrote:
> > > >
> > > > > Oh, that's a very interesting point. I think it'd be easy to add that
> > > > > as "internal" feature, though I'm a bit puzzled on how to add that to
> > > > > the producer API
> > > >
> > > > I think we can add a field `String initialSubscriptionName` to the
> > > > Producer Configuration, and add a new field
> > > > `optional string initial_subscription_name` to the `CommandProducer`.
> > > > When the Broker handles the CommandProducer, if it finds that
> > > > initialSubscriptionName is not null or empty, it will use
> > > > initialSubscriptionName to create a subscription on that topic.
> > > > When
> > > > creating the deadLetterProducer or retryLetterProducer, we can specify
> > > > and create the initial subscription directly through the Producer.
> > > > What do you think?
> > > >
> > > > On Thu, Dec 23, 2021 at 7:42 AM Matteo Merli <matteo.me...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > What if we extended the `CommandProducer` command to add a
> > > > > > `create_subscription` field? Then, any time a topic is auto
> > > > > > created and this field is true, the broker would auto create a
> > > > > > subscription. There are some details to work out, but I think this
> > > > > > feature would fulfill the needs of this PIP and would also be broadly
> > > > > > useful for many client applications that dynamically create topics.
> > > > >
> > > > > Oh, that's a very interesting point. I think it'd be easy to add that
> > > > > as "internal" feature, though I'm a bit puzzled on how to add that to
> > > > > the producer API
> > > > >
> > > >
> > > > --
> > > > Zike Yang
> > > >
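
The retention behaviors compared in this thread can be contrasted with a small toy model. This is only a sketch in Python, not Pulsar code: the `ToyTopic` class, its `retain_while_no_subscriptions` flag, and the subscription names are hypothetical and invented for illustration. Time-based retention is left out, and a message is dropped as soon as no existing subscription still needs it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Toy model of a topic backlog; an assumption, not real broker logic."""

    # Hypothetical flag modelling the "retain while no subscriptions" policy.
    retain_while_no_subscriptions: bool = False
    messages: list = field(default_factory=list)
    # subscription name -> set of messages it has not acked yet
    pending: dict = field(default_factory=dict)

    def publish(self, msg):
        if not self.pending and not self.retain_while_no_subscriptions:
            return  # no subscription and no retention: the message is lost
        self.messages.append(msg)
        for unacked in self.pending.values():
            unacked.add(msg)

    def subscribe(self, name):
        # A new subscription can only see what is still stored.
        self.pending[name] = set(self.messages)
        return list(self.messages)

    def ack_all(self, name):
        self.pending[name].clear()
        # Drop every message that no subscription still needs.
        self.messages = [m for m in self.messages
                         if any(m in u for u in self.pending.values())]


def run(retain_while_no_subscriptions):
    topic = ToyTopic(retain_while_no_subscriptions)
    for m in (4, 5):  # producer writes before any consumer exists
        topic.publish(m)
    first = topic.subscribe("first")
    topic.ack_all("first")
    second = topic.subscribe("second")
    return first, second


def run_with_initial_subscription():
    topic = ToyTopic()
    topic.subscribe("init")  # subscription exists before the first publish
    for m in (4, 5):
        topic.publish(m)
    return list(topic.messages)


print(run(False))                       # ([], [])
print(run(True))                        # ([4, 5], [])
print(run_with_initial_subscription())  # [4, 5]
```

In this model, the policy flag lets the first subscription read messages 4 and 5, but a second subscription created after they were acked sees nothing, which is the point made at the top of the thread. Creating a subscription before the first publish keeps the messages until that subscription acks them.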