Re: [DISCUSSION] PIP-124: Create init subscription before sending message to DLQ

Michael Marshall Tue, 11 Jan 2022 22:39:14 -0800

> It looks like a feature that supports retaining data while a topic has no
> subscriptions.


Yes, that is my proposed feature. How we handle messages on a topic
with an empty set of subscriptions is a design decision.

Note that when there are no subscriptions for a topic, the following two
statements are both vacuously true (in a set-theoretic sense):

1. All messages are acknowledged for all subscriptions.
2. No messages are acknowledged for all subscriptions.

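Pulsar's current design only uses option 1, so a topic with no
subscriptions treats every message as acknowledged and can delete it
unless a retention policy applies. I propose that we make it possible to
use option 2, under which such a topic would keep its messages. (Option 2
would solve the DLQ concerns here.)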
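
As a rough sketch of why both statements hold and why a policy has to
pick between them, here is a toy model. It is not Pulsar code: the
function names and the `retain_when_unsubscribed` flag are hypothetical,
and retention policies are ignored.

```python
def both_statements_hold(message, acked, subscriptions):
    """Evaluate statements 1 and 2 for one message.

    acked maps a subscription name to the set of messages it has acknowledged.
    """
    acked_everywhere = all(message in acked[s] for s in subscriptions)
    acked_nowhere = all(message not in acked[s] for s in subscriptions)
    return acked_everywhere and acked_nowhere


def may_delete(message, acked, subscriptions, retain_when_unsubscribed=False):
    """Decide whether a message may be deleted (retention policy ignored).

    retain_when_unsubscribed is a hypothetical policy flag: False models
    option 1 (today's behavior), True models option 2.
    """
    if not subscriptions:
        # Empty set: the quantified rule below is vacuously true, so the
        # policy has to choose the outcome explicitly.
        return not retain_when_unsubscribed
    return all(message in acked[s] for s in subscriptions)
```

With an empty subscription set, `both_statements_hold` returns True, and
`may_delete` flips from True to False when the flag is set. Once at least
one subscription exists, the flag no longer changes the result.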

> so it looks like it only guarantees that the first subscription can
> retrieve all the data

Yes, that is true. However, it is also true in this DLQ PIP, since the
current design only creates a single subscription. I think the important
nuance is that we're deciding how to handle a topic with no
subscriptions.

> they should create the topic and subscription manually or use the
> consumer to trigger the topic auto-creation, not the producer.

When producers create arbitrary topics, this design forces
the producer to create subscriptions, which is the same design as in
this PIP. I think we should avoid producers creating subscriptions.

> It is not easy to determine consumer behavior on the producer side. But for
> DLQ, it's not a normal topic from the user's point of view

If we want to hold that the DLQ is not a normal topic, then I can see
why we would have a DLQ specific feature here.

Thanks,
Michael



On Sun, Jan 9, 2022 at 10:20 PM PengHui Li <peng...@apache.org> wrote:
>
> > I think we should consider adding a new policy for Pulsar topics: a
> > namespace (or topic) policy that makes it possible to retain messages
> > indefinitely when a topic has no subscriptions.
>
> It looks like a feature that supports retaining data while a topic has no
> subscriptions. With infinite data retention, the data will not be removed
> after all the subscriptions have acked the messages. But with
> “retain_data_no_subscriptions”, the data will be removed after all the
> subscriptions have acked the messages. Subsequent subscriptions still
> can't retrieve all the data, so it looks like it only guarantees that the
> first subscription can retrieve all the data. If users want that guarantee
> for all subscriptions (existing ones and ones created later), that is
> equivalent to infinite data retention.
>
> For an auto-created topic, the subscription can only be determined at the
> time of creation; it may or may not be created. If users are able to
> determine which consumers there are, and these consumers need to receive
> every message sent by the producer, they should create the topic and
> subscription manually or use the consumer to trigger the topic
> auto-creation, not the producer.
>
> It is not easy to determine consumer behavior on the producer side. But for
> DLQ, it's not a normal topic from the user's point of view; it's a local
> container for a subscription to store the messages that the consumer can't
> process. It's a case of "the consumer determines consumer behavior", and I
> think this is the most essential difference.
>
> Regards,
> Penghui
>
> On Sat, Jan 8, 2022 at 12:34 PM Michael Marshall <mikemars...@gmail.com>
> wrote:
>
> > Thanks for your response, Penghui.
> >
> > I support simplifying message loss prevention for DLQ topics. However,
> > it's not clear to me why we should only simplify it for DLQ topics.
> >
> > As a Pulsar user, I encountered many of the challenges you mention
> > when producing to auto created topics. In my architecture, I had
> > consumers reading from an input topic, transforming the data, and then
> > producing to an arbitrary number of output topics. My business logic
> > required that I not lose any messages, which is essentially the same
> > expectation from DLQ users here. I ended up increasing the retention
> > policy to about 4 hours on the output topics to minimize the possibility
> > of losing data. I had to scale up my bookkeeper cluster because of the
> > extra retention. If I had been able to ensure my auto created topic
> > would not delete messages before I created my subscriptions, I would
> > have had no retention policy and a smaller bookie cluster.
> >
> > > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > > But the issue that the proposal wants to resolve is not to introduce a
> > > specific behavior for the DLQ topic or something
> >
> > I'm not sure this statement aligns with the PIP. It seems to me that
> > the PIP proposes solving the message loss issues by adding a DLQ
> > specific feature to the pulsar client.
> >
> > Earlier, I proposed expanding the `CommandProducer` command to be able to
> > create a subscription. This solution is not right: it tightly couples
> > producers and consumers, which we want to avoid.
> >
> > I think we should consider adding a new policy for Pulsar topics: a
> > namespace (or topic) policy that makes it possible to retain messages
> > indefinitely when a topic has no subscriptions.
> >
> > Our message retention feature is very valuable. However,
> > message retention doesn't solve the "slow to subscribe" consumer
> > problem. In the event of long network partitions, a consumer might not be
> > able to subscribe before messages are deleted. This feature
> > mitigates that risk and allows users to set message retention time
> > based on other needs, not based on calculations about how long it
> > could take to subscribe to a topic.
> >
> > This feature solves the DLQ message loss issue because the DLQ
> > producer can produce to any namespace, which is important for clusters
> > that do not have topic level policies enabled.
> >
> > Let me know what you think.
> >
> > Thanks,
> > Michael
> >
> > On Tue, Jan 4, 2022 at 10:33 PM PengHui Li <peng...@apache.org> wrote:
> > >
> > > Thanks for the great comments, Michael.
> > >
> > > Let me try to clarify some context about the issue that users encountered
> > > and the improvement that the proposal wants to introduce.
> > >
> > > > Before we get further into the implementation, I'd like to discuss
> > > > whether the current behavior is the expected behavior, as this is
> > > > the key motivation for this feature.
> > >
> > > The DLQ can be generated dynamically, and users might have short
> > > data retention for a namespace by time or by size. But the messages
> > > in the DLQ are usually compensated for afterward, and we should allow
> > > users to keep the data in the DLQ until they want to delete it manually.
> > >
> > > The DLQ is always for a subscriber, so a subscriber can use an initial
> > > subscription name to keep its messages from being cleaned up from the DLQ.
> > >
> > > So the key point for this proposal is to keep data in the lazily created
> > > DLQ topic until users want to delete it manually.
> > >
> > > > I think the DLQ's current behavior is the expected behavior because
> > > > the DLQ is only a topic and topics lose messages unless they have a
> > > > subscription or a retention policy.
> > >
> > > Yes, essentially, the DLQ is only a topic, no other specific behaviors.
> > > But the issue that the proposal wants to resolve is not to introduce a
> > > specific behavior for the DLQ topic or something. It is just, from the
> > > perspective of the DLQ use case, a convenient way for users to keep data
> > > in the DLQ.
> > >
> > > Without this option, it is not easy for us to support setting a
> > > subscription or data retention policy for a lazily created DLQ topic.
> > >
> > > > I admit that it is not necessarily a nice default behavior to
> > > > potentially lose messages, but this is the design for all topics.
> > > > Based on the current design, an admin can create a retention policy
> > > > for the topic or namespace. Then, consumers of the
> > > > topic have the duration of the retention policy to discover the topic
> > > > and create a subscription before messages are lost. Is there a reason
> > > > this solution doesn't work for the DLQ topic?
> > >
> > > The difference here is when the subscriber subscribes to the topic.
> > > For a normal topic, the expected behavior is that the subscriber is able
> > > to read all messages of the topic. It can start consuming from the
> > > earliest, latest, or any other valid position. But the DLQ contains part
> > > of the original data for a subscription. Users never expect to miss
> > > head messages in the DLQ. Otherwise, you would get 1,2,3 first, with 4,5
> > > going to the DLQ, and continue to receive 6,7, but 4,5 might be removed
> > > automatically by Pulsar.
> > >
> > > The current solution does not work well for the DLQ topic because it is
> > > not easy for users to set a different data retention policy or create a
> > > new subscription for a lazily created DLQ topic.
> > >
> > > > As an aside, I wonder if topic discoverability is part of the problem
> > > > here. It would be extremely valuable to get notifications any
> > > > time a topic is created. That would allow users to move away from
> > > > polling for current topic names towards a more reactive design.
> > >
> > > The notification is a good idea. For this case, the notification will
> > > have some drawbacks:
> > >
> > >    1. The delayed notification might not allow us to achieve the purpose
> > >    2. The complexity will increase: auth for the notifications, and users
> > >    need to handle the events
> > >
> > > But the notifications can help in many areas, such as improving
> > > observability.
> > >
> > > Regards,
> > > Penghui
> > >
> > > On Tue, Jan 4, 2022 at 2:41 PM Michael Marshall <mmarsh...@apache.org>
> > > wrote:
> > >
> > > > Before we get further into the implementation, I'd like to discuss
> > > > whether the current behavior is the expected behavior, as this is
> > > > the key motivation for this feature.
> > > >
> > > > I think the DLQ's current behavior is the expected behavior because
> > > > the DLQ is only a topic and topics lose messages unless they have a
> > > > subscription or a retention policy.
> > > >
> > > > I admit that it is not necessarily a nice default behavior to
> > > > potentially lose messages, but this is the design for all topics.
> > > > Based on the current design, an admin can create a retention policy
> > > > for the topic or namespace. Then, consumers of the
> > > > topic have the duration of the retention policy to discover the topic
> > > > and create a subscription before messages are lost. Is there a reason
> > > > this solution doesn't work for the DLQ topic?
> > > >
> > > > Perhaps the disconnect here is that users of the DLQ feature do not
> > > > view the DLQ as only a Pulsar topic. I look forward to your thoughts.
> > > >
> > > > As an aside, I wonder if topic discoverability is part of the problem
> > > > here. It would be extremely valuable to get notifications any
> > > > time a topic is created. That would allow users to move away from
> > > > polling for current topic names towards a more reactive design.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > >
> > > > On Tue, Dec 28, 2021 at 7:59 PM Zike Yang
> > > > <zky...@streamnative.io.invalid> wrote:
> > > > >
> > > > > > Oh, that's a very interesting point. I think it'd be easy to add
> > > > > > that as "internal" feature, though I'm a bit puzzled on how to add
> > > > > > that to the producer API
> > > > >
> > > > > I think we can add a field `String initialSubscriptionName` to the
> > > > > Producer Configuration. And add a new field `optional string
> > > > > initial_subscription_name` to the `CommandProducer`.
> > > > > When the Broker handles the CommandProducer, if it checks that the
> > > > > initialSubscriptionName is not empty or null, it will use
> > > > > initialSubscriptionName to create a subscription on that topic. When
> > > > > creating the deadLetterProducer or retryLetterProducer, we can
> > > > > specify and create the initial subscription directly through the
> > > > > Producer. What do you think?
> > > > >
> > > > > On Thu, Dec 23, 2021 at 7:42 AM Matteo Merli <matteo.me...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > What if we extended the `CommandProducer` command to add a
> > > > > > > `create_subscription` field? Then, any time a topic is auto
> > > > > > > created and this field is true, the broker would auto create a
> > > > > > > subscription. There are some details to work out, but I think
> > > > > > > this feature would fulfill the needs of this PIP and would also
> > > > > > > be broadly useful for many client applications that dynamically
> > > > > > > create topics.
> > > > > >
> > > > > > Oh, that's a very interesting point. I think it'd be easy to add
> > > > > > that as "internal" feature, though I'm a bit puzzled on how to add
> > > > > > that to the producer API
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Zike Yang
> > > >
> >
