Hello Team,

Thanks for pointing these things out.

First, note that this add-on feature concerns messages that go to the DLQ once `reconsumeLater` retries reach the consumer-level maximum. It allows an additional maximum limit if the message carries one from the time it was produced.

The current architecture gives consumers two ways to retry:

1. Process later with a NACK backoff policy: If someone wants to retry and enforce a maximum retry count at the message-processor level, they can get the number of times the message has been retried so far by calling the `reconsumeTimes` function and compare it with their custom limit. Then one of two things happens: either they ACK the message, or they call the `reconsumeLater` function, depending on the system's requirements. If they ACK, the message is marked as deleted in the ledger. The other case is covered in point 2.

2. Calling `reconsumeLater`: In some real-world cases a consumer calls `reconsumeLater` with a delay because it wants to process the message again, but only after a certain time. A new message is created in the consumer's RETRY topic and is processed by the consumer again (which may call `reconsumeLater` again if the use case requires it). When the message reaches the retry limit configured at the consumer level, it is sent to the DLQ.

My proposal makes no change to the first scenario, which uses NACK backoff to retry and process the message. Consumers can keep using that mechanism as they do now. The proposal adds a feature to Pulsar for the case where a producer knows the maximum number of times an event should be processed, separately from the consumer's max retry. When `reconsumeLater` is called, the change checks whether that per-message limit has been reached and, if so, sends the message to the DLQ even if the consumer has a higher `reconsumeLater` retry limit.
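
As a rough illustration of the check described above, here is a minimal Python sketch. It is not the Pulsar client code: the property key `MAX_RECONSUME_TIMES`, both function names, the handling of malformed values, and the `>=` comparison are assumptions made for illustration only.

```python
from typing import Mapping

# Hypothetical property key; the thread does not name the actual key.
MAX_RECONSUME_TIMES_KEY = "MAX_RECONSUME_TIMES"


def effective_max_retries(properties: Mapping[str, str], consumer_max: int) -> int:
    """Return the per-message limit if the message carries a valid one,
    otherwise fall back to the consumer-level limit."""
    raw = properties.get(MAX_RECONSUME_TIMES_KEY)
    if raw is None:
        return consumer_max
    try:
        limit = int(raw)
    except ValueError:
        # Assumption: a malformed value is ignored rather than treated as an error.
        return consumer_max
    # Assumption: negative limits are also ignored.
    return limit if limit >= 0 else consumer_max


def route_on_reconsume_later(
    properties: Mapping[str, str], reconsume_times: int, consumer_max: int
) -> str:
    """Decide where a message goes when reconsumeLater is called.

    reconsume_times is the number of times the message has been retried so far.
    Returns "DLQ" once the effective limit has been reached, else "RETRY".
    """
    limit = effective_max_retries(properties, consumer_max)
    return "DLQ" if reconsume_times >= limit else "RETRY"
```

With a consumer limit of 5, a message carrying `MAX_RECONSUME_TIMES=2` would be routed to the DLQ after its second retry, while a message without the property would still follow the consumer limit.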

If the producer does not pass a max retry limit, the consumer retry limit applies. This PIP also does not remove the consumer's own retry limit; it stays, but is overridden by the message limit if the producer passes one. If a consumer wants to override that limit in turn (for example, to change the retry limit temporarily), it can do so by sending a new max retry as a custom property to the `reconsumeLater` function.

So let me rephrase the subject of this PIP as `*MAX RECONSUME TIMES per Message*`.

On Michael's point about adding a relationship between producer and consumer: yes, it adds a relationship between them, but it is loose coupling, because the max retry limit that a producer sends in a message can be overridden by the consumer if needed. Consumers therefore do not have to obey the limit.

I prefer message properties over `MessageMetadata` because, if the limit were stored in `MessageMetadata`, it would be difficult for a consumer to update it when it wants to override the max limit (temporarily or permanently).

I hope this answers most of the doubts about this PIP.

On Tue, Jan 10, 2023 at 5:59 PM r...@apache.org <ranxiaolong...@gmail.com> wrote:

> Thanks for submitting this PIP, Nitin Goyal.
>
> Seeing that you have submitted a related pull request in the Go SDK
> community, I am sorry that I made the request changes first.
>
> For the retry processing of a single message, the methods currently
> provided are:
>
> -ReconsumeLater
> -Nack
>
> In actual usage scenarios, it may be a more elegant way to retry if we
> verify Nack multiple times. The current implementation of ReconsumeLater
> relies on delayed messages. This is not an elegant way, and there is no way
> to record the number of retries.
>
> In order to support backoff retries, we introduced the NackBackoff
> strategy, which itself is an interface and exposed to users. Based on this
> interface, we can do more customized things.
> If the NackBackoff interface can't meet our current needs, I prefer that we
> implement more parameters in the NackBackoff strategy instead of continuing
> to add new functional logic in ReconsumeLater.
>
> --
> Thanks
> Xiaolong Ran
>
> On Tue, Jan 10, 2023 at 16:23, Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > I think that Michael's point is very important.
> >
> > Producers and Consumers are decoupled and this PIP would introduce
> > a new concept.
> > Also the same Message can be consumed using multiple subscriptions
> > (typically Applications)
> > and all the applications will process the message in a different way
> > (by definition).
> >
> > Isn't it possible to implement what you want with an Interceptor or
> > with some custom handling for the DLQ ?
> > The DLQ (currently) in Pulsar is mostly a client-side feature.
> > The producer can tag the messages with a message property and the
> > client can decide what to do
> > and how many times to retry the message or give up.
> >
> > Enrico
> >
> > On Tue, Jan 10, 2023 at 01:00, Michael Marshall
> > <mmarsh...@apache.org> wrote:
> > >
> > > Thanks for submitting this PIP, Nitin Goyal.
> > >
> > > At the heart of this PIP is an assumption about the relationship
> > > between producers and consumers. The PIP assumes a producer knows how
> > > many times a consumer should attempt to consume a message before
> > > giving up and sending it to the DLQ. Does anyone have strong opinions
> > > on the boundaries between producers and consumers in relation to this
> > > PIP?
> > >
> > > This PIP expands the relationship between producer and consumer by
> > > letting the producer tell the consumer's pulsar client when to send a
> > > message to the DLQ, and as such, we should be very intentional about
> > > accepting this PIP.
> > >
> > > Because a user can easily implement this feature on their own and
> > > because it tightly couples producers and consumers, I think we should
> > > not move forward with this PIP. I am open to discussion, though.
> > >
> > > > for instance in case of major system failure you don't want to lose
> > > > all your messages.
> > >
> > > For what it's worth, the retry letter topic feature, which this PIP
> > > relies on, sends all messages to the DLQ, so this feature does not
> > > introduce conditions for message loss.
> > >
> > > As an aside, if we move forward with this feature, we need to make
> > > sure that the protocol documentation is updated and we should consider
> > > putting this field in the `MessageMetadata` protobuf object, not in a
> > > properties map.
> > >
> > > Thanks,
> > > Michael
> > >
> > >
> > >
> > > On Mon, Jan 9, 2023 at 9:47 AM Nitin Goyal <nitin.goyal....@gmail.com> wrote:
> > > >
> > > > Hello Enrico,
> > > >
> > > > For your concern about temporarily increasing of retry. It can be achieved
> > > > using overriding msg property while calling reconsuming later with custom
> > > > properties..
> > > >
> > > > About msg immutable as per current design if consumer call reconsumeLater
> > > > function it creates a new msg in the system adding few properties to it
> > > > like how many times it get consumed. That also allow other custom
> > > > properties to be added in newly generated msg.. so if msg needs temporary
> > > > high retry or change in retry count on msg they can override it using
> > > > custom properties…
> > > >
> > > > Thanks
> > > > Nitin Goyal
> > > >
> > > >
> > > > On Mon, 9 Jan 2023 at 1:11 PM, Enrico Olivelli <eolive...@gmail.com> wrote:
> > > >
> > > > > I don't think that this is a good idea.
> > > > >
> > > > > Because IIUC we want to add a property per message that sets the
> > > > > maximum time of retries.
> > > > >
> > > > > Unfortunately in a real system sometimes you have to change the number
> > > > > of retries temporarily,
> > > > > for instance in case of major system failure you don't want to lose
> > > > > all your messages.
> > > > >
> > > > > If we allow you to set a property on a message then you won't be able
> > > > > to change it because the message is immutable.
> > > > >
> > > > > TTL (time to live) is a similar concept but it is related to the
> > > > > concept of "physical time", if you have message that represents
> > > > > a task to be executed within a given deadline then it makes sense to
> > > > > state it in the message metadata.
> > > > >
> > > > > But the "number of retries" depends on how you deal with the retries:
> > > > > - how much time do you wait ?
> > > > > - how often do you have a temporary failure to retry ?
> > > > >
> > > > > It would make more sense to have a QOS (quality of service) attribute
> > > > > on the message, like "important"/"non-important"/"foo"/"bar"
> > > > > and have a way for the brokers and the clients to handle that. I am
> > > > > pretty sure that with interceptors you can already do something.
> > > > >
> > > > >
> > > > > I am against hard coding the behaviour described in the PIP (and I
> > > > > voted -1 in the VOTE thread)
> > > > >
> > > > > Enrico
> > > > >
> > > > > On Fri, Jan 6, 2023 at 09:11, Zike Yang <z...@apache.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > This looks good to me.
> > > > > > +1
> > > > > >
> > > > > > I was thinking if we could add a new API for `reconsumeLater` to let
> > > > > > users set the max retry times easily. But I saw that there are too
> > > > > > many reconsumeLater API and this will make the consumer more complex.
> > > > > >
> > > > > > Thanks,
> > > > > > Zike Yang
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 5, 2023 at 3:58 PM Zixuan Liu <node...@gmail.com> wrote:
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Zixuan
> > > > > > >
> > > > > > > On Thu, Jan 5, 2023 at 13:50, Nitin Goyal <nitin.goyal....@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I made a PIP to discuss:
> > > > > > > > https://github.com/apache/pulsar/issues/19136
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Nitin Goyal
> > > > > > > >
> > > > >
> > > > --
> > > > Regards
> > > > Nitin Goyal
> > >

--
Regards
Nitin Goyal