[GitHub] [pulsar] harissecic added a comment to the discussion: Support for long running message consumer

GitBox Mon, 14 Nov 2022 00:35:28 -0800


GitHub user harissecic added a comment to the discussion: Support for long 
running message consumer


> There is still no good solution for retrying long running jobs.

I think there are plenty of _good enough_ workarounds, but I agree there should 
be one optimal solution for long running consumers out-of-the-box. Just to list 
a few:
1. Using message properties on consumers with `reconsumeLater` - not sure which 
version started to support this feature, but adding properties to the message 
like `processing=true` and later `isDone=true` would require just a little 
extra code to check these properties before even trying to consume the message. 
If `isDone` is set to true, simply ack the message and move on to the next.
2. Using readers with a similar approach where message metadata/properties are 
read. In some cases consumers are not needed and using a reader is a bit 
simpler, but in others we really do want the consumer - so not really a 
workaround in the context of this case.
3. Combining DLQ with `negAck` and later processing the DLQ with extra custom 
code to check if something was already done. Setting max redelivery to 1 would 
send the message directly to the DLQ on the next retry after the timeout. 
This of course would require a local concurrent cache where you keep the IDs of 
messages being processed in runtime memory and check them on message arrival, 
so you can simply negAck a message if it's still processing. This way, after 
processing the actual message, the consumer can trigger "removing" the message 
from the DLQ. This would support both ackTimeout and manually handled timeouts.
4. Caching everything in a DB or similar and tracking messageIds, processing 
start times, allowed timeouts, ... Upon receiving a message, check this list 
and determine whether the message is still being processed or whether 
processing failed and this delivery follows a consumer restart.
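
The bookkeeping behind options 3 and 4 could look roughly like the sketch below. It is plain Java with no Pulsar dependency; `InFlightTracker`, `Decision` and the string message IDs are hypothetical names made up for illustration, and the allowed processing time is an assumed parameter. In a real consumer the three decisions would map to processing the message, calling `negativeAcknowledge`, or calling `acknowledge`.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers which message IDs are in progress or already done. */
class InFlightTracker {
    /** What the consumer should do with a delivery. */
    enum Decision { PROCESS, NACK_STILL_PROCESSING, ACK_ALREADY_DONE }

    // messageId -> time (ms) at which the current processing attempt started
    private final Map<String, Long> startedAtMillis = new ConcurrentHashMap<>();
    // IDs whose processing finished; kept in memory only, so lost on restart
    private final Set<String> done = ConcurrentHashMap.newKeySet();
    private final long allowedMillis;

    InFlightTracker(long allowedMillis) {
        this.allowedMillis = allowedMillis;
    }

    /** Decide how to handle a (re)delivery of messageId arriving at nowMillis. */
    Decision onDelivery(String messageId, long nowMillis) {
        if (done.contains(messageId)) {
            return Decision.ACK_ALREADY_DONE;
        }
        Long previous = startedAtMillis.putIfAbsent(messageId, nowMillis);
        if (previous == null) {
            return Decision.PROCESS; // first time this ID is seen
        }
        if (nowMillis - previous <= allowedMillis) {
            return Decision.NACK_STILL_PROCESSING; // earlier attempt still within its budget
        }
        // Earlier attempt exceeded its budget: treat it as failed and restart,
        // unless another thread already restarted it.
        return startedAtMillis.replace(messageId, previous, nowMillis)
                ? Decision.PROCESS
                : Decision.NACK_STILL_PROCESSING;
    }

    /** Call once processing of messageId has completed successfully. */
    void markDone(String messageId) {
        done.add(messageId);
        startedAtMillis.remove(messageId);
    }
}
```

With an in-memory map this only covers option 3. For option 4 the same checks would run against a DB table so the state survives a consumer restart.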

I assume some variant of option 3 would be good to have out-of-the-box. Best of 
course would be to have something like an LRQ (long running queue, for lack of 
creativity on my side): when the ackTimeout triggers a retry, the consumer has 
the option to reply to the broker with something like 'still processing', and 
the broker moves the message to this queue. Pulsar would then track the 
connection: if TCP dies, push the messages back to the normal queue and retry; 
if TCP is alive, let the consumer tell when the message should be removed. 
Using the DLQ for this is also possible, but it mixes messages that were 
retried too many times with the ones the consumer knows simply take too long.
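
To make the LRQ idea above a bit more concrete, here is a small sketch of the transitions it implies. This is only a model of the proposal, not an existing Pulsar feature; `LongRunningQueueModel`, `Location` and `Event` are hypothetical names, and regular ack and redelivery handling is left out.

```java
/** Hypothetical model of the proposed LRQ semantics; not an existing Pulsar API. */
class LongRunningQueueModel {
    /** Where the broker currently holds the message. */
    enum Location { NORMAL_QUEUE, LONG_RUNNING_QUEUE, REMOVED }

    /** Signals the broker would react to under the proposal. */
    enum Event { STILL_PROCESSING, CONNECTION_LOST, CONSUMER_DONE }

    /** Returns the next location; events the proposal does not cover leave it unchanged. */
    static Location next(Location current, Event event) {
        switch (current) {
            case NORMAL_QUEUE:
                // Consumer answers the ackTimeout retry with 'still processing'.
                return event == Event.STILL_PROCESSING ? Location.LONG_RUNNING_QUEUE : current;
            case LONG_RUNNING_QUEUE:
                if (event == Event.CONNECTION_LOST) {
                    return Location.NORMAL_QUEUE; // TCP died: push back and retry
                }
                if (event == Event.CONSUMER_DONE) {
                    return Location.REMOVED; // consumer says the message can go
                }
                return current;
            default:
                return current; // REMOVED is terminal
        }
    }
}
```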

GitHub link: 
https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133185

----
This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org
