GitHub user harissecic added a comment to the discussion: Support for long running message consumer
> There is still no good solution for retrying long running jobs.

I think there are plenty of _good enough_ workarounds, but I agree there should be one optimal solution for long-running consumers out of the box. Just to list a few:

1. Using message properties on consumers with `reconsumeLater`. I'm not sure which version started to support this feature, but adding properties to the message such as `processing=true` and later `isDone=true` would require just a little extra code to check these properties before even trying to consume the message. If `isDone` is set to true, simply ack the message and move on to the next.
2. Using readers with a similar approach, where message metadata/properties are read. In some cases consumers are not needed and using a reader is a bit simpler, but in others we really do want the consumer, so this is not really a workaround in the context of this case.
3. Combining the DLQ with `negAck` and later processing the DLQ with extra custom code to check whether something was already done. Setting max redelivery to 1 would make the message go directly to the DLQ on the next retry after the timeout. This would of course require a local concurrent cache where you keep the IDs of in-progress messages in runtime memory and check them on message arrival, so you can simply negAck a message if it's still processing. This way, after processing the actual message, the consumer can trigger "removing" the message from the DLQ. This would support both ackTimeout and manually handling timeouts.
4. Caching everything in a DB or similar and looking up message IDs, processing start times, allowed timeouts, ... Upon receiving a message, check this list and determine whether the message is still being processed or whether it failed and this was a consumer restart.

I assume some kind of 3 would be good to have out of the box.
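
To make the local concurrent cache from workaround 3 more concrete, here is a minimal sketch of just the in-memory bookkeeping. Everything in it is an assumption for illustration: `InFlightTracker`, `Action`, `State` and the method names are hypothetical and not part of the Pulsar client API, the message key is assumed to be a string derived from the message ID, and the "ack if already done" branch is borrowed from workaround 1. The DLQ side is not modeled at all.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not part of Pulsar: tracks which messages are in progress. */
final class InFlightTracker {
    /** What the consumer should do with a (re)delivered message. */
    enum Action { PROCESS, NEGATIVE_ACK, ACK }

    private enum State { PROCESSING, DONE }

    // One atomic map so the check and the "claim" of a message cannot race.
    private final ConcurrentHashMap<String, State> states = new ConcurrentHashMap<>();

    /** Called on every delivery; messageKey is assumed to be derived from the message ID. */
    Action onDelivery(String messageKey) {
        State previous = states.putIfAbsent(messageKey, State.PROCESSING);
        if (previous == null) {
            return Action.PROCESS;       // first time we see it: start working
        }
        return previous == State.DONE
                ? Action.ACK             // already finished: just acknowledge
                : Action.NEGATIVE_ACK;   // still running: ask for redelivery later
    }

    /** Called when the long-running job finished successfully. */
    void markDone(String messageKey) {
        states.put(messageKey, State.DONE);
    }

    /** Called when the job failed, so that a later redelivery is processed again. */
    void markFailed(String messageKey) {
        // Only clears an in-progress entry; a DONE entry is left untouched.
        states.remove(messageKey, State.PROCESSING);
    }
}
```

In a real consumer loop the returned action would presumably be mapped to `consumer.acknowledge(msg)` or `consumer.negativeAcknowledge(msg)`. Note that this sketch never evicts `DONE` entries and loses all state on restart, which is exactly the gap workaround 4 tries to close with external storage.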
Best of all, of course, would be something like an LRQ (long running queue, for lack of creativity on my side): when an ackTimeout triggers a retry, the consumer has the option to reply to the broker with something like 'still processing', and the broker moves the message to this queue. Pulsar would then track the connection: if the TCP connection dies, it pushes the messages back to the normal queue for retry; if TCP is alive, the consumer tells it when the message should be removed. Using the DLQ for this is also possible, but it mixes up messages that were retried too many times with the ones the consumer knows are taking too long.

GitHub link: https://github.com/apache/pulsar/discussions/18456#discussioncomment-4133185

----
This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org