tbain98 wrote
> I'm not clear on what behavior you're seeing, because the descriptions you give (as I understand them) seem contradictory. You say that the consumer won't abort, but that you've got a 30-minute client-side abort timeout. You say that after the intended abort, you know it didn't work because the consumer didn't resume processing messages, but then you say that there weren't any messages to process. Maybe you're describing multiple independent scenarios with different behavior and I'm just not catching the difference between them, but I'm not at all clear on what you're seeing. Can you give us a from-the-top summary? No need to give the overview or any config files or log files, just tell us at each step what you expect to happen and what's actually happening (and how you know).
>
> Also, your first message was all about aborting slow consumers, while your reply sounds like it's concerned entirely with aborting idle consumers. Which one's the problem here? Also, how do you know that a particular idle consumer isn't being aborted? The logs tell you the abort is happening; what's telling you it's not?
Hi Tim,

I am sorry for my unclear description. I mixed two (or more) issues in one post.

*First issue* was visible with this config: a 30-minute timeout on the MessageListener (coming from the transaction timeout), the default prefetch size of 1000, no “abortSlow*ConsumerStrategy” defined, and a redelivery policy with one redelivery, defined on the JMS connection (not on the JMS connection factory). JMS processing seemed to stop at one point: the consumer got a HEAVY message and failed at least once to consume it. The message was still visible via the ActiveMQ Web App (or JMX), and the consumer had 1 message waiting to be acked (I saw this via JMX). A day later this consumer still hadn't done anything at all: it was idle, and no more messages were dispatched to it, while the other consumer got a lot of them. I had to move the message to the DLQ and restart the application node. I saw similar behaviour with the prefetch size set to 0. That is when I started to look at abort strategies, and AbortSlowAckConsumerStrategy looked like the one to use. After I configured it, I ran into the other issue with the new configuration.

*Second issue:* config: a 30-minute timeout on the MessageListener (coming from the transaction timeout), and AbortSlowAckConsumerStrategy configured to abort slow consumers after 6 minutes, with abortConnection=false. I had 1 HEAVY message in the queue, and it failed due to the transaction timeout. As I understand it, AbortSlowAckConsumerStrategy should abort the consumer, and a new consumer should then be created and try to consume the message again after some time (100 seconds, due to the redelivery policy). What actually happened that day was that the consumer stayed alive with 1 message waiting to be acked, and it was kept alive indefinitely (the same consumer ID, as in the logs I posted). I had to move this message to the DLQ, but the consumer was still slow and idle, with 1 message still waiting to be acked. Eventually I had to restart the application node to get a new consumer in place.
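The expectation in the second configuration can be sketched as a small pure-JDK model. This is only our simplified reading of the rule AbortSlowAckConsumerStrategy applies (flag a consumer that has messages delivered but not acked for longer than `maxTimeSinceLastAck`, and, when `ignoreIdleConsumers` is on, skip consumers with nothing outstanding); it is not ActiveMQ code, and `ConsumerState` and `SlowAckCheck` are hypothetical names. It illustrates why a consumer stuck with one un-acked HEAVY message matches the rule on every check: in this model the rule only selects the consumer, and the same ID keeps being selected until something actually closes it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of one consumer as the broker might see it. */
final class ConsumerState {
    final String id;
    final int dispatchedNotAcked; // delivered to the client but not yet acked
    final Instant lastAck;        // time of this consumer's most recent ack

    ConsumerState(String id, int dispatchedNotAcked, Instant lastAck) {
        this.id = id;
        this.dispatchedNotAcked = dispatchedNotAcked;
        this.lastAck = lastAck;
    }
}

/** Simplified model of a "slow to ack" check; NOT the ActiveMQ implementation. */
final class SlowAckCheck {
    private final Duration maxTimeSinceLastAck;
    private final boolean ignoreIdleConsumers;

    SlowAckCheck(Duration maxTimeSinceLastAck, boolean ignoreIdleConsumers) {
        this.maxTimeSinceLastAck = maxTimeSinceLastAck;
        this.ignoreIdleConsumers = ignoreIdleConsumers;
    }

    /** IDs this check would flag for abort at {@code now}, in input order. */
    List<String> slowConsumers(List<ConsumerState> consumers, Instant now) {
        List<String> slow = new ArrayList<>();
        for (ConsumerState c : consumers) {
            // Nothing outstanding: the consumer is idle, not slow to ack.
            if (ignoreIdleConsumers && c.dispatchedNotAcked == 0) {
                continue;
            }
            if (Duration.between(c.lastAck, now).compareTo(maxTimeSinceLastAck) > 0) {
                slow.add(c.id);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2014-10-25T00:00:00Z");
        // A consumer stuck on one HEAVY message never acks, so it is
        // flagged on every check; selecting it does not by itself close it.
        List<ConsumerState> consumers = List.of(new ConsumerState("stuck", 1, t0));
        SlowAckCheck check = new SlowAckCheck(Duration.ofMinutes(6), true);
        for (int minute = 6; minute <= 30; minute += 6) {
            Instant now = t0.plus(Duration.ofMinutes(minute)).plusSeconds(1);
            System.out.println(minute + " min: " + check.slowConsumers(consumers, now));
        }
    }
}
```

Note that in this model a consumer holding even one un-acked message is never "idle", which is why the stuck consumer would keep being reported rather than skipped.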
If I understand aborting correctly, once a consumer has been marked as slow and has finished its job (even unsuccessfully, with a rollback), it should be removed and replaced by a different one. *(This behaviour is observable for “smaller” messages.)* So after some time I should see a different consumer ID in the logs, but in this case that didn't happen (without restarting the server):

2014-10-25 00:00:11,455 [host] Scheduler] INFO AbortSlowConsumerStrategy - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue
… more logs every 6 minutes here
2014-10-25 01:12:11,641 [host] Scheduler] INFO AbortSlowConsumerStrategy - aborting slow consumer: ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for destination:queue://generateReportQueue

*Both issues look much the same to me now: the consumer didn't ack back to the broker after it failed to consume this HEAVY message (one or more times), and it stopped being responsive (no ack sent back, the abort strategy couldn't force it to stop, and no new messages were dispatched to it).*

tbain98 wrote
> 1. If I've understood correctly, you say your business logic will abort after 30 minutes, independently of any ActiveMQ-initiated abort request. Is that actually happening? The logs you've posted don't give any indication either way (and you say "the same idle consumer can’t be aborted in a span of 18 hours"), and the behavior you're describing would be more consistent with your clients not aborting than with them aborting but not pulling the next message, though of course both are possible. So make sure your client's really doing what you think it is.

Yes, that is really happening. In the application logs I can see that the transaction is stopped, and then the code that listens for exceptions in all MessageListeners (a custom error handler in DefaultMessageListenerContainer) sends information about the exception to the devs.
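Putting numbers on the log excerpt quoted earlier in this message: assuming, as the elided lines suggest, one abort attempt every 6 minutes, the window from 00:00:11 to 01:12:11 contains 13 attempts against the same consumer ID. The arithmetic is small enough to sketch directly; `AbortLogArithmetic` and `checksBetween` are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Counts periodic checks between two log timestamps (both ends included). */
final class AbortLogArithmetic {
    static long checksBetween(LocalDateTime first, LocalDateTime last, Duration period) {
        long elapsedMillis = Duration.between(first, last).toMillis();
        // Whole periods elapsed, plus the first check itself.
        return elapsedMillis / period.toMillis() + 1;
    }

    public static void main(String[] args) {
        // Timestamps copied from the first and last quoted log lines.
        LocalDateTime first = LocalDateTime.parse("2014-10-25T00:00:11.455");
        LocalDateTime last = LocalDateTime.parse("2014-10-25T01:12:11.641");
        System.out.println(checksBetween(first, last, Duration.ofMinutes(6))); // 13
    }
}
```

Thirteen attempts on one ID over 72 minutes would be consistent with each abort request being issued by the broker's scheduler but never actually closing the consumer.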
Usually, just after this exception, the consumer starts to consume the next message, or waits until the redelivery policy kicks in and then tries to consume the message again. *As you mentioned, I will try to verify what is actually happening under the hood in the DMLC (we have made some small extensions to it in order to define a redelivery policy per connection and to add custom error handling). Maybe when processing is too heavy, the rollback isn't called?*

tbain98 wrote
> 3. Can you confirm (via JConsole in the MBeans tab or some other JMX viewer) that your consumer is still connected to the broker after the abort? Also, when your client aborts, how is ActiveMQ being told about the failure? (And what ack mode are you using?)

Via JMX I saw that the consumer was still connected to the broker. We use transacted sessions (without a transaction manager, but with some Spring JMS magic to handle commit/rollback correctly).

“Also, when your client aborts, how is ActiveMQ being told about the failure?” I am not sure I understood your question.

tbain98 wrote
> 5 & 6. For you to use the approach I suggested, you'd either have to be OK losing messages when failures occur or you'd have to persist the message to a datastore to retry in the case of a failure. It sounds like neither of those is appealing, so this may not be an option.

We use a persistent store in ActiveMQ, and I hoped this would be enough. I am trying to track down the bug in the configuration, to avoid unnecessarily complicated storage code/configuration.

Thanks again for your help, Tim. I will look at what happens under the hood in the Spring DMLC when a heavy message fails due to the transaction timeout, and check whether upgrading to 5.10.0 solves our issues. I will write back when I am done with both.
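On the question of how ActiveMQ is told about a failure: with a transacted session and no transaction manager, our understanding is that the DMLC commits the local JMS transaction when the listener returns normally and rolls it back when the listener throws, and that the ack travels with the commit. The sketch below is a pure-JDK model of that loop, combined with the redelivery policy described above (one redelivery, 100-second delay). `TxSession`, `Listener`, `TransactedDeliveryModel` and `deliver` are hypothetical names, not Spring or ActiveMQ APIs; in the real client the delay and the move to the DLQ are handled by ActiveMQ, not by application code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transacted JMS session; it only counts calls. */
final class TxSession {
    int commits;
    int rollbacks;

    void commit() { commits++; }

    void rollback() { rollbacks++; }
}

/** Hypothetical listener; throwing signals failure (e.g. a transaction timeout). */
interface Listener {
    void onMessage(String body) throws Exception;
}

/** Pure-JDK model of commit/rollback around onMessage plus a redelivery limit. */
final class TransactedDeliveryModel {
    static final int MAX_REDELIVERIES = 1;                            // "one redelivery"
    static final Duration REDELIVERY_DELAY = Duration.ofSeconds(100); // from the post

    /** Delivers one message until it commits or exhausts its redeliveries. */
    static List<String> deliver(String body, Listener listener, TxSession session) {
        List<String> log = new ArrayList<>();
        for (int redeliveries = 0; ; redeliveries++) {
            if (redeliveries > 0) {
                // Modeled only; the real delay is applied by the ActiveMQ client.
                log.add("wait " + REDELIVERY_DELAY.getSeconds() + "s");
            }
            try {
                listener.onMessage(body);
                session.commit(); // in a transacted session the ack goes out with the commit
                log.add("committed");
                return log;
            } catch (Exception e) {
                session.rollback(); // undoes the local transaction so the message can be redelivered
                log.add("rolled back: " + e.getMessage());
                if (redeliveries >= MAX_REDELIVERIES) {
                    log.add("moved to DLQ"); // modeled; ActiveMQ does this itself
                    return log;
                }
            }
        }
    }

    public static void main(String[] args) {
        TxSession session = new TxSession();
        List<String> log = deliver("HEAVY", body -> {
            throw new Exception("tx timeout");
        }, session);
        System.out.println(log);
    }
}
```

The case this model cannot show is the one that seems to matter here: if `onMessage` never returns (for example, the listener thread stays blocked after the transaction has timed out), neither `commit` nor `rollback` runs, so the message would stay delivered but un-acked. That would be consistent with the single "message to be acked" seen in JMX, so checking whether the DMLC thread ever leaves `onMessage` for the HEAVY message looks like a useful first step.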
Regards,
Marek Dominiak

--
View this message in context: http://activemq.2283324.n4.nabble.com/Not-abortable-slow-consumers-stopped-processing-of-messages-in-a-queue-tp4686721p4686838.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.