Re: Not abortable slow consumers / stopped processing of messages in a queue

Marek Dominiak Fri, 31 Oct 2014 04:27:13 -0700

tbain98 wrote
> I'm not clear on what behavior you're seeing, because the descriptions you
> give (as I understand them) seem contradictory.  You say that the consumer
> won't abort, but that you've got a 30-minute client-side abort timeout.
> You say that after the intended abort, you know it didn't work because the
> consumer didn't resume processing messages, but then you say that there
> weren't any messages to process.  Maybe you're describing multiple
> independent scenarios with different behavior and I'm just not catching
> the
> difference between them, but I'm not at all clear on what you're seeing.
> Can you give us a from-the-top summary?  No need to give the overview or
> any config files or log files, just tell us at each step what you expect
> to
> happen and what's actually happening (and how you know).
> 
> Also, your first message was all about aborting slow consumers, while your
> reply sounds like it's concerned entirely with aborting idle consumers.
> Which one's the problem here?  Also, how do you know that a particular
> idle
> consumer isn't being aborted?  The logs tell you the abort is happening;
> what's telling you it's not?


Hi Tim, 

Sorry for the unclear description. I mixed two (or more) issues in one
post.

*First issue:* seen with this configuration: a 30-minute timeout on the
MessageListener (caused by the tx timeout), the default prefetch size of
1000, no “abortSlow*ConsumerStrategy” defined, and a redelivery policy with
one redelivery defined on the JMS connection (not on the JMS factory). JMS
processing appeared to stop at some point: a consumer received a HEAVY
message and failed to consume it at least once. The message was still
visible via the ActiveMQ Web App (or JMX), and JMX showed that the consumer
had 1 message to be acked. A day later this same consumer still had not done
anything at all: it was idle and no more messages were dispatched to it,
while the other consumer received a lot of them. I was forced to move the
message to the DLQ and restart the application node.

I got similar behaviour with prefetch.size set to 0. That is when I started
looking at abort strategies, and AbortSlowAckConsumerStrategy looked like
the one to use. After I configured it, I ran into the other issue with the
new configuration.
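
For reference, the client-side settings mentioned above (prefetch size and
a single redelivery) can be made concrete with ActiveMQ connection URI
options. This is only a sketch: the host name is hypothetical, the 100-second
delay is assumed to be the initial redelivery delay, and URI options apply to
every connection created from the URI, whereas the setup described here sets
the redelivery policy on the connection itself.

```java
final class BrokerUriSketch {

    /**
     * Builds an ActiveMQ connection URI carrying prefetch and redelivery
     * options. Only the three option names are ActiveMQ's; the method and
     * its parameters are illustrative.
     */
    static String brokerUri(String hostAndPort, int queuePrefetch,
                            int maximumRedeliveries, long initialRedeliveryDelayMillis) {
        return "tcp://" + hostAndPort
                + "?jms.prefetchPolicy.queuePrefetch=" + queuePrefetch
                + "&jms.redeliveryPolicy.maximumRedeliveries=" + maximumRedeliveries
                + "&jms.redeliveryPolicy.initialRedeliveryDelay=" + initialRedeliveryDelayMillis;
    }

    public static void main(String[] args) {
        // Default queue prefetch (1000), one redelivery, assumed 100 s delay.
        System.out.println(brokerUri("broker-host:61616", 1000, 1, 100_000L));
        // Same policy with prefetch disabled (0).
        System.out.println(brokerUri("broker-host:61616", 0, 1, 100_000L));
    }
}
```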

*Second issue:* configuration: a 30-minute timeout on the MessageListener
(caused by the tx timeout), and AbortSlowAckConsumerStrategy set to abort
consumers after 6 minutes with abortConnection=false. I had 1 HEAVY message
in the queue, and it failed due to the tx timeout. As I understand it,
AbortSlowAckConsumerStrategy should now abort the consumer and create a new
one, which should try to consume the message after some time (100 seconds,
due to the redelivery policy). What happened that day instead was that the
consumer stayed alive with 1 message to be acked back, and it was kept alive
indefinitely (the same consumer id, see the logs I have posted). I was
forced to move this message to the DLQ, but the consumer was still slow and
idle, with no. of messages to be acked = 1. After some time I was forced to
restart the application node to get a new consumer in place.
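
The check that keeps flagging this consumer can be pictured with a simplified
model. This is not the broker's code: the class, field and parameter names
are hypothetical, and it only mirrors the idea that a consumer holding
unacked messages is treated as slow once the time since its last ack exceeds
the configured limit (6 minutes here).

```java
final class SlowAckCheck {

    /** Hypothetical snapshot of one consumer as seen by the broker. */
    static final class ConsumerState {
        final long lastAckMillis; // time of the last ack received from the consumer
        final int awaitingAck;    // messages dispatched but not yet acked

        ConsumerState(long lastAckMillis, int awaitingAck) {
            this.lastAckMillis = lastAckMillis;
            this.awaitingAck = awaitingAck;
        }
    }

    /**
     * Returns true when the consumer would be flagged as slow in this model:
     * too long since its last ack, unless it is idle and idle consumers are ignored.
     */
    static boolean isSlow(ConsumerState c, long nowMillis,
                          long maxTimeSinceLastAckMillis, boolean ignoreIdleConsumers) {
        if (ignoreIdleConsumers && c.awaitingAck == 0) {
            return false; // nothing dispatched, so nothing to ack
        }
        return nowMillis - c.lastAckMillis > maxTimeSinceLastAckMillis;
    }

    public static void main(String[] args) {
        long sixMinutes = 6 * 60 * 1000L;
        long now = 10 * sixMinutes;
        // One message awaiting ack, last ack seven minutes ago.
        ConsumerState stuck = new ConsumerState(now - 7 * 60 * 1000L, 1);
        System.out.println(isSlow(stuck, now, sixMinutes, true));
    }
}
```

In this model the consumer from the logs (1 message to be acked, no ack for
hours) is flagged on every check, which matches the repeated "aborting slow
consumer" lines for the same consumer id.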

As I understand aborting, once a consumer has been marked as slow and has
finished its job (even if unsuccessfully, i.e. with a rollback), it should
be removed and replaced by a different one. *(This behavior is observable
for “smaller” messages.)*

So in the logs I should see a different consumer id after some time, but in
this case that didn’t happen (not without restarting the server).

2014-10-25 00:00:11,455 [host] Scheduler] INFO  AbortSlowConsumerStrategy     
- aborting slow consumer:
ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for
destination:queue://generateReportQueue
… more logs every 6 minutes here
2014-10-25 01:12:11,641 [host] Scheduler] INFO  AbortSlowConsumerStrategy     
- aborting slow consumer:
ID:min-p-app02.osl.basefarm.net-36433-1414153506788-1:1:17:7 for
destination:queue://generateReportQueue 

*Both issues now look much the same to me: the consumer did not ack back to
the server after it failed to consume this HEAVY message (one or more
times), and it stopped responding (no ack sent back, the abort strategy
couldn’t force it to stop, no new messages dispatched to the consumer).*



tbain98 wrote
> 1.  If I've understood correctly, you say your business logic will abort
> after 30 minutes, independently of any ActiveMQ-initiated abort request.
> Is that actually happening?  The logs you've posted don't give any
> indication either way (and you say "the same idle consumer can’t be
> aborted
> in a span of 18 hours"), and the behavior you're describing would be more
> consistent with your clients not aborting than with them aborting but not
> pulling the next message, though of course both are possible.  So make
> sure
> your client's really doing what you think it is.

Yes, that is really happening. In the application logs I can see that the
transaction is stopped, and then the code that listens for exceptions in all
MessageListeners (a custom error handler in DefaultMessageListenerContainer)
sends information about the exception to the devs. Usually, right after this
exception the consumer starts consuming the next message, or waits until the
redelivery policy kicks in to try the message again.

*Since you mention it, I will try to verify what is actually happening under
the hood in the DMLC (we have made some small extensions to it in order to
define the redelivery policy per connection, and for custom error handling).
Maybe the rollback isn’t called when processing is too heavy?*


tbain98 wrote
> 3.  Can you confirm (via JConsole in the MBeans tab or some other JMX
> viewer) that your consumer is still connected to the broker after the
> abort?  Also, when your client aborts, how is ActiveMQ being told about
> the
> failure?  (And what ack mode are you using?)

Via JMX I saw that the consumer was still connected to the server.
We use transacted sessions (without a transaction manager, but with some
Spring JMS magic to handle commit/rollback correctly).
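
The JMX check described above can also be scripted rather than done by hand
in JConsole. The sketch below assumes the MBean naming layout used by
ActiveMQ 5.8 and later, a broker named "localhost" (hypothetical), and the
subscription attribute names MessageCountAwaitingAcknowledge and
SlowConsumer; all of these should be confirmed against the MBeans tab of the
actual broker.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class ConsumerJmxCheck {

    /** Builds a pattern matching every consumer MBean of one queue (assumed 5.8+ layout). */
    static ObjectName consumerPattern(String brokerName, String queueName) {
        try {
            return new ObjectName("org.apache.activemq:type=Broker,brokerName=" + brokerName
                    + ",destinationType=Queue,destinationName=" + queueName
                    + ",endpoint=Consumer,*");
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Prints consumer id, unacked count and slow flag for each matching consumer. */
    static void printConsumers(MBeanServerConnection server, ObjectName pattern) throws Exception {
        for (ObjectName name : server.queryNames(pattern, null)) {
            // Attribute names are assumptions; check them in a JMX viewer first.
            Object awaitingAck = server.getAttribute(name, "MessageCountAwaitingAcknowledge");
            Object slow = server.getAttribute(name, "SlowConsumer");
            System.out.println(name.getKeyProperty("consumerId")
                    + " awaitingAck=" + awaitingAck + " slow=" + slow);
        }
    }

    public static void main(String[] args) throws Exception {
        // The platform MBean server only holds broker MBeans when the broker
        // is embedded in this JVM; a remote broker needs a JMX connector instead.
        printConsumers(ManagementFactory.getPlatformMBeanServer(),
                consumerPattern("localhost", "generateReportQueue"));
    }
}
```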

“Also, when your client aborts, how is ActiveMQ being told about the
failure?”
I am not sure if I understood your question.


tbain98 wrote
> 5 & 6.  For you to use the approach I suggested, you'd either have to be
> OK
> losing messages when failures occur or you'd have to persist the message
> to
> a datastore to retry in the case of a failure.  It sounds like neither of
> those is appealing, so this may not be an option.

We use the persistent store in ActiveMQ, and I hoped that would be enough. I
am trying to track down the bug in the configuration, to avoid unnecessary,
overcomplicated store code/configuration.

Thanks again for your help, Tim.


I will look at what happens under the hood in the Spring DMLC when a heavy
message fails due to the transaction timeout, and check whether upgrading to
5.10.0 solves our issues. I will write back when I am done with these two.


Regards
Marek Dominiak



--
View this message in context: 
http://activemq.2283324.n4.nabble.com/Not-abortable-slow-consumers-stopped-processing-of-messages-in-a-queue-tp4686721p4686838.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.
