On Mon, Apr 27, 2015 at 7:30 AM, James Green <james.mk.gr...@gmail.com>
wrote:

> See in-line.
>
> On 26 April 2015 at 05:37, Tim Bain <tb...@alumni.duke.edu> wrote:
>
> > James,
> >
> > The prefetch buffer is a buffer of messages in the ActiveMQ code in the
> > client process that holds messages that have been dispatched from the
> > broker to the client but that haven't yet been handed over to the client.
> >
>
> To clarify the location of this pre-fetch buffer, is it inside the broker
> itself? Allocated to the client that is connected but not to individual
> consumers?


No, it's in the consumer process, so the messages are available to be
consumed from memory when needed without requiring a network read.  But a
copy of the message will exist in both the broker and the consumer (in case
the consumer crashes without acking the message, so the broker can
redeliver it), and once the consumer acks the message it will be removed
from the broker.


> > The purpose is to keep some number of messages in memory and available
> > for the consumer to handle, to ensure that the consumer never has to
> > wait for a message to be pulled from the broker.  This lets the
> > consumer consume as quickly as possible.  The broker will continue
> > dispatching messages to the client until the client has a full prefetch
> > buffer, after which point it will dispatch one message for every ack it
> > gets back from the consumer.
> >
>
> Where "client" is in fact inside the broker at the network edge listening
> to the socket where the client with consumers is connected.


I was a little sloppy with my word choice, and I think it's misled you.
As I used them in my response, the words "client" and "consumer" are
interchangeable, and mean "the process that is connecting to the broker for
the purpose of consuming messages".  More generally, the term "client"
means "producer or consumer", but since we're not discussing producers here
I used the two words interchangeably.  I think you understood me to be
drawing a distinction between the two, when that wasn't my intent; sorry
for the confusion that caused.  So to answer your question, the "client" is
not inside the broker; it's the process that connects to the broker.


> > When you have multiple concurrent consumers, the broker will round-robin
> > messages among consumers that don't have full prefetch buffers; it won't
> > batch up a full prefetch buffer's worth for Consumer 1 before sending any
> > messages to Consumers 2-4, which is what I think you were worried about.
> >
>
> I was more conceptually thinking of a pre-fetch buffer being inside the
> client that is connected to the broker - i.e. the messages have already
> gone across the network and are therefore instant delivery into consumers.
> But I think when you talk of "client" you actually mean a resource within
> the broker marshalling messages out to a session within which multiple
> consumers are present.


No, your first description is correct: the prefetch buffer is inside the
client process that connects to the broker, not a resource within the
broker.  Though, as I said above, the messages also remain in the broker
because they haven't been acked.


> > If there are a non-zero number of messages in the client's prefetch
> > buffer and receive() is called, the thread should simply go grab the
> > first message in the buffer and return it, so the timeout should not
> > elapse unless the broker's host is HEAVILY loaded or locking somehow
> > delays thread execution or you get unlucky enough to catch a full GC
> > right then.  If there are no messages in the buffer, the thread should
> > wait for the timeout period to see if a message shows up, and either
> > return that message when it does or return with no message after the
> > timeout elapses.  In both scenarios, I would expect that the consumer
> > would remain connected and so no redelivery would apply; receive()
> > should just be looking to see whether a message came across the
> > pre-existing connection, but it should not be making connections nor
> > disconnecting if the timeout interval elapses without a message.  (If
> > you're disconnecting after each message, that's an anti-pattern as I
> > understand it, and you should probably rethink your approach.)
> >
> >
>
> I think our machine itself gets busy. We know that connecting and sending
> to ActiveMQ via STOMP sometimes ends up with a socket wait over 10 seconds.
> Identifying why could be interesting.
>
>
> > All of that is to say, I don't think that the elapsing of a receive()
> > timeout without receiving a message should do anything that would cause
> > a message redelivery, so I wonder if that's a red herring and the
> > problem is actually something else.  Do you see any messages in your
> > client or broker logs indicating that the client disconnected and
> > reconnected, that the connection's inactivity monitor detected the
> > connection to be inactive, or that the consumer was aborted as a slow
> > consumer?
> >
>
> I have not received any indications that the client is reconnecting.
>
> We have updated the receive time-out from 1000ms to 10000ms retaining the
> default 6 delivery attempts and so far nothing new has appeared in the DLQ.


If your client isn't reconnecting, then messages shouldn't be getting
redelivered unless your application itself is failing to process them (and
you're not auto-acking them).  Have you confirmed that your application
logic is actually processing those messages successfully?  A processing
failure could easily explain the behavior you're seeing.
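
As a rough illustration of why a persistent processing failure ends in the DLQ (plain Java, names mine -- this is just the arithmetic of the policy, not ActiveMQ code): with the default maximumRedeliveries of 6, a message that fails every attempt is tried once plus six redeliveries, and then routed to the DLQ with the "Exceeded redelivery policy limit" cause you'd see in dlqFailureCause.

```java
import java.util.function.Predicate;

// Toy model of the redelivery loop; illustrative only.
class RedeliverySketch {
    // Tries the initial delivery plus maxRedeliveries redeliveries,
    // then gives up and routes the message to the DLQ.
    static String deliver(String msg, Predicate<String> process, int maxRedeliveries) {
        for (int attempt = 0; attempt <= maxRedeliveries; attempt++) {
            if (process.test(msg)) {
                return "consumed";
            }
        }
        return "DLQ"; // "Exceeded redelivery policy limit"
    }
}
```

Setting maximumRedeliveries to -1 (infinite), as Gary suggested further down, simply makes that loop never give up, so nothing reaches the DLQ.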


> > BTW, for your update of that page on the wiki, what client connection
> > timeout did you have in mind?  (I can think of at least three things
> > that match that phrase: timeout on establishing the initial connection
> > to the broker, inactivity on the connection that leads to a disconnect,
> > and the receive() timeout you referenced above.)  I think that a
> > disconnect due to connection inactivity where the inactivity monitor
> > was in use would indeed produce a redelivery, but if you meant the
> > elapsing of a receive() timeout as described above, I'm not convinced
> > that that's accurate (and if it turns out not to be, we should pull
> > that edit back off the wiki page to avoid confusing people).  But one
> > thing that I believe is missing from that list is when a consumer
> > disconnects from the broker (for any reason) where messages have been
> > dispatched to the consumer but not acknowledged (i.e. they're in the
> > consumer's prefetch buffer or they're the message the consumer was
> > processing at the time of the disconnect under certain acknowledgement
> > modes).
> >
> >
>
> Surely if a client takes more than the receive time-out to process a
> message, a re-delivery will occur? If not, what does happen?


I don't believe the receive timeout relates to processing a message at
all.  The receive timeout is the amount of time you'll wait to see if a
message is available to be processed before returning control; it ends when
message processing begins, whereas your description indicates you're
expecting that it starts when message processing begins.
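
The semantics are essentially those of a timed poll on the local buffer.  A rough analogy using java.util.concurrent (my analogy, not the ActiveMQ internals):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Analogy for receive(timeout): wait up to the timeout for a message to
// *arrive*; the clock has nothing to do with how long processing takes.
class ReceiveTimeoutSketch {
    static String receive(BlockingQueue<String> prefetchBuffer, long timeoutMs) {
        try {
            // Returns a buffered message immediately if one is present;
            // otherwise waits up to timeoutMs and returns null, like
            // receive() returning with no message.  Once this returns,
            // the timeout is out of the picture -- processing the message
            // can then take as long as it takes.
            return prefetchBuffer.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```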

When a client takes more than the receive timeout to process a message, the
client will continue processing the message, and continue processing the
message, and continue processing the message, until eventually it
finishes.  The only way I know of to time out a client is by using the
AbortSlowAckConsumerStrategy (and even then, I'm not sure the consumer will
actually stop processing the current message, it just won't get to process
any more after the current one), but that's a completely different path
(and entirely optional, and not enabled by default).  By default, message
processing just takes as long as it takes, which is why we have strategies
available to react to slow consumers.


> That was the documentation update I had intended under what I thought was a
> very safe interpretation so far.


>
> >
> > Tim
> >
> > On Fri, Apr 24, 2015 at 8:10 AM, James Green <james.mk.gr...@gmail.com>
> > wrote:
> >
> > > I'm need to understand pre-fetch limit and receive time-out
> interaction.
> > >
> > > We have four concurrent consumers in our route. Do each receive the
> > > messages in batches of the pre-fetch limit?
> > >
> > > At what point does the receive time-out start and end?
> > >
> > > In our case each client performs a number of db queries then fires a
> new
> > > message at the broker before the route is complete. Typically this may
> > take
> > > more than 1 second under load. A 10s time-out only makes sense if the
> > > pre-fetching is not included but then that suggests client-calculated
> > > time-outs communicated back to the broker which also makes no sense.
> > >
> > > So, am clear we need to better understand what's under the hood here!
> > >
> > > James
> > >
> > > On 24 April 2015 at 14:39, Gary Tully <gary.tu...@gmail.com> wrote:
> > >
> > > > Tim, steady, I suggested it *may* be relevant :-)
> > > > With camel and transactions - ie: spring dmlc, connection pools and
> > > > cache levels - anything is possible w.r.t
> consumer/sessions/connection
> > > > state, because there are so many variables in the mix.
> > > >
> > > > With activemq and prefetch, every consumer disconnect will result in
> > > > redeliveries. The trick is figuring out whether
> > > > the prefetched messages were actually delivered to the consumer so
> the
> > > > delivery count can reflect the applications view
> > > > of the world, that is not an exact science.
> > > >
> > > >
> > > > On 24 April 2015 at 13:51, Tim Bain <tb...@alumni.duke.edu> wrote:
> > > > > Gary,
> > > > >
> > > > > If I understood that JIRA correctly, the bug only occurs when the
> > > client
> > > > > disconnects, which doesn't sound like what James is doing (nothing
> in
> > > his
> > > > > description indicated to me that his client wasn't staying up and
> > > > connected
> > > > > the whole time), so it doesn't sound like your fix would resolve
> (nor
> > > > > explain) his problem.  And although I'm all about workarounds when
> I
> > > know
> > > > > there's a fix in a future version, I'm not sure that's the case
> here
> > > and
> > > > I
> > > > > don't want to give him a workaround at the expense of actually
> > finding
> > > > and
> > > > > fixing a bug.
> > > > >
> > > > > The two things I know of that can cause message redelivery are 1)
> > > client
> > > > > disconnection with queues and durable topic subscriptions and 2)
> > > > unhandled
> > > > > exceptions in the client message handler code.  James, might #2 be
> > > going
> > > > on
> > > > > here?  And Gary (or anyone else), are there any other possible
> causes
> > > of
> > > > > redelivery that I don't know about?
> > > > >
> > > > > Tim
> > > > > On Apr 24, 2015 4:59 AM, "Gary Tully" <gary.tu...@gmail.com>
> wrote:
> > > > >
> > > > >> to avoid the redelivered messages getting sent to the DLQ,
> changing
> > > > >> the default redelivery policy max from 6 to infinite will help.
> > > > >>
> > > > >> You can do this in the brokerurl passed to the jms connection
> > factory,
> > > > >> it may also make sense to reduce the prefetch if consumers come
> and
> > go
> > > > >> without consuming the prefetch, which seems to be the case.
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> tcp://..:61616?jms.prefetchPolicy.all=100&jms.redeliveryPolicy.maximumRedeliveries=-1
> > > > >>
> > > > >> On 23 April 2015 at 17:14, James Green <james.mk.gr...@gmail.com>
> > > > wrote:
> > > > >> > Hi,
> > > > >> >
> > > > >> > We are not overriding so the defaults of 1s timeout on the
> > receive()
> > > > and
> > > > >> > 1,000 prefetch are in play.
> > > > >> >
> > > > >> > We are updating the connection URI to set a much higher timeout.
> > > > >> >
> > > > >> > Interestingly, PHP sending to the very same broker via STOMP
> gets
> > > > send()
> > > > >> > fail with a 2 second timeout specified. With a 10 second timeout
> > the
> > > > >> > frequency of this is reduced.
> > > > >> >
> > > > >> > I have fired up the latest hawt.io jar and connected to this
> > > broker,
> > > > >> > however the Health and Threads parts are entirely blank. The
> > queues
> > > > are
> > > > >> all
> > > > >> > visible yet "browse" of ActiveMQ.DLQ shows none of the 3,000+
> > > > accumulated
> > > > >> > messages. Wondering where to go next?
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > James
> > > > >> >
> > > > >> >
> > > > >> > On 23 April 2015 at 13:35, Gary Tully <gary.tu...@gmail.com>
> > wrote:
> > > > >> >
> > > > >> >> what sort of timeout is on the receive(...) from spring dmlc,
> and
> > > > what
> > > > >> >> is the prefetch for that consumer. It appears that the message
> is
> > > > >> >> getting dispatched but not consumed, the connection/consumer
> dies
> > > and
> > > > >> >> the message is flagged as a redelivery. then the before
> delivery
> > > > check
> > > > >> >> on the delivery counter kicks the message to the dlq. So this
> > must
> > > be
> > > > >> >> happening 6 times.
> > > > >> >>
> > > > >> >> I just pushed a tidy up of some of the redelivery semantics -
> > there
> > > > >> >> was a bug there that would cause the redelivery counter to
> > > increment
> > > > >> >> in error... so that may be relevant[1].
> > > > >> >> A short term solution would be to ensure infinite or a very
> large
> > > > >> >> number of redeliveries, up from the default 6. That can be
> > provided
> > > > in
> > > > >> >> the broker url.
> > > > >> >>
> > > > >> >> [1] https://issues.apache.org/jira/browse/AMQ-5735
> > > > >> >>
> > > > >> >> On 23 April 2015 at 13:08, James Green <
> james.mk.gr...@gmail.com
> > >
> > > > >> wrote:
> > > > >> >> > We have a camel route consuming from ActiveMQ (5.10.0 with
> > > KahaDB)
> > > > and
> > > > >> >> > frequently get a DLQ entry without anything logged through
> our
> > > > >> >> errorHandler.
> > > > >> >> >
> > > > >> >> > The only thing we have to go on is a dlqFailureCause header
> > which
> > > > >> says:
> > > > >> >> >
> > > > >> >> > java.lang.Throwable: Exceeded redelivery policy
> > > > limit:RedeliveryPolicy
> > > > >> >> > {destination = null, collisionAvoidanceFactor = 0.15,
> > > > >> >> maximumRedeliveries =
> > > > >> >> > 6, maximumRedeliveryDelay = -1, initialRedeliveryDelay =
> 1000,
> > > > >> >> > useCollisionAvoidance = false, useExponentialBackOff = false,
> > > > >> >> > backOffMultiplier = 5.0, redeliveryDelay = 1000}, cause:null
> > > > >> >> >
> > > > >> >> > These are happening apparently at random. The route is marked
> > > > >> transacted,
> > > > >> >> > and is backed by Spring Transactions itself backed by
> Narayana.
> > > > >> >> >
> > > > >> >> > Our debugging indicates that our route never receives the
> > message
> > > > from
> > > > >> >> AMQ
> > > > >> >> > prior to it hitting the DLQ. We have switched on DEBUG
> logging
> > > for
> > > > >> >> > org.apache.activemq but other than being swamped with even
> more
> > > > logs
> > > > >> >> we've
> > > > >> >> > observed nothing notable.
> > > > >> >> >
> > > > >> >> > Any ideas where to go from here? Impossible to say which of
> the
> > > > >> several
> > > > >> >> > thousand messages per day will go this way so an attached
> > > debugger
> > > > is
> > > > >> out
> > > > >> >> > of the question.
> > > > >> >> >
> > > > >> >> > Our log4j config fragment:
> > > > >> >> >
> > > > >> >> >         <Logger name="com" level="WARN"/>
> > > > >> >> >         <Logger name="org" level="WARN"/>
> > > > >> >> >         <Logger name="org.apache.camel" level="DEBUG"/>
> > > > >> >> >         <Logger name="org.apache.activemq" level="DEBUG"/>
> > > > >> >> >         <Logger name="org.springframework.orm.jpa"
> > > level="DEBUG"/>
> > > > >> >> >         <Logger name="org.springframework.transaction"
> > > > level="DEBUG"/>
> > > > >> >> >
> > > > >> >> > Thanks,
> > > > >> >> >
> > > > >> >> > James
> > > > >> >>
> > > > >>
> > > >
> > >
> >
>
