Re: [QPID-JMS] AmqpTransferTagGenerator is causing duplicate delivery-tag errors

Robbie Gemmell Wed, 12 Aug 2020 07:53:45 -0700

The client should probably discard the delivery tag associated with a
failed send when a timeout occurs rather than return it to the tag
pool, as it indeed wont know the other side isnt done with it, and
certainly it shouldnt be used immediately on the next sends. That or
kill the producer itself because of the situation.


Setting a 5 second timeout seems fairly aggressive for persistent sync
sends to a BDB HA setup. A send timeout doesn't necessarily indicate
the broker is down, as often as not it wont, so I'd suggest increasing
that somewhat. It could mean the client ran out of / had no credit
(e.g queue/vhost flow controlling) and got none to send anything
before the timeout, or it could mean it was fully sent but the ack
didn't come back quickly, or if you are using the failover bits it
might mean its still trying to reconnect (in concert with the other
things).  I assume you are de-duping if you are willing to resend the
same, given the prior one may have completed just fine if it actually
went.

On the pool, yet another reason they are annoying. The way JMS handles
validity is that methods throw if the item is closed. The only form of
notification is really the exception listener..which is not meant to
fire for exceptions that can be returned via a method call, and
generally results in killing the connection if firing for anything.

Aside: do you actually need the pool. Are long lived producers really
not an option?

Robbie

On Mon, 10 Aug 2020 at 22:05, Timo Naroska <tnaro...@adobe.com.invalid> wrote:
>
> Hi,
>
> we're seeing sporadic cases in our JMS application where the 
> AmqpTransferTagGenerator used inside JMS MessageProducer is sending duplicate 
> delivery-tags. This leads to the producer being closed and unusable for any 
> further operations.
>
> Setup:
> QPID-J 7.16 broker cluster of 3 using BDB HA replication
> qpid jms client 0.53.0
> org.messaginghub/pooled-jms connection pool 1.1.1
>
> Our JMS clients are using the messaginghub JMS pool for connection 
> management. We're also setting a 5 second timeout for send operations 
> (org.apache.qpid.jms.JmsConnectionFactory#setSendTimeout). In case of send 
> timeout we consider the broker temporarily offline and will retry the send 
> later in our client code.
>
> Having the JMS pool manage the connections, a send operation in the client is 
> as follows:
> - get a JMS connection from the pool
> - create session and producer (these are cached by the pool implementation)
> - send message
> - close producer/session/connection (this close operation does *not* really 
> close anything. the resources are simply released into the messaginghub JMS 
> pool)
>
> The promise of the JMS pool is to completely take care of the connection 
> lifecycle and to provide some simple load balancing between connections.
>
>
> Problem:
> We're seeing sporadic cases where our client goes into an unrecoverable state 
> after a send timeout. The observed symptoms are:
> 1. a message send fails due to timeout
> 2. subsequent send operation fails with the broker reporting:
>                 Delivery-tag '\x00' is used by another unsettled delivery. 
> The delivery-tag MUST be unique amongst all deliveries that could be 
> considered unsettled by either end of the link. [condition = 
> amqp:illegal-state]
> 3. *all* subsequent send operations fail with:
>                 javax.jms.IllegalStateException: The MessageProducer is closed
>
>
> It is not completely clear what causes the initial send timeout. We suppose 
> some kind of network blip in either client<->broker communication or 
> broker<->broker replication traffic. Having some send operations fail for 
> such temporary blips is OK. Our system can handle that.
> However, the worse issue is that the client (specifically the JMS 
> MessageProducer) sends a duplicate delivery-tag to the broker and then gets 
> shut down by the resulting amqp:illegat-state response.
>
> Basically we would need to create a fresh MessageProducer. The problem is 
> that we can't. The producer is managed by the JMS pool. Our code has no 
> direct access to it and no way to create a new one. The JMS pool on the other 
> hand has no way to know that the producer has gone bad. The JMS API provides 
> no way to check validity of JMS objects. Therefore the pool also can't 
> recycle the MessageProducer. Our client is then stuck with the closed 
> producer, requiring an application restart.
>
>
>
> I was able to reproduce and debug the behavior locally using a short send 
> timeout and a TCP proxy (toxiproxy) to simulate erratic network latency. The 
> problem can be tracked down to the AmqpTransferTagGenerator in the JMS 
> client. This class generates the (supposedly unique) delivery-tags for each 
> send operation.
> By default it is using a pool of tags that get re-used. i.e when a message 
> gets sent to the broker, the next free tag is retrieved from the pool and 
> used as delivery-tag. Once the broker acknowledges the operation, that tag is 
> put back in the pool. This seems to be an optimization to reduce memory 
> allocations??
>
> Problem: the tag is also returned to the pool when a send timeout occurs(!) 
> even though there is no telling what caused the timeout.
> In our case the broker is just slower than usual, but it is still processing 
> the message.
>
> Simplified steps:
> 1. client is about to send a message to the broker; AmqpTransferTagGenerator 
> returns tag x00 from its pool
> 2. broker receives message with tag x00 and starts processing
> 3. network wonkiness causes the broker to take longer than usual
> 4. client runs into send timeout without an ACK from the broker; tag 0x00 is 
> returned to AmqpTransferTagGenerator
> 5. subsequent message is about to be sent to the broker; 
> AmqpTransferTagGenerator again returns tag x00 which was just returned to its 
> pool
> 6. broker receives second message with tag x00 while *still* operating on the 
> original message; returns amqp:illegal-state
> 7. client receives broker error. closes the JMS producer
> 8. messaginghub JMS pool has no way to tell the producer has gone bad and 
> holds on to it forever...
>
>
> Proposed change:
>
> AmqpTransferTagGenerator has an option to disable the internal tag pool. In 
> that mode a new tag is allocated for each operation and that seems to solve 
> this issue. Currently there is no way to configure this externally though. To 
> test this change, I had to modify the JMS client code locally and built a 
> custom jar of it. It is a single line change when creating the 
> AmqpTransferTagGenerator here:
> https://github.com/apache/qpid-jms/blob/f212e7c2c3e21d44e90068f9c0bf6ab374a68f8e/qpid-jms-client/src/main/java/org/apache/qpid/jms/provider/amqp/AmqpFixedProducer.java#L62
>
>
> I would like to get some feedback on this from the mailing list though.
>
> Does anybody have more insight into AmqpTransferTagGenerator? Is the tag pool 
> in there really just an optimization?
> Should this be made configurable or just disabled in general?
>
>
> Also, is anything wrong with our usage of the messaginghub JMS pool?
>
>
> Thanks
> Timo
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org

Re: [QPID-JMS] AmqpTransferTagGenerator is causing duplicate delivery-tag errors

Reply via email to