Hi Timothy, I just synced that change and verified that it fixes the issue in my test setup.
Thanks for the quick turnaround! __ -Timo On 8/12/20, 8:38 AM, "Timothy Bish" <tabish...@gmail.com> wrote: On 8/10/20 5:05 PM, Timo Naroska wrote: > Hi, > > we're seeing sporadic cases in our JMS application where the AmqpTransferTagGenerator used inside JMS MessageProducer is sending duplicate delivery-tags. This leads to the producer being closed and unusable for any further operations. > > Setup: > QPID-J 7.16 broker cluster of 3 using BDB HA replication > qpid jms client 0.53.0 > org.messaginghub/pooled-jms connection pool 1.1.1 > > Our JMS clients are using the messaginghub JMS pool for connection management. We're also setting a 5 second timeout for send operations (org.apache.qpid.jms.JmsConnectionFactory#setSendTimeout). In case of send timeout we consider the broker temporarily offline and will retry the send later in our client code. > > Having the JMS pool manage the connections, a send operation in the client is as follows: > - get a JMS connection from the pool > - create session and producer (these are cached by the pool implementation) > - send message > - close producer/session/connection (this close operation does *not* really close anything. the resources are simply released into the messaginghub JMS pool) > > The promise of the JMS pool is to completely take care of the connection lifecycle and to provide some simple load balancing between connections. > > > Problem: > We're seeing sporadic cases where our client goes into an unrecoverable state after a send timeout. The observed symptoms are: > 1. a message send fails due to timeout > 2. subsequent send operation fails with the broker reporting: > Delivery-tag '\x00' is used by another unsettled delivery. The delivery-tag MUST be unique amongst all deliveries that could be considered unsettled by either end of the link. [condition = amqp:illegal-state] > 3. *all* subsequent send operations fail with: > javax.jms.IllegalStateException: The MessageProducer is closed > > > It is not completely clear what causes the initial send timeout. We suppose some kind of network blip in either client<->broker communication or broker<->broker replication traffic. Having some send operations fail for such temporary blips is OK. Our system can handle that. > However, the worse issue is that the client (specifically the JMS MessageProducer) sends a duplicate delivery-tag to the broker and then gets shut down by the resulting amqp:illegat-state response. > > Basically we would need to create a fresh MessageProducer. The problem is that we can't. The producer is managed by the JMS pool. Our code has no direct access to it and no way to create a new one. The JMS pool on the other hand has no way to know that the producer has gone bad. The JMS API provides no way to check validity of JMS objects. Therefore the pool also can't recycle the MessageProducer. Our client is then stuck with the closed producer, requiring an application restart. > > > > I was able to reproduce and debug the behavior locally using a short send timeout and a TCP proxy (toxiproxy) to simulate erratic network latency. The problem can be tracked down to the AmqpTransferTagGenerator in the JMS client. This class generates the (supposedly unique) delivery-tags for each send operation. > By default it is using a pool of tags that get re-used. i.e when a message gets sent to the broker, the next free tag is retrieved from the pool and used as delivery-tag. Once the broker acknowledges the operation, that tag is put back in the pool. This seems to be an optimization to reduce memory allocations?? > > Problem: the tag is also returned to the pool when a send timeout occurs(!) even though there is no telling what caused the timeout. > In our case the broker is just slower than usual, but it is still processing the message. > > Simplified steps: > 1. client is about to send a message to the broker; AmqpTransferTagGenerator returns tag x00 from its pool > 2. broker receives message with tag x00 and starts processing > 3. network wonkiness causes the broker to take longer than usual > 4. client runs into send timeout without an ACK from the broker; tag 0x00 is returned to AmqpTransferTagGenerator > 5. subsequent message is about to be sent to the broker; AmqpTransferTagGenerator again returns tag x00 which was just returned to its pool > 6. broker receives second message with tag x00 while *still* operating on the original message; returns amqp:illegal-state > 7. client receives broker error. closes the JMS producer > 8. messaginghub JMS pool has no way to tell the producer has gone bad and holds on to it forever... > > > Proposed change: > > AmqpTransferTagGenerator has an option to disable the internal tag pool. In that mode a new tag is allocated for each operation and that seems to solve this issue. Currently there is no way to configure this externally though. To test this change, I had to modify the JMS client code locally and built a custom jar of it. It is a single line change when creating the AmqpTransferTagGenerator here: > https://github.com/apache/qpid-jms/blob/f212e7c2c3e21d44e90068f9c0bf6ab374a68f8e/qpid-jms-client/src/main/java/org/apache/qpid/jms/provider/amqp/AmqpFixedProducer.java#L62 > > > I would like to get some feedback on this from the mailing list though. > > Does anybody have more insight into AmqpTransferTagGenerator? Is the tag pool in there really just an optimization? > Should this be made configurable or just disabled in general? > > > Also, is anything wrong with our usage of the messaginghub JMS pool? > > > Thanks > Timo > > A fix for the client has been submitted that stops the recycling of tags when a send fails which will be in the next client release. https://issues.apache.org/jira/browse/QPIDJMS-515 -- Tim Bish --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org For additional commands, e-mail: users-h...@qpid.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org For additional commands, e-mail: users-h...@qpid.apache.org