Hi Timothy,

I just synced that change and verified that it fixes the issue in my test setup.

Thanks for the quick turnaround! __

-Timo

On 8/12/20, 8:38 AM, "Timothy Bish" <tabish...@gmail.com> wrote:

    On 8/10/20 5:05 PM, Timo Naroska wrote:
    > Hi,
    >
    > we're seeing sporadic cases in our JMS application where the 
AmqpTransferTagGenerator used inside JMS MessageProducer is sending duplicate 
delivery-tags. This leads to the producer being closed and unusable for any 
further operations.
    >
    > Setup:
    > QPID-J 7.16 broker cluster of 3 using BDB HA replication
    > qpid jms client 0.53.0
    > org.messaginghub/pooled-jms connection pool 1.1.1
    >
    > Our JMS clients are using the messaginghub JMS pool for connection 
management. We're also setting a 5 second timeout for send operations 
(org.apache.qpid.jms.JmsConnectionFactory#setSendTimeout). In case of send 
timeout we consider the broker temporarily offline and will retry the send 
later in our client code.
    >
    > Having the JMS pool manage the connections, a send operation in the 
client is as follows:
    > - get a JMS connection from the pool
    > - create session and producer (these are cached by the pool 
implementation)
    > - send message
    > - close producer/session/connection (this close operation does *not* 
really close anything. the resources are simply released into the messaginghub 
JMS pool)
    >
    > The promise of the JMS pool is to completely take care of the connection 
lifecycle and to provide some simple load balancing between connections.
    >
    >
    > Problem:
    > We're seeing sporadic cases where our client goes into an unrecoverable 
state after a send timeout. The observed symptoms are:
    > 1. a message send fails due to timeout
    > 2. subsequent send operation fails with the broker reporting:
    >                  Delivery-tag '\x00' is used by another unsettled 
delivery. The delivery-tag MUST be unique amongst all deliveries that could be 
considered unsettled by either end of the link. [condition = amqp:illegal-state]
    > 3. *all* subsequent send operations fail with:
    >                  javax.jms.IllegalStateException: The MessageProducer is 
closed
    >
    >
    > It is not completely clear what causes the initial send timeout. We 
suppose some kind of network blip in either client<->broker communication or 
broker<->broker replication traffic. Having some send operations fail for such 
temporary blips is OK. Our system can handle that.
    > However, the worse issue is that the client (specifically the JMS 
MessageProducer) sends a duplicate delivery-tag to the broker and then gets 
shut down by the resulting amqp:illegat-state response.
    >
    > Basically we would need to create a fresh MessageProducer. The problem is 
that we can't. The producer is managed by the JMS pool. Our code has no direct 
access to it and no way to create a new one. The JMS pool on the other hand has 
no way to know that the producer has gone bad. The JMS API provides no way to 
check validity of JMS objects. Therefore the pool also can't recycle the 
MessageProducer. Our client is then stuck with the closed producer, requiring 
an application restart.
    >
    >
    >
    > I was able to reproduce and debug the behavior locally using a short send 
timeout and a TCP proxy (toxiproxy) to simulate erratic network latency. The 
problem can be tracked down to the AmqpTransferTagGenerator in the JMS client. 
This class generates the (supposedly unique) delivery-tags for each send 
operation.
    > By default it is using a pool of tags that get re-used. i.e when a 
message gets sent to the broker, the next free tag is retrieved from the pool 
and used as delivery-tag. Once the broker acknowledges the operation, that tag 
is put back in the pool. This seems to be an optimization to reduce memory 
allocations??
    >
    > Problem: the tag is also returned to the pool when a send timeout 
occurs(!) even though there is no telling what caused the timeout.
    > In our case the broker is just slower than usual, but it is still 
processing the message.
    >
    > Simplified steps:
    > 1. client is about to send a message to the broker; 
AmqpTransferTagGenerator returns tag x00 from its pool
    > 2. broker receives message with tag x00 and starts processing
    > 3. network wonkiness causes the broker to take longer than usual
    > 4. client runs into send timeout without an ACK from the broker; tag 0x00 
is returned to AmqpTransferTagGenerator
    > 5. subsequent message is about to be sent to the broker; 
AmqpTransferTagGenerator again returns tag x00 which was just returned to its 
pool
    > 6. broker receives second message with tag x00 while *still* operating on 
the original message; returns amqp:illegal-state
    > 7. client receives broker error. closes the JMS producer
    > 8. messaginghub JMS pool has no way to tell the producer has gone bad and 
holds on to it forever...
    >
    >
    > Proposed change:
    >
    > AmqpTransferTagGenerator has an option to disable the internal tag pool. 
In that mode a new tag is allocated for each operation and that seems to solve 
this issue. Currently there is no way to configure this externally though. To 
test this change, I had to modify the JMS client code locally and built a 
custom jar of it. It is a single line change when creating the 
AmqpTransferTagGenerator here:
    > 
https://github.com/apache/qpid-jms/blob/f212e7c2c3e21d44e90068f9c0bf6ab374a68f8e/qpid-jms-client/src/main/java/org/apache/qpid/jms/provider/amqp/AmqpFixedProducer.java#L62
    >
    >
    > I would like to get some feedback on this from the mailing list though.
    >
    > Does anybody have more insight into AmqpTransferTagGenerator? Is the tag 
pool in there really just an optimization?
    > Should this be made configurable or just disabled in general?
    >
    >
    > Also, is anything wrong with our usage of the messaginghub JMS pool?
    >
    >
    > Thanks
    > Timo
    >
    >

    A fix for the client has been submitted that stops the recycling of tags 
    when a send fails which will be in the next client release.

    https://issues.apache.org/jira/browse/QPIDJMS-515


    -- 
    Tim Bish


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
    For additional commands, e-mail: users-h...@qpid.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org

Reply via email to