[https://issues.apache.org/jira/browse/ARTEMIS-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046800#comment-18046800]
Marc Leisi commented on ARTEMIS-5806:
-------------------------------------
We investigated further and reviewed the WebLogic documentation. It states
that, to integrate an external resource manager with WebLogic, the XAResource
must support multiple transactions in parallel: "Each XAResource instance that
you register is used for recovery and commit processing of multiple
transactions in parallel. Ensure that the XAResource instance supports resource
sharing as defined in JTA Specification Version 1.0.1B Section 3.4.6."
[https://docs.oracle.com/en/middleware/standalone/weblogic-server/14.1.1.0/wljta/jtatxexp.html#GUID-…|https://docs.oracle.com/en/middleware/standalone/weblogic-server/14.1.1.0/wljta/jtatxexp.html#GUID-%E2%80%A6]
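To make the resource-sharing requirement concrete, here is a minimal, self-contained Java sketch of an in-memory XAResource that keeps separate state per Xid. It is a toy model, not Artemis or WebLogic code: the names ToyXid, SharedXAResource and ResourceSharingDemo are hypothetical, start() flags are ignored, and the resource is assumed to be associated with at most one branch at a time while any other known branch can still be prepared, committed, or rolled back through it. Rolling back an unknown Xid fails with XAER_NOTA and leaves every other branch untouched.

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical Xid keyed by a readable name; equality is by value so it can be a map key.
final class ToyXid implements Xid {
    private final byte[] gtrid;
    ToyXid(String name) { this.gtrid = name.getBytes(StandardCharsets.UTF_8); }
    @Override public int getFormatId() { return 1; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return new byte[0]; }
    @Override public boolean equals(Object o) {
        return o instanceof ToyXid && Arrays.equals(gtrid, ((ToyXid) o).gtrid);
    }
    @Override public int hashCode() { return Arrays.hashCode(gtrid); }
}

// Toy resource manager: one association at a time, but any known branch can be completed.
final class SharedXAResource implements XAResource {
    enum State { ACTIVE, IDLE, PREPARED }
    final Map<Xid, State> branches = new HashMap<>();
    private Xid associated; // branch currently associated with this resource, if any

    private static XAException xa(int code) { return new XAException(code); }

    @Override public void start(Xid xid, int flags) throws XAException {
        // flags are ignored in this toy (always treated as TMNOFLAGS)
        if (associated != null) throw xa(XAException.XAER_PROTO);
        if (branches.containsKey(xid)) throw xa(XAException.XAER_DUPID);
        branches.put(xid, State.ACTIVE);
        associated = xid;
    }
    @Override public void end(Xid xid, int flags) throws XAException {
        if (!xid.equals(associated)) throw xa(XAException.XAER_PROTO);
        branches.put(xid, State.IDLE);
        associated = null;
    }
    @Override public int prepare(Xid xid) throws XAException {
        if (branches.get(xid) != State.IDLE) throw xa(XAException.XAER_PROTO);
        branches.put(xid, State.PREPARED);
        return XA_OK;
    }
    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        State state = branches.get(xid);
        if (state == null) throw xa(XAException.XAER_NOTA);
        if (!onePhase && state != State.PREPARED) throw xa(XAException.XAER_PROTO);
        branches.remove(xid);
    }
    @Override public void rollback(Xid xid) throws XAException {
        if (xid.equals(associated)) throw xa(XAException.XAER_PROTO); // end() it first
        // Only the named branch is removed; all other branches keep their state.
        if (branches.remove(xid) == null) throw xa(XAException.XAER_NOTA);
    }
    @Override public void forget(Xid xid) { branches.remove(xid); }
    @Override public Xid[] recover(int flag) {
        return branches.entrySet().stream()
                .filter(e -> e.getValue() == State.PREPARED)
                .map(Map.Entry::getKey).toArray(Xid[]::new);
    }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}

final class ResourceSharingDemo {
    public static void main(String[] args) throws XAException {
        SharedXAResource xa1 = new SharedXAResource();
        Xid xid1 = new ToyXid("xid1");
        Xid xid2 = new ToyXid("xid2");
        xa1.start(xid1, XAResource.TMNOFLAGS);
        xa1.end(xid1, XAResource.TMSUCCESS);
        xa1.start(xid2, XAResource.TMNOFLAGS); // xa1 is now associated with xid2
        xa1.rollback(xid1);                    // completing another branch is allowed
        xa1.end(xid2, XAResource.TMSUCCESS);   // xid2 is unaffected by the rollback
        System.out.println(xa1.prepare(xid2) == XAResource.XA_OK); // true
        xa1.commit(xid2, false);
        System.out.println(xa1.branches.isEmpty()); // true
    }
}
{code}

The per-Xid map is the point of the sketch: a failure for one Xid is reported for that Xid only and never touches the state of other branches on the same resource.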
With ActiveMQ Artemis, handling multiple transactions on a single XAResource
seems to work as expected as long as both transactions exist. I verified this
with an example that reconstructs the behavior described in the specification
at
https://jakarta.ee/specifications/transactions/2.0/jakarta-transactions-spec-2.0.html#resource-shar…
{code:java}
### Test 1: Parallel successful, Xid is present
Thread 1: MDB Poller
xa1.start(xid1)
consumer1.receive(1000)
=> received no message
xa1.end(xid1)
Thread 2: MDB Poller
xa1.start(xid2)
consumer2.receive(10000)
Thread 3: JTA Recovery
xa1.rollback(xid1)
Message Producer:
produce message1
Thread 2: MDB Poller
=> received message1
xa1.end(xid2)
xa1.prepare(xid2)
xa1.commit(xid2){code}
Both tests roll back one transaction (xid1) through the same XAResource while
that resource is associated with a different global transaction (xid2). In the
first test this works. However, if xid1 no longer exists on the broker, for
example because of a transaction timeout or a broker shutdown, the message is
lost.
{code:java}
### Test 2: Parallel failure, Xid doesn't exist
Thread 1: MDB Poller
xa1.start(xid1)
consumer1.receive(1000)
wait for XA Timeout or perform broker shutdown
=> xid1 is removed from broker
Thread 2: MDB Poller
xa1.start(xid2)
consumer2.receive(10000)
Thread 3: JTA Recovery
xa1.rollback(xid1)
=> failure, can't find Xid
Message Producer:
produce message1
Thread 2: MDB Poller
consumer1.receive(10000)
=> received message1
xa1.end(xid2)
=> failure
xa1.rollback(xid2)
=> failure
Message is gone from broker {code}
Since Artemis correctly handles multiple transactions on a single XAResource as
long as both transactions exist, would it be possible to improve the code path
where the session rollback is performed
([ServerSessionImpl.java#L1627|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627])?
Adding a check whether the transaction is actually present, and rolling back
the entire session only in that case, could potentially resolve this issue.
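The difference between the reported behavior and the proposed check can be illustrated with a toy model of a session that owns several XA branches. This is a hypothetical sketch, not Artemis's ServerSessionImpl: ToyXaSession, its Policy values and GuardDemo are invented names, Xids are plain strings, and the model only tracks which branches are open. In both policies the caller receives XAER_NOTA for the unknown xid; only the fate of the other branches differs.

{code:java}
import java.util.HashSet;
import java.util.Set;
import javax.transaction.xa.XAException;

// Hypothetical model of a broker session that owns several XA branches (not Artemis code).
final class ToyXaSession {
    enum Policy { ROLLBACK_WHOLE_SESSION_ON_UNKNOWN_XID, REJECT_UNKNOWN_XID_ONLY }

    private final Policy policy;
    final Set<String> openBranches = new HashSet<>(); // xids with open work in this session

    ToyXaSession(Policy policy) { this.policy = policy; }

    void start(String xid) throws XAException {
        if (!openBranches.add(xid)) throw new XAException(XAException.XAER_DUPID);
    }

    void rollback(String xid) throws XAException {
        if (openBranches.remove(xid)) {
            return; // known branch: roll back only this one
        }
        if (policy == Policy.ROLLBACK_WHOLE_SESSION_ON_UNKNOWN_XID) {
            // Behavior described in this report: every other branch in the session is discarded too.
            openBranches.clear();
        }
        // Proposed check: the unknown xid is rejected and nothing else is touched.
        throw new XAException(XAException.XAER_NOTA);
    }
}

final class GuardDemo {
    // Replays the post-restart situation and reports whether the poller's branch survives.
    static boolean survives(ToyXaSession.Policy policy) {
        ToyXaSession session = new ToyXaSession(policy);
        try {
            session.start("xid2");    // MDB poller's new transaction after the restart
            session.rollback("xid1"); // JTA recovery: xid1 disappeared with the restart
        } catch (XAException e) {
            // expected: XAER_NOTA for xid1
        }
        return session.openBranches.contains("xid2");
    }

    public static void main(String[] args) {
        System.out.println(survives(ToyXaSession.Policy.ROLLBACK_WHOLE_SESSION_ON_UNKNOWN_XID)); // false
        System.out.println(survives(ToyXaSession.Policy.REJECT_UNKNOWN_XID_ONLY));               // true
    }
}
{code}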
> Message loss due to XA session rollback after broker restart
> ------------------------------------------------------------
>
> Key: ARTEMIS-5806
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5806
> Project: Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.40.0, 2.44.0
> Reporter: Marc Leisi
> Priority: Critical
> Attachments: MessageLossAfterRestart.png
>
>
> In our setup, an MDB deployed in an Oracle WebLogic container connects to an
> ActiveMQ Artemis broker using XA transactions. To receive messages, the
> WebLogic MDB framework repeatedly polls by opening an XA transaction
> ({{xaStart}}), performing a {{receive(timeout)}}, and then closing
> the transaction ({{xaEnd}}). If a message was received, the transaction
> is prepared and committed; otherwise, it is rolled back.
> During a graceful broker shutdown, all active transactions and sessions on the
> broker are closed. That part works as expected. However, after the restart we
> encounter problematic behavior:
> The MDB begins polling again ({{xaStart}} + {{receive(timeout)}}).
> Before {{receive()}} times out, the WebLogic JTA framework tries in parallel
> to finish the transaction that was open before the shutdown. This is done in
> the same session as the MDB polling. Since that transaction no longer exists
> on the broker, {{xaEnd}} fails with {_}"Cannot find suspended transaction to
> end"{_}. WebLogic JTA then forces an {{xaRollback}}, which also fails with
> {_}"Cannot find xid in resource manager"{_}. On the broker side, the session
> is rolled back (see:
> [ServerSessionImpl.java#L1627|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627]).
> The session rollback cancels all open transactions in the session,
> including the ongoing MDB polling transaction.
> The real problem occurs afterwards, when a new message is produced and ready to
> be delivered to the MDB poller ({{receive(timeout)}}). Artemis delivers the
> message, and the MDB poller tries to end ({{xaEnd}}) the transaction. Because
> the transaction was already removed during the previous session rollback,
> this results in {_}"Cannot find suspended transaction to end"{_}. The MDB
> poller then forces a global rollback: it drops the message and attempts to
> roll back on the Artemis broker, which also fails ({_}"Cannot find xid in
> resource manager"{_}). As a result, the message is lost on the Artemis broker:
> it is removed from the queue, and there is no longer an open prepared
> transaction for it.
> Here is a short version of the flow (a simple sequence diagram is also
> attached):
> {code:java}
> xaStart(xid1) (session1)
> receive(timeout)
> --- broker restart ---
> xaStart(xid2) (session2)
> receive(timeout)
> xaEnd(xid1) (session2)
> => Cannot find suspended transaction to end
> xaRollback(xid1) (session2)
> => Cannot find xid in resource manager; removes xid1 and all other xids in the
> session (including xid2)
> message => received with xid2
> xaEnd(xid2)
> => Cannot find suspended transaction to end
> xaRollback(xid2)
> => Cannot find xid in resource manager
> message dropped due to exception, message no longer on queue and no
> transaction left on Artemis
> {code}
> To reproduce this behavior, I adapted the XA receive example in a fork:
> [https://github.com/leisma/activemq-artemis-examples/commit/61deb9832eefeda360ff3207b3ad8e56c4ea2aa6]
> (a separately running broker is required to execute it).
> I’m not sure whether Artemis implicitly assumes that only one XA transaction
> may exist per session. I could not find clear guidance in the JTA
> specification or other documentation regarding how XA transactions should
> behave in this scenario.
> Is this the expected behavior?
> Or would it be possible for Artemis to check whether a session still contains
> active transactions before performing a rollback, which would prevent the
> message loss we are seeing?
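For readers less familiar with the WebLogic side, the MDB polling cycle described in the issue (xaStart, receive(timeout), xaEnd, then prepare and commit, or rollback if nothing arrived) can be sketched in Java as below. This is a hypothetical illustration, not WebLogic code: PollCycle is an invented name, the Supplier stands in for the JMS consumer's receive(timeout) (an empty Optional means the receive timed out), TMSUCCESS is assumed for every xaEnd, and the XAResource is a recording stub built with java.lang.reflect.Proxy so the call sequence can be inspected without a broker.

{code:java}
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class PollCycle {
    // Hypothetical fixed Xid used for the demo.
    static final Xid XID = new Xid() {
        public int getFormatId() { return 1; }
        public byte[] getGlobalTransactionId() { return new byte[] {1}; }
        public byte[] getBranchQualifier() { return new byte[0]; }
    };

    // One poll iteration: start, receive(timeout), end, then two-phase commit or rollback.
    static boolean pollOnce(XAResource xa, Xid xid, Supplier<Optional<String>> receive) throws XAException {
        xa.start(xid, XAResource.TMNOFLAGS);
        Optional<String> message = receive.get();
        xa.end(xid, XAResource.TMSUCCESS); // the branch is always ended before its outcome is decided
        if (message.isEmpty()) {
            xa.rollback(xid); // nothing received: discard the branch
            return false;
        }
        if (xa.prepare(xid) == XAResource.XA_OK) {
            xa.commit(xid, false); // message consumed
        }
        return true;
    }

    // Recording XAResource: accepts every call, logs the method name, returns XA_OK for int results.
    static XAResource recorder(List<String> calls) {
        return (XAResource) Proxy.newProxyInstance(
                PollCycle.class.getClassLoader(),
                new Class<?>[] {XAResource.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    return method.getReturnType() == int.class ? XAResource.XA_OK : null;
                });
    }

    public static void main(String[] args) throws XAException {
        List<String> calls = new ArrayList<>();
        XAResource xa = recorder(calls);
        pollOnce(xa, XID, Optional::empty);
        System.out.println(calls); // [start, end, rollback]
        calls.clear();
        pollOnce(xa, XID, () -> Optional.of("message1"));
        System.out.println(calls); // [start, end, prepare, commit]
    }
}
{code}

In the reported scenario, it is the {{end}} call in this cycle that fails for xid2, because the broker discarded that branch during the earlier session rollback.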
--
This message was sent by Atlassian Jira
(v8.20.10#820010)