[
https://issues.apache.org/jira/browse/ARTEMIS-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050246#comment-18050246
]
Justin Bertram commented on ARTEMIS-5806:
-----------------------------------------
I believe I've reproduced this by passing
{{javax.transaction.xa.XAResource#TMNOFLAGS}} to {{start}} and
{{javax.transaction.xa.XAResource#TMSUCCESS}} to {{end}} at the various
invocations.
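For illustration only, here is a minimal sketch of one poll cycle using those flags against the standard {{javax.transaction.xa}} interfaces. The class name, the {{xid}} helper, and the recording stand-in for the {{XAResource}} are all hypothetical; nothing here is taken from the reproducer, WebLogic, or Artemis.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical sketch of one XA poll cycle; not WebLogic or Artemis code. */
final class XaPollCycleSketch {

    /** Minimal Xid with a made-up format id and identifiers. */
    static Xid xid(int n) {
        byte[] id = {(byte) n};
        return new Xid() {
            @Override public int getFormatId() { return 1; }
            @Override public byte[] getGlobalTransactionId() { return id.clone(); }
            @Override public byte[] getBranchQualifier() { return id.clone(); }
        };
    }

    /** Stand-in XAResource that only records which methods were called. */
    static XAResource recording(List<String> calls) {
        return (XAResource) Proxy.newProxyInstance(
                XaPollCycleSketch.class.getClassLoader(),
                new Class<?>[] {XAResource.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    if (method.getReturnType() == int.class) {
                        return XAResource.XA_OK; // e.g. prepare() votes OK
                    }
                    return null; // void methods such as start/end/commit/rollback
                });
    }

    /** One poll: start, (receive), end, then prepare+commit or rollback. */
    static void pollOnce(XAResource res, Xid xid, boolean messageReceived) throws XAException {
        res.start(xid, XAResource.TMNOFLAGS);
        // consumer.receive(timeout) would run here, inside the transaction branch
        res.end(xid, XAResource.TMSUCCESS);
        if (messageReceived) {
            if (res.prepare(xid) == XAResource.XA_OK) {
                res.commit(xid, false); // two-phase commit, so onePhase = false
            }
        } else {
            res.rollback(xid);
        }
    }

    /** Runs one poll against the recording stand-in and returns the call order. */
    static List<String> trace(boolean messageReceived) {
        List<String> calls = new ArrayList<>();
        try {
            pollOnce(recording(calls), xid(1), messageReceived);
        } catch (XAException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(true));  // [start, end, prepare, commit]
        System.out.println(trace(false)); // [start, end, rollback]
    }
}
```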
From what I can tell, in "Test 2" when the "JTA Recovery" thread invokes
{{rollback(xid1)}} and the broker is unable to find {{xid1}} (e.g. because it
timed out) then it actually rolls back {{xid2}} (i.e. the currently associated
transaction) due to the code you noted (i.e.
[ServerSessionImpl.java#L1627|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627]).
This rollback then disassociates the JMS {{Session}} from any transaction so
that when the "MDB Poller" thread receives the message (and ostensibly
acknowledges it) it is then removed from the broker during the call to
{{end(xid2)}} because the {{end}} method flushes all acks to the broker.
Removing the aforementioned code in
[ServerSessionImpl.java|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627]
fixes the issue, but that might break another use-case. I'm still
investigating to determine if this is a safe fix.
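To make the sequence concrete, here is a toy in-memory model of it. This is only a sketch under simplifying assumptions: the class, its fields, and the {{guarded}} switch are hypothetical and do not mirror {{ServerSessionImpl}}; string xids and a single queue stand in for the real structures. With {{guarded = false}} a rollback of an unknown xid also rolls back the currently associated transaction (the behavior described above); with {{guarded = true}} it only reports the error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of one session; NOT the real Artemis ServerSessionImpl. */
final class XaSessionSketch {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> pendingAcks = new ArrayList<>(); // acks held by currentXid
    private final boolean guarded;
    private String currentXid;   // transaction currently associated with the session
    private int bufferedAcks;    // acks not yet flushed to the "broker"

    XaSessionSketch(boolean guarded) { this.guarded = guarded; }

    void xaStart(String xid) { currentXid = xid; }

    void bufferAck() { bufferedAcks++; }

    /** Flushes acks first, then checks the association (as end() is described above). */
    void xaEnd(String xid) {
        for (; bufferedAcks > 0; bufferedAcks--) {
            String msg = queue.poll();
            // Without an associated transaction the ack is final: the message is gone.
            if (msg != null && currentXid != null) pendingAcks.add(msg);
        }
        if (!xid.equals(currentXid)) {
            throw new IllegalStateException("Cannot find suspended transaction to end");
        }
    }

    void xaCommit(String xid) {
        if (!xid.equals(currentXid)) throw new IllegalStateException("Cannot find xid");
        pendingAcks.clear();
        currentXid = null;
    }

    void xaRollback(String xid) {
        boolean known = xid.equals(currentXid);
        if (known || !guarded) {
            // Roll back whatever is associated; put its acked messages back on the queue.
            while (!pendingAcks.isEmpty()) {
                queue.addFirst(pendingAcks.remove(pendingAcks.size() - 1));
            }
            currentXid = null;
        }
        if (!known) throw new IllegalStateException("Cannot find xid in resource manager");
    }

    /** Replays the reported flow; returns true if the message vanishes without a commit. */
    static boolean messageLost(boolean guarded) {
        XaSessionSketch s = new XaSessionSketch(guarded);
        s.xaStart("xid2");                       // MDB poller, after the restart
        try { s.xaRollback("xid1"); }            // JTA recovery, stale xid
        catch (IllegalStateException expected) { /* xid1 is unknown */ }
        s.queue.add("m1");                       // new message is delivered
        s.bufferAck();                           // poller acknowledges it
        try {
            s.xaEnd("xid2");
            s.xaCommit("xid2");
            return false;                        // consumed in a committed transaction
        } catch (IllegalStateException e) {
            try { s.xaRollback("xid2"); } catch (IllegalStateException ignored) { }
            return !s.queue.contains("m1");
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded lost=" + messageLost(false)); // unguarded lost=true
        System.out.println("guarded lost=" + messageLost(true));    // guarded lost=false
    }
}
```

The model says nothing about whether other use-cases rely on the fallback rollback; it only shows why leaving the associated transaction untouched avoids the loss in this particular flow.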
> Message loss due to XA session rollback after broker restart
> ------------------------------------------------------------
>
> Key: ARTEMIS-5806
> URL: https://issues.apache.org/jira/browse/ARTEMIS-5806
> Project: Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.40.0, 2.44.0
> Reporter: Marc Leisi
> Priority: Critical
> Attachments: MessageLossAfterRestart.png
>
>
> In our setup, an MDB deployed in an Oracle WebLogic container connects to an
> ActiveMQ Artemis broker using XA transactions. To receive messages, the
> WebLogic MDB framework repeatedly polls by opening an XA transaction
> ({{xaStart}}), performing a {{receive(timeout)}}, and then closing
> the transaction ({{xaEnd}}). If a message was received, the transaction
> is prepared and committed; otherwise it is rolled back.
> During a graceful broker shutdown, all active transactions and sessions on the
> broker are closed. That part works as expected. However, after the restart we
> encounter a problematic behavior:
> The MDB begins polling again ({{xaStart}} + {{receive(timeout)}}).
> Before the {{receive()}} times out, the WebLogic JTA framework tries in parallel
> to finish the open transaction (started before the shutdown). This is done in
> the same session as the MDB polling. Since that transaction no longer exists
> on the broker, {{xaEnd}} fails with _"Cannot find suspended transaction to
> end"_. WebLogic JTA forces an {{xaRollback}}, which also fails with
> _"Cannot find xid in resource manager"_. On the broker side, the session
> is rolled back (see:
> [ServerSessionImpl.java#L1627|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627]).
> The session rollback will cancel all open transactions in the session,
> including the ongoing MDB polling transaction.
> The real problem occurs afterwards, if a new message is produced and ready to
> be delivered to the MDB poller ({{receive(timeout)}}). Artemis delivers the
> message, and the MDB poller tries to end ({{xaEnd}}) the transaction. Because
> the transaction was already removed during the previous session rollback,
> this results in _"Cannot find suspended transaction to end"_. The MDB
> poller then forces a global rollback: it drops the message and attempts to
> roll back on the Artemis broker, which also fails (_"Cannot find xid in
> resource manager"_). As a result, the message is lost on the Artemis broker:
> it is removed from the queue, and there is no open prepared
> transaction for it anymore.
> Here is a short version of the flow (a simple sequence diagram is attached as
> well):
> {code:java}
> xaStart(xid1) (session1)
> receive(timeout)
> -- broker restart --
> xaStart(xid2) (session2)
> receive(timeout)
> xaEnd(xid1) (session2)
>   -- Cannot find suspended transaction to end
> xaRollback(xid1) (session2)
>   -- Cannot find xid in resource manager
>   -- removes xid1 & all xids in session (including xid2)
> message -- receive with xid2
> xaEnd(xid2)
>   -- Cannot find suspended transaction to end
> xaRollback(xid2)
>   -- Cannot find xid in resource manager
> message dropped due to exception, message no longer on queue and no
> transaction on artemis left
> {code}
> To reproduce this behavior, I adapted the XA receive example in a fork:
> [https://github.com/leisma/activemq-artemis-examples/commit/61deb9832eefeda360ff3207b3ad8e56c4ea2aa6]
> (You need to run a broker separately to execute it)
> I’m not sure whether Artemis implicitly assumes that only one XA transaction
> may exist per session. I could not find clear guidance in the JTA
> specification or other documentation regarding how XA transactions should
> behave in this scenario.
> Is this the expected behavior?
> Or would it be possible for Artemis to check whether a session still contains
> active transactions before performing a rollback, which would prevent the
> message loss we are seeing?