[ 
https://issues.apache.org/jira/browse/ARTEMIS-5806?focusedWorklogId=998973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-998973
 ]

ASF GitHub Bot logged work on ARTEMIS-5806:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Jan/26 19:18
            Start Date: 07/Jan/26 19:18
    Worklog Time Spent: 10m 
      Work Description: tabish121 commented on PR #6168:
URL: https://github.com/apache/artemis/pull/6168#issuecomment-3720393527

   LGTM




Issue Time Tracking
-------------------

    Worklog Id:     (was: 998973)
    Time Spent: 0.5h  (was: 20m)

> Message loss due to XA session rollback after broker restart
> ------------------------------------------------------------
>
>                 Key: ARTEMIS-5806
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-5806
>             Project: Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.40.0, 2.44.0
>            Reporter: Marc Leisi
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: MessageLossAfterRestart.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In our setup, an MDB deployed in an Oracle WebLogic container connects to an 
> ActiveMQ Artemis broker using XA transactions. To receive messages, the 
> WebLogic MDB framework repeatedly polls by opening an XA transaction 
> ({{xaStart}}), performing a {{receive(timeout)}}, and then closing 
> the transaction ({{xaEnd}}). If a message was received, the transaction 
> is prepared and committed; otherwise it is rolled back.
> During a graceful broker shutdown, all active transactions and sessions on the 
> broker are closed. That part works as expected. However, after the restart we 
> encounter problematic behavior:
> The MDB begins polling again ({{xaStart}} + {{receive(timeout)}}). 
> Before the receive() times out, the WebLogic JTA framework tries, in parallel, 
> to finish the open transaction (started before the shutdown). This is done in 
> the same session as the MDB polling. Since that transaction no longer exists 
> on the broker, {{xaEnd}} fails with _"Cannot find suspended transaction to 
> end"_. WebLogic JTA forces an {{xaRollback}}, which also fails with 
> _"Cannot find xid in resource manager"_. On the broker side, the session 
> is rolled back (see: 
> [ServerSessionImpl.java#L1627|https://github.com/apache/artemis/blob/fa1da6e6301fd89f7ec6dcdb98fd4366597082fa/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/ServerSessionImpl.java#L1627]).
>  The session rollback will cancel all open transactions in the session, 
> including the ongoing MDB polling transaction.
> The real problem occurs afterwards, when a new message is produced and ready to 
> be delivered to the MDB poller ({{receive(timeout)}}). Artemis delivers the 
> message, and the MDB poller tries to end ({{xaEnd}}) the transaction. Because 
> the transaction was already removed during the previous session rollback, 
> this results in _"Cannot find suspended transaction to end"_. The MDB 
> poller then forces a global rollback: it drops the message and attempts to 
> roll back on the Artemis broker, which also fails (_"Cannot find xid in 
> resource manager"_). As a result, the message is lost on the Artemis broker: 
> it is removed from the queue, and there is no open prepared 
> transaction for it anymore.
> Here is a short version of the flow (a simple sequence diagram is attached as 
> well):
> {code:java}
> xaStart(xid1) (session1)
> receive(timeout)
> -- broker restart --
> xaStart(xid2) (session2)
> receive(timeout)
> xaEnd(xid1) (session2)
> -- Cannot find suspended transaction to end
> xaRollback(xid1) (session2)
> -- Cannot find xid in resource manager
> -- session rollback removes xid1 and all other xids in the session (including xid2)
> message received with xid2
> xaEnd(xid2)
> -- Cannot find suspended transaction to end
> xaRollback(xid2)
> -- Cannot find xid in resource manager
> message dropped due to exception; message no longer on queue and no 
> transaction left on Artemis
> {code}
> To reproduce this behavior, I adapted the XA receive example in a fork:
> [https://github.com/leisma/activemq-artemis-examples/commit/61deb9832eefeda360ff3207b3ad8e56c4ea2aa6]
>  (You need to run a broker separately to execute it.)
> I’m not sure whether Artemis implicitly assumes that only one XA transaction 
> may exist per session. I could not find clear guidance in the JTA 
> specification or other documentation regarding how XA transactions should 
> behave in this scenario.
> Is this the expected behavior?
> Or would it be possible for Artemis to check whether a session still contains 
> active transactions before performing a rollback, which would prevent the 
> message loss we are seeing?
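
The flow described above, and the guard the reporter asks about, can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySession`, `guardActive`, and `messagesLeftAfterScenario` are hypothetical names invented for illustration, and nothing here is Artemis code or the change made in the linked pull request. The model assumes that rolling back an unknown xid triggers a session rollback which discards every transaction in the session, and that a message delivered under a discarded xid is no longer tracked by any transaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a broker-side session; NOT Artemis code.
class ToySession {
    final Deque<String> queue = new ArrayDeque<>();
    // xid -> message delivered under that xid (null while the poller still waits)
    final Map<String, String> transactions = new HashMap<>();
    // Assumption: models the guard proposed in the report
    final boolean guardActive;

    ToySession(boolean guardActive) {
        this.guardActive = guardActive;
    }

    void xaStart(String xid) {
        transactions.put(xid, null);
    }

    // Takes the next message off the queue. If the xid was discarded
    // earlier, the message ends up tracked by no transaction at all.
    String receive(String xid) {
        String msg = queue.poll();
        if (msg != null && transactions.containsKey(xid)) {
            transactions.put(xid, msg);
        }
        return msg;
    }

    // Returns false when the xid is unknown ("Cannot find xid in resource manager").
    boolean xaRollback(String xid) {
        if (transactions.containsKey(xid)) {
            String msg = transactions.remove(xid);
            if (msg != null) {
                queue.addFirst(msg); // redeliver later
            }
            return true;
        }
        if (guardActive && !transactions.isEmpty()) {
            return false; // guard: leave the other active transactions alone
        }
        // Unguarded path: session rollback discards every transaction.
        for (String msg : transactions.values()) {
            if (msg != null) {
                queue.addFirst(msg);
            }
        }
        transactions.clear();
        return false;
    }

    // Replays the post-restart part of the flow; returns messages left on the queue.
    static int messagesLeftAfterScenario(boolean guardActive) {
        ToySession session2 = new ToySession(guardActive);
        session2.xaStart("xid2");     // MDB poller starts polling again
        session2.xaRollback("xid1");  // JTA cleans up the stale pre-restart xid
        session2.queue.add("m1");     // a producer sends a message
        session2.receive("xid2");     // message is delivered to the poller
        session2.xaRollback("xid2");  // poller rolls back (in both variants, for comparison)
        return session2.queue.size();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + messagesLeftAfterScenario(false));
        System.out.println("with guard:    " + messagesLeftAfterScenario(true));
    }
}
```

In this model the unguarded run ends with an empty queue and no transaction holding `m1`, which corresponds to the message loss described in the report, while the guarded run puts `m1` back on the queue. Whether the real broker can safely skip the session rollback in this situation is a question for the Artemis maintainers; the sketch only shows why the ordering of calls matters.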



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
