Hello Felipe,

Thank you very much for your detailed bug report and investigation. Reports
like this help us improve Ignite and make it more reliable. Please feel
free to share any other issues you encounter; your feedback is very
valuable to the community.

We have reviewed your findings and the code references you provided. We
agree that this looks like a real liveness issue in IgniteLock. We have
created a JIRA ticket to track it:
https://issues.apache.org/jira/browse/IGNITE-27962

We will continue investigating the root cause and possible fixes.

If possible, could you also share debug logs around the topology change and
lock acquisition (from both the client and the server node that was
stopped)? In particular, logs covering:

   - the transaction commit,
   - continuous query processing,
   - node left / topology change events.

This may help us better understand the race condition and validate a fix.

As a temporary workaround, you may try an IgniteSemaphore with a single
permit, created with failoverSafe=true, instead of IgniteLock, provided
reentrancy is not required in your use case.
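For reference, the workaround could look roughly like the sketch below. The
lock name "myLock" and the wrapper method are illustrative only; the
semaphore is obtained via Ignite#semaphore(name, cnt, failoverSafe, create).
Note one caveat: unlike IgniteLock, the semaphore is not reentrant, so the
same thread must not acquire it twice.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSemaphore;

public class SemaphoreMutex {

    /** Runs the critical section under a cluster-wide, failover-safe mutex. */
    public static void withClusterLock(Ignite ignite, Runnable critical) {
        // One permit makes the semaphore behave like a non-reentrant mutex.
        // failoverSafe=true means permits held by a node that leaves the
        // topology are released automatically, avoiding the stuck-lock issue.
        IgniteSemaphore sem = ignite.semaphore("myLock", 1, true, true);

        sem.acquire();
        try {
            critical.run();
        } finally {
            sem.release();
        }
    }
}
```

This is only a sketch under the assumptions above, not a drop-in replacement;
please verify it against your reentrancy and fairness requirements.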

In addition, we are currently working on improved Rolling Upgrade
functionality (IEP-132):
https://cwiki.apache.org/confluence/display/IGNITE/IEP-132+Rolling+Upgrade
This feature is under active development, and we plan to finalize it in
upcoming releases. We expect it to improve stability and behavior during
node restarts and cluster upgrades.

Thank you again for your contribution and detailed analysis.


-- 
Best regards,
Aleksandr Chesnokov
