Hello Igniters,

The current implementation of
GridDhtPartitionsExchangeFuture#waitPartitionRelease does not guarantee
that, once the method completes, there are no ongoing atomic or
transactional updates on the current node during the main stage of PME.
It only guarantees that all primary updates on that node have finished;
the node can still receive and process backup updates after the method
returns.
An example of such a case is described in
https://issues.apache.org/jira/browse/IGNITE-7871

To avoid such situations, we would like to implement a second phase of the
waitPartitionRelease method.
In this phase, every server node participating in PME waits until all
other server nodes have finished their ongoing updates.

Here is a brief description of the algorithm:

Non-coordinator node:
1) Finish all ongoing atomic & transactional updates.
2) Send acknowledgement to coordinator.
3) Wait for final acknowledgement from coordinator that all nodes have
finished their updates.
4) Continue PME.

Coordinator node:
1) Finish all ongoing atomic & transactional updates.
2) Wait for all acknowledgements from all server nodes.
3) Send final acknowledgement to all server nodes.
4) Continue PME.
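
To make the ordering concrete, here is a minimal in-process sketch of the
two roles above. It is only a model, not Ignite code: the class
TwoPhaseReleaseSketch, its methods, and the use of CountDownLatch in place
of real acknowledgement messages are all assumptions made for illustration.
Threads stand in for server nodes, and the coordinator is counted among the
nodes that must finish their updates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

/** Hypothetical model of the proposed second phase; latches replace messages. */
class TwoPhaseReleaseSketch {
    /** Counted down once per server node that has finished its local updates. */
    private final CountDownLatch acks;

    /** Opened by the coordinator: models the final acknowledgement. */
    private final CountDownLatch finalAck = new CountDownLatch(1);

    /** Event log, used only to observe the resulting order. */
    final List<String> log = new CopyOnWriteArrayList<>();

    TwoPhaseReleaseSketch(int serverNodes) {
        acks = new CountDownLatch(serverNodes);
    }

    void runNonCoordinator(String id) throws InterruptedException {
        // 1) Finish all ongoing atomic & transactional updates (placeholder).
        log.add("finished:" + id);
        // 2) Send acknowledgement to coordinator.
        acks.countDown();
        // 3) Wait for final acknowledgement from coordinator.
        finalAck.await();
        // 4) Continue PME.
        log.add("continue:" + id);
    }

    void runCoordinator(String id) throws InterruptedException {
        // 1) Finish all ongoing atomic & transactional updates (placeholder).
        log.add("finished:" + id);
        acks.countDown();
        // 2) Wait for acknowledgements from all server nodes.
        acks.await();
        // 3) Send final acknowledgement to all server nodes.
        finalAck.countDown();
        // 4) Continue PME.
        log.add("continue:" + id);
    }

    /** Runs one coordinator and (serverNodes - 1) other nodes, returns the log. */
    static List<String> simulate(int serverNodes) {
        TwoPhaseReleaseSketch barrier = new TwoPhaseReleaseSketch(serverNodes);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < serverNodes; i++) {
            final int idx = i;
            Thread t = new Thread(() -> {
                try {
                    if (idx == 0)
                        barrier.runCoordinator("node-0");
                    else
                        barrier.runNonCoordinator("node-" + idx);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads)
                t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(barrier.log);
    }

    public static void main(String[] args) {
        // Every "finished" entry is logged before any "continue" entry.
        simulate(4).forEach(System.out::println);
    }
}
```

In this model no node logs "continue" until every node has logged
"finished", which is the property the second phase is meant to provide.
Failure handling (for example, a node leaving while others wait) is left
out of the sketch.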

Acknowledgement messages are tiny, so the added network pressure and the
overall performance drop should be minimal.

Another solution to the problem is simply to cancel atomic backup updates
and transactional backup updates in the PREPARED phase if the topology
version has changed.
But from the user's perspective, it is not correct to get transaction
errors merely because a node is joining the cluster.

Any thoughts?
