Folks,

I'm investigating unexpected repairs [1] that occur when ReadRepair is used in testAccountTxNodeRestart.
I updated the test [2] to check whether any repairs happen.
The test is now named "testAccountTxNodeRestartWithReadRepair".

Each get method now checks consistency. The check works as follows:
1) the tx lock is acquired on the primary
2) the data is fetched from each owner (primary and backups)
3) the data is compared

Sometimes a backup holds an obsolete value during such a check.

This seems to happen because tx commit on the primary proceeds in the following way (see the code [3] for details):
1) performing localFinish (releases the tx lock)
2) performing dhtFinish (commits on backups)
3) transferring control back to the caller

So the problem seems to be that "tx lock released on primary" does not mean that the backups are updated, but "commit() method finished in the caller's thread" does.

This means that, currently, there is no happens-before between
1) thread 1 committed data on the primary, and the tx lock can be reobtained
2) thread 2 reads from a backup
but there is still a strong HB between "commit() finished" and "backup updated".

So it seems possible, for example, to receive a notification from a continuous query, then read from a backup and get an obsolete value.

Is this "partial happens-before" behavior expected?

[1] https://issues.apache.org/jira/browse/IGNITE-11973
[2] https://github.com/apache/ignite/pull/6679/files
[3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
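
P.S. To make the ordering described above easier to discuss, here is a minimal toy model in plain Java. It is not Ignite code: the class CommitOrderSketch, its fields and its methods are all hypothetical, and the two latches exist only to force the interleaving deterministically. It assumes the three commit steps happen in the order listed above and shows a reader that takes the lock right after step 1 and sees a stale backup.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the suspected ordering. Hypothetical names, not Ignite internals. */
public class CommitOrderSketch {
    private final ReentrantLock txLock = new ReentrantLock();
    private volatile int primary;
    private volatile int backup;

    // Test-only latches that pin the reader between step 1 and step 2.
    private final CountDownLatch lockReleased = new CountDownLatch(1);
    private final CountDownLatch readerDone = new CountDownLatch(1);

    /** Mimics the assumed commit order: local finish, then backup update, then return. */
    void commit(int val) throws InterruptedException {
        txLock.lock();
        try {
            primary = val;
        } finally {
            txLock.unlock();      // step 1: "localFinish" releases the tx lock
        }
        lockReleased.countDown();
        readerDone.await();       // hold here so the reader runs inside the window
        backup = val;             // step 2: "dhtFinish" updates the backup
    }                             // step 3: control returns to the caller

    /** Mimics the consistency check: take the lock, read all owners, compare. */
    boolean consistent() {
        txLock.lock();
        try {
            return primary == backup;
        } finally {
            txLock.unlock();
        }
    }

    /** Returns {consistent while commit() is in flight, consistent after commit() returned}. */
    public static boolean[] run() throws InterruptedException {
        CommitOrderSketch s = new CommitOrderSketch();
        Thread committer = new Thread(() -> {
            try {
                s.commit(42);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        committer.start();

        s.lockReleased.await();               // lock is free, backup not yet updated
        boolean duringCommit = s.consistent();
        s.readerDone.countDown();

        committer.join();                     // commit() has returned
        boolean afterCommit = s.consistent();
        return new boolean[] {duringCommit, afterCommit};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("consistent while commit() is in flight: " + r[0]); // false
        System.out.println("consistent after commit() returned: " + r[1]);     // true
    }
}
```

In this model the lock alone does not protect the reader from a stale backup; only waiting for commit() to return does, which matches the "partial happens-before" described above.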