Hi Anton,

Each get method now checks consistency.
The check means:
1) the tx lock is acquired on the primary
2) data is obtained from each owner (primary and backups)
3) the data is compared
Did you consider acquiring locks on the backups as well during your check, just like the 2PC prepare phase does? If there's a happens-before between step 1 (lock on primary) and step 2 (update primary + lock backup + update backup), you can be sure there will be no false-positive results and no deadlocks either. The protocol won't get complicated: a checking read on a backup will simply wait for the commit if one is in progress.
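To make the proposal concrete, here is a minimal sketch of such a prepare-like check, assuming a hypothetical per-owner handle NodeView with lock/read/unlock (none of this is real Ignite API). The only point is the ordering: every owner is locked before any value is read, so a checking read on a backup waits for an in-progress commit instead of observing a not-yet-committed value.

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Illustrative only: NodeView is a hypothetical per-owner handle, not an Ignite API. */
interface NodeView<K, V> {
    void lock(K key);   // Blocks while a commit holds the entry lock on this owner.
    V read(K key);
    void unlock(K key);
}

final class PrepareLikeConsistencyCheck {
    /**
     * Checks that all owners hold the same value for the key.
     * Locking every owner, as the 2PC prepare phase does, means a checking read
     * on a backup waits for an in-progress commit instead of observing an
     * obsolete value, so the check cannot produce a false positive.
     */
    static <K, V> boolean isConsistent(K key, NodeView<K, V> primary, List<NodeView<K, V>> backups) {
        List<NodeView<K, V>> owners = new ArrayList<>();
        owners.add(primary);
        owners.addAll(backups);

        // Lock in a fixed order (primary first, then backups) to avoid deadlocks.
        for (NodeView<K, V> owner : owners)
            owner.lock(key);

        try {
            V primaryVal = primary.read(key);

            for (NodeView<K, V> backup : backups) {
                if (!Objects.equals(primaryVal, backup.read(key)))
                    return false; // A real inconsistency, not a lagging backup.
            }

            return true;
        }
        finally {
            for (NodeView<K, V> owner : owners)
                owner.unlock(key);
        }
    }
}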

Best Regards,
Ivan Rakov

On 12.07.2019 9:47, Anton Vinogradov wrote:
Igniters,

Let me explain the problem in detail.
Read Repair in a pessimistic tx (locks acquired on the primary, full sync, 2PC)
is able to see a consistency violation because the backups are not updated yet.
It does not seem to be a good idea to "fix" the code to unlock the primary only
when the backups are updated; this would definitely cause a performance drop.
Currently, there is no explicit sync feature that allows waiting until the backups
have been updated by the previous tx.
The previous tx just sends GridNearTxFinishResponse to the originating node.

Bad ideas for how to handle this:
- retry several times (a false positive is still possible; see the sketch below)
- lock the tx entry on the backups (will definitely break the failover logic)
- wait for the same entry version on the backups with some timeout (would require
huge changes to the "get" logic, and a false positive would still be possible)
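For illustration, the first option would look roughly like the loop below; readFromAllOwners is a hypothetical supplier of the values seen on each owner. Since there is no happens-before with the backup update, every attempt may still observe the stale copy, so retries only make the false positive less likely, not impossible.

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

final class RetryCheck {
    /**
     * Hypothetical retry-based check: readFromAllOwners returns the value seen on
     * every owner (primary and backups). Without a happens-before edge between
     * "tx lock released on primary" and "backup updated", every attempt may still
     * observe the old backup value.
     */
    static <V> boolean looksConsistent(Supplier<List<V>> readFromAllOwners, int attempts) {
        for (int i = 0; i < attempts; i++) {
            Set<V> distinct = new HashSet<>(readFromAllOwners.get());

            if (distinct.size() == 1)
                return true; // All owners agreed on this attempt.
        }

        return false; // Possibly a false positive: the backup update may simply be late.
    }
}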

Is there any simple fix for this issue?
Thanks in advance for any tips.

Ivan,
thanks for your interest

4. Very fast and lucky txB writes a value 2 for the key on primary and
backup.
AFAIK, such reordering is not possible since the backups are "prepared" before the
primary releases the lock.
So, consistency is guaranteed by failover and by the "prepare" phase of 2PC.
It seems the problem is NOT with consistency in AI, but with the consistency
detection implementation (Read Repair) and possible "false positive" results.
BTW, I checked the 1PC case (only one data node in the test) and got no issues.

On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван <vololo...@gmail.com> wrote:

Anton,

Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
mean that things can be even worse and an inconsistent write is
possible on a backup? E.g. in this scenario:
1. txA writes a value 1 for the key on primary.
2. txA unlocks the key on primary.
3. txA freezes before updating backup.
4. Very fast and lucky txB writes a value 2 for the key on primary and
backup.
5. txA wakes up and writes 1 for the key on the backup.
6. As a result, there is 2 on primary and 1 on backup.

Naively it seems that locks should be released after all replicas are
updated.
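To make the feared interleaving concrete, here is a toy, deterministic simulation of exactly these steps using plain Java maps and latches (nothing Ignite-specific). As Anton notes above, Ignite's 2PC prepare ordering rules this out in practice; the model only shows what a naive "unlock primary, then update backup" flow would allow.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

/** Toy model of steps 1-6 above; primary and backup are plain maps, not Ignite nodes. */
public class NaiveReplicationRace {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentMap<String, Integer> primary = new ConcurrentHashMap<>();
        ConcurrentMap<String, Integer> backup = new ConcurrentHashMap<>();

        CountDownLatch txAWrotePrimary = new CountDownLatch(1);
        CountDownLatch txBFinished = new CountDownLatch(1);

        Thread txA = new Thread(() -> {
            primary.put("key", 1);       // 1. txA writes 1 on the primary.
            txAWrotePrimary.countDown(); // 2. txA "unlocks" the key on the primary.
            await(txBFinished);          // 3. txA freezes before updating the backup.
            backup.put("key", 1);        // 5. txA wakes up and writes the stale 1 to the backup.
        });

        Thread txB = new Thread(() -> {
            await(txAWrotePrimary);
            primary.put("key", 2);       // 4. txB writes 2 on the primary...
            backup.put("key", 2);        //    ...and on the backup.
            txBFinished.countDown();
        });

        txA.start();
        txB.start();
        txA.join();
        txB.join();

        // 6. Result: primary = 2, backup = 1 -- the inconsistency described above.
        System.out.println("primary=" + primary.get("key") + ", backup=" + backup.get("key"));
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}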

On Wed, Jul 10, 2019 at 16:36, Anton Vinogradov <a...@apache.org> wrote:
Folks,

I'm now investigating unexpected repairs [1] that happen when Read Repair is used
in testAccountTxNodeRestart.
I updated [2] the test to check whether any repairs happen.
The test's name is now "testAccountTxNodeRestartWithReadRepair".

Each get method now checks consistency.
The check means:
1) the tx lock is acquired on the primary
2) data is obtained from each owner (primary and backups)
3) the data is compared
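For context, a usage-level sketch of how such a checking get is invoked. It assumes the withReadRepair() cache decorator that the Read Repair work introduces (the exact method name and signature may differ in the PR under discussion), and the cluster/cache setup is reduced to a single node for brevity.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class ReadRepairGetSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            CacheConfiguration<Integer, Integer> cfg = new CacheConfiguration<>("accounts");
            cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
            cfg.setBackups(1); // At least one backup, so there is something to compare against.

            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(cfg);

            try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
                // The consistency-checking get described above:
                // 1) the tx lock is acquired on the primary,
                // 2) the value is fetched from every owner (primary and backups),
                // 3) the copies are compared (and repaired if they differ).
                Integer val = cache.withReadRepair().get(1);

                System.out.println("value: " + val);

                tx.commit();
            }
        }
    }
}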

Sometimes, a backup may have an obsolete value during such a check.

It seems this happens because the tx commit on the primary goes the following way
(check the code [3] for details):
1) perform localFinish (releases the tx lock)
2) perform dhtFinish (commits on the backups)
3) transfer control back to the caller
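Condensed into pseudocode (the method names below are placeholders; the real flow lives in GridDhtTxLocal#finishTx [3]), the ordering looks like this:

import java.util.concurrent.CompletionStage;

/** Placeholder types and method names; see GridDhtTxLocal#finishTx [3] for the real flow. */
final class PrimaryCommitFlowSketch {
    interface TxOnPrimary {
        void localFinish();                // Commits locally and releases the tx lock on the primary.
        CompletionStage<Void> dhtFinish(); // Asynchronously commits on the backups.
    }

    static CompletionStage<Void> finishOnPrimary(TxOnPrimary tx) {
        tx.localFinish();      // (1) The tx lock on the primary is released HERE...
        return tx.dhtFinish(); // (2) ...while the backups are still being committed.
        // (3) The caller sees commit() as finished only after this future completes,
        //     which is why "commit() finished" does imply "backups updated".
    }
}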

So, it seems the problem here is that "tx lock released on the primary" does not
mean that the backups are updated, while "commit() finished in the caller's
thread" does.
This means that, currently, there is no happens-before between
1) thread 1 commits data on the primary and the tx lock can be re-obtained
2) thread 2 reads from a backup
but there is still a strong happens-before between "commit() finished" and "backups updated".

So it seems possible, for example, to get a notification from a
continuous query, then read from a backup and get an obsolete value.
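A sketch of that sequence using the public continuous query API (the cache name, key and single-node setup are made up for brevity): the listener fires once the primary is updated, but reading the local backup copy, e.g. via localPeek with the BACKUP peek mode, may still return the previous value because there is no happens-before with the backup commit.

import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.cache.query.ContinuousQuery;

public class StaleBackupReadSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache("accounts");

            ContinuousQuery<Integer, Integer> qry = new ContinuousQuery<>();

            qry.setLocalListener(events -> {
                for (CacheEntryEvent<? extends Integer, ? extends Integer> e : events) {
                    // The notification means the primary has been updated, but there is no
                    // happens-before with the backup commit, so the local backup copy may
                    // still hold the previous (obsolete) value at this point.
                    Integer backupCopy = cache.localPeek(e.getKey(), CachePeekMode.BACKUP);

                    System.out.println("notified=" + e.getValue() + ", local backup copy=" + backupCopy);
                }
            });

            cache.query(qry);

            cache.put(1, 42); // Triggers the listener.
        }
    }
}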

Is this "partial happens-before" behavior expected?

[1] https://issues.apache.org/jira/browse/IGNITE-11973
[2] https://github.com/apache/ignite/pull/6679/files
[3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx



--
Best regards,
Ivan Pavlukhin
