Anton,
Step-by-step:
1) primary locked on key mention (get/put) in a pessimistic/!read-committed tx
2) backups locked on prepare
3) primary unlocked on finish
4) backups unlocked on finish (after the primary)
correct?
Yes, this corresponds to my understanding of the transaction protocol, with one
minor exception: steps 3 and 4 are inverted in the case of one-phase commit.
Agree, but it seems there is no need to acquire the lock; we just have to wait
until the entry becomes unlocked.
- entry locked means that the previous tx's "finish" phase is in progress
- entry unlocked means the read value is up-to-date (the previous "finish"
phase has finished)
correct?
Diving deeper: an entry is locked if its GridCacheMapEntry.localCandidates
queue is not empty (the first item in the queue is the transaction that
currently owns the lock).
we just have to wait
until the entry becomes unlocked.
This may work.
If the consistency checking code has acquired the lock on the primary, a backup
can be in one of two states:
- not locked, and no new locks will appear since we are holding the lock on
the primary
- still locked by the transaction that owned the lock on the primary just
before our checking code ran; in that case the checking code should simply
wait for the lock to be released
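
A minimal sketch of that waiting approach, assuming the localCandidates queue
behaves as described above (exact signatures may vary between Ignite versions,
and real code would park on a future instead of spinning):

import org.apache.ignite.internal.processors.cache.GridCacheMapEntry;

final class EntryUnlockWaiter {
    private EntryUnlockWaiter() {}

    /** Blocks until no transaction owns a lock on the given entry. */
    static void awaitUnlocked(GridCacheMapEntry entry) throws Exception {
        // The first item of localCandidates is the current lock owner;
        // an empty queue means the previous tx's "finish" phase completed here.
        while (!entry.localCandidates().isEmpty())
            Thread.sleep(10); // illustration only: real code would not busy-wait
    }
}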
Best Regards,
Ivan Rakov
On 15.07.2019 9:34, Anton Vinogradov wrote:
Ivan R.
Thanks for joining!
I got the idea, but I'm not sure I see a way to fix it.
AFAIK (I can be wrong, please correct me if necessary), with 2PC, locks are
acquired on backups during the "prepare" phase and released at the "finish"
phase, after the primary has fully committed.
Step-by-step:
1) primary locked on key mention (get/put) in a pessimistic/!read-committed tx
2) backups locked on prepare
3) primary unlocked on finish
4) backups unlocked on finish (after the primary)
correct?
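
For reference, here is roughly how that lifecycle maps onto the public API
(cache name "accounts" and the key are placeholders; the cache is assumed to
be TRANSACTIONAL):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

class LockLifecycle {
    static void demo(Ignite ignite) {
        IgniteCache<Integer, Long> cache = ignite.cache("accounts");

        try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
            cache.get(1);       // 1) primary locked on key mention
            cache.put(1, 100L);

            tx.commit();        // 2) backups locked on prepare
                                // 3) primary unlocked on finish
                                // 4) backups unlocked on finish (after the primary)
        }
    }
}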
So, acquiring locks on backups outside the "prepare" phase may cause
unexpected behavior in case of a primary failure or other errors.
It's definitely possible to update the failover logic to solve this issue, but
that seems like an overcomplicated way.
The main question here: is there any simple way?
a checking read from a backup will just wait for the commit if it's in progress.
Agree, but it seems there is no need to acquire the lock; we just have to wait
until the entry becomes unlocked.
- entry locked means that the previous tx's "finish" phase is in progress
- entry unlocked means the read value is up-to-date (the previous "finish"
phase has finished)
correct?
On Mon, Jul 15, 2019 at 8:37 AM Павлухин Иван <vololo...@gmail.com> wrote:
Anton,
I did not know about the mechanics of locking entries on backups during the
prepare phase. Thank you for pointing that out!
Fri, Jul 12, 2019 at 22:45, Ivan Rakov <ivan.glu...@gmail.com>:
Hi Anton,
Each get method now checks consistency.
Check means:
1) tx lock acquired on primary
2) data obtained from each owner (primary and backups)
3) data compared
Did you consider acquiring locks on backups as well during your check,
just like 2PC prepare does?
If there's an HB between steps 1 (lock primary) and 2 (update primary +
lock backup + update backup), you can be sure that there will be no
false-positive results and no deadlocks either. The protocol won't be
complicated: a checking read from a backup will just wait for the commit if
it's in progress.
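
In pseudocode, the suggested check could look like this (every type and method
below is a placeholder for the internal tx machinery, not real Ignite API):

import java.util.HashSet;
import java.util.List;

/** Placeholder abstraction over per-owner key locks; not real Ignite API. */
interface OwnerLocks {
    void lockPrimary(Object key);   // step 1: HB anchor, blocks new txs on the key
    void lockBackups(Object key);   // like 2PC prepare: waits while a previous
                                    // tx's "finish" phase is still in progress
    List<Object> readAllOwners(Object key);
    void unlockAll(Object key);
}

final class CheckWithBackupLocks {
    static boolean consistent(OwnerLocks locks, Object key) {
        locks.lockPrimary(key);
        try {
            locks.lockBackups(key); // checking read waits for an in-progress commit

            // With all owners locked, the values can be compared safely.
            return new HashSet<>(locks.readAllOwners(key)).size() == 1;
        }
        finally {
            locks.unlockAll(key);
        }
    }
}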
Best Regards,
Ivan Rakov
On 12.07.2019 9:47, Anton Vinogradov wrote:
Igniters,
Let me explain the problem in detail.
Read Repair in a pessimistic tx (locks acquired on primary, full sync, 2PC)
is able to see a consistency violation because the backups are not updated yet.
It does not seem to be a good idea to "fix" the code to unlock the primary
only once the backups are updated; that would definitely cause a performance
drop.
Currently, there is no explicit sync feature that allows waiting until the
backups have been updated by the previous tx.
The previous tx just sends GridNearTxFinishResponse to the originating node.
Bad ideas for how to handle this:
- retry several times (a false positive is still possible)
- lock the tx entry on backups (will definitely break the failover logic)
- wait for the same entry version on backups with some timeout (will require
huge changes to the "get" logic, and a false positive is still possible)
Is there any simple fix for this issue?
Thanks in advance for any tips.
Ivan,
thanks for your interest
4. Very fast and lucky txB writes a value 2 for the key on primary
and
backup.
AFAIK, this reordering is not possible since the backups are "prepared"
before the primary releases the lock.
So, consistency is guaranteed by failover and by the "prepare" phase of 2PC.
It seems the problem is NOT with consistency in AI, but with the consistency
detection implementation (RR) and possible "false positive" results.
BTW, I checked the 1PC case (only one data node in the test) and saw no
issues.
On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван <vololo...@gmail.com>
wrote:
Anton,
Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
mean that things can be even worse and an inconsistent write is possible
on a backup? E.g. in this scenario:
1. txA writes a value 1 for the key on primary.
2. txA unlocks the key on primary.
3. txA freezes before updating backup.
4. Very fast and lucky txB writes a value 2 for the key on primary and
backup.
5. txA wakes up and writes 1 for the key on the backup.
6. As a result, there is 2 on primary and 1 on backup.
Naively it seems that locks should be released after all replicas are
updated.
Wed, Jul 10, 2019 at 16:36, Anton Vinogradov <a...@apache.org>:
Folks,
I'm now investigating unexpected repairs [1] in case of ReadRepair usage
in testAccountTxNodeRestart.
I updated the test [2] to check whether any repairs happen.
The test's name is now "testAccountTxNodeRestartWithReadRepair".
Each get method now checks consistency.
Check means:
1) tx lock acquired on primary
2) data obtained from each owner (primary and backups)
3) data compared
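
A rough approximation of such a check via public APIs, for illustration only
(the real implementation in [2] hooks into the tx machinery; the cache is
assumed to be TRANSACTIONAL, and the cache name and key type are placeholders):

import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class GetWithConsistencyCheck {
    /** Reads the key from every owner under a primary tx lock and compares. */
    public static boolean consistent(Ignite ignite, String cacheName, int key) {
        try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
            ignite.cache(cacheName).get(key); // 1) tx lock acquired on primary

            // 2) data obtained from each owner (primary and backups).
            Collection<ClusterNode> owners =
                ignite.affinity(cacheName).mapKeyToPrimaryAndBackups(key);

            Set<Object> values = new HashSet<>();

            for (ClusterNode owner : owners)
                values.add(ignite.compute(ignite.cluster().forNode(owner))
                    .call(new LocalPeek(cacheName, key)));

            tx.commit();

            return values.size() == 1; // 3) data compared
        }
    }

    /** Peeks the key locally on whichever node this callable runs on. */
    private static class LocalPeek implements IgniteCallable<Object> {
        @IgniteInstanceResource
        private transient Ignite ignite;

        private final String cacheName;
        private final int key;

        LocalPeek(String cacheName, int key) {
            this.cacheName = cacheName;
            this.key = key;
        }

        @Override public Object call() {
            return ignite.cache(cacheName).localPeek(key, CachePeekMode.PRIMARY, CachePeekMode.BACKUP);
        }
    }
}

Note that this naive version can race exactly as described next: the backup
peek may still observe the pre-commit value.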
Sometimes, a backup may have an obsolete value during such a check.
It seems this happens because the tx commit on the primary proceeds in the
following way (check the code [3] for details):
1) performing localFinish (releases tx lock)
2) performing dhtFinish (commits on backups)
3) transferring control back to the caller
So it seems the problem here is that "tx lock released on primary" does not
mean that the backups are updated, while "commit() method finished in the
caller's thread" does.
This means that, currently, there is no happens-before between
1) thread 1 has committed data on the primary and the tx lock can be re-obtained
2) thread 2 reads from a backup
but there is still a strong HB between "commit() finished" and "backup updated".
So it seems possible, for example, to receive a notification from a continuous
query, then read from a backup and get an obsolete value.
Is this "partial happens-before" behavior expected?
[1] https://issues.apache.org/jira/browse/IGNITE-11973
[2] https://github.com/apache/ignite/pull/6679/files
[3]
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
--
Best regards,
Ivan Pavlukhin
--
Best regards,
Ivan Pavlukhin