Anton, thank you for the clarification.
Tue, Jul 16, 2019 at 09:24, Anton Vinogradov <a...@apache.org>:
>
> Ivan R.
>
> Thanks.
> I'll try to implement the approach you proposed.
>
> Ivan P.
>
> >> what prevents primary partition relocation when
> >> Read Repair is in progress? Is there a transaction or an explicit lock?
> Did you mean partition eviction?
> RR is almost a regular get with the same logic. It maps on some topology
> and performs regular gets.
> In case a node failed or is not an owner anymore, we'll just ignore it.
> See the code for details:
>
> if (invalidNodeSet.contains(affNode) || !cctx.discovery().alive(affNode)) {
>     onDone(Collections.emptyMap()); // Finishing mini future with just "ok".
> }
>
> On Tue, Jul 16, 2019 at 9:04 AM Павлухин Иван <vololo...@gmail.com> wrote:
>
> > Anton,
> >
> > You referred to failover scenarios. I believe that everything is
> > described in the IEP. But to make this discussion self-sufficient, could
> > you please outline what prevents primary partition relocation when
> > Read Repair is in progress? Is there a transaction or an explicit lock?
> >
> > Mon, Jul 15, 2019 at 23:49, Ivan Rakov <ivan.glu...@gmail.com>:
> > >
> > > Anton,
> > >
> > > > Step-by-step:
> > > > 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx
> > > > 2) backups locked on prepare
> > > > 3) primary unlocked on finish
> > > > 4) backups unlocked on finish (after the primary)
> > > > correct?
> > > Yes, this corresponds to my understanding of the transaction protocol, with
> > > a minor exception: steps 3 and 4 are inverted in the case of one-phase commit.
> > >
> > > > Agree, but it seems there is no need to acquire the lock; we just have to wait
> > > > until the entry becomes unlocked.
> > > > - entry locked means that the previous tx's "finish" phase is in progress
> > > > - entry unlocked means the read value is up-to-date (the previous "finish" phase finished)
> > > > correct?
> > > Diving deeper: an entry is locked if its GridCacheMapEntry.localCandidates
> > > queue is not empty (the first item in the queue is the transaction that
> > > owns the lock).
> > >
> > > > we just have to wait
> > > > until the entry becomes unlocked.
> > > This may work.
> > > If the consistency checking code has acquired the lock on the primary, a backup
> > > can be in two states:
> > > - not locked, and new locks won't appear as we are holding the lock on the primary;
> > > - still locked by the transaction that owned the lock on the primary just before
> > > our checking code; in such a case the checking code should just wait for the
> > > lock release.
> > >
> > > Best Regards,
> > > Ivan Rakov
> > >
> > > On 15.07.2019 9:34, Anton Vinogradov wrote:
> > > > Ivan R.
> > > >
> > > > Thanks for joining!
> > > >
> > > > Got the idea, but not sure I've got a way to fix it.
> > > >
> > > > AFAIK (I can be wrong, please correct if necessary), at 2PC, locks are
> > > > acquired on backups during the "prepare" phase and released at the "finish"
> > > > phase after the primary has fully committed.
> > > > Step-by-step:
> > > > 1) primary locked on key mention (get/put) at pessimistic/!read-committed tx
> > > > 2) backups locked on prepare
> > > > 3) primary unlocked on finish
> > > > 4) backups unlocked on finish (after the primary)
> > > > correct?
> > > >
> > > > So, acquiring locks on backups outside the "prepare" phase may cause
> > > > unexpected behavior in case the primary fails or other errors occur.
> > > > It's definitely possible to update failover to solve this issue, but it
> > > > seems to be an overcomplicated way.
> > > > The main question here is: is there any simple way?
> > > >
> > > >>> checking read from backup will just wait for commit if it's in progress.
> > > > Agree, but it seems there is no need to acquire the lock; we just have to wait
> > > > until the entry becomes unlocked.
> > > > - entry locked means that the previous tx's "finish" phase is in progress
> > > > - entry unlocked means the read value is up-to-date (the previous "finish" phase finished)
> > > > correct?
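To make the "wait until the entry becomes unlocked" idea above concrete, here is a minimal sketch in plain Java. It is a toy model of the reasoning, not Ignite code: ReentrantLock stands in for the per-entry lock queue (GridCacheMapEntry.localCandidates in Ivan's description), and readPrimary/readBackup are hypothetical placeholders for real reads.

    import java.util.concurrent.locks.ReentrantLock;

    class EntryLockWaitSketch {
        // One lock per copy of the entry; these model the per-entry lock
        // queues on the primary and on a backup.
        final ReentrantLock primaryEntryLock = new ReentrantLock();
        final ReentrantLock backupEntryLock = new ReentrantLock();

        String checkedRead() {
            primaryEntryLock.lock(); // step 1: lock on primary; no new tx can touch the key
            try {
                // Step 2: the backup may still be locked by the previous tx's
                // "finish" phase. Acquiring and immediately releasing the lock
                // is the toy-model way to "wait until the entry becomes unlocked"
                // without actually keeping a lock on the backup.
                backupEntryLock.lock();
                backupEntryLock.unlock();
                // Step 3: the backup value is now up-to-date and comparable.
                return readPrimary() + "/" + readBackup();
            }
            finally {
                primaryEntryLock.unlock();
            }
        }

        String readPrimary() { return "v1"; } // placeholder
        String readBackup()  { return "v1"; } // placeholder
    }

The sketch encodes the two-state observation above: while the checker holds the primary's entry lock, the backup's lock is either already free or still held by the previous transaction's "finish" phase, so a plain wait for release is sufficient.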
> > > >
> > > > On Mon, Jul 15, 2019 at 8:37 AM Павлухин Иван <vololo...@gmail.com> wrote:
> > > >
> > > >> Anton,
> > > >>
> > > >> I did not know about the mechanics of locking entries on backups during
> > > >> the prepare phase. Thank you for pointing that out!
> > > >>
> > > >> Fri, Jul 12, 2019 at 22:45, Ivan Rakov <ivan.glu...@gmail.com>:
> > > >>> Hi Anton,
> > > >>>
> > > >>>> Each get method now checks the consistency.
> > > >>>> Check means:
> > > >>>> 1) tx lock acquired on primary
> > > >>>> 2) gained data from each owner (primary and backups)
> > > >>>> 3) data compared
> > > >>> Did you consider acquiring locks on backups as well during your check,
> > > >>> just like 2PC prepare does?
> > > >>> If there's HB between steps 1 (lock primary) and 2 (update primary +
> > > >>> lock backup + update backup), you may be sure that there will be no
> > > >>> false-positive results and no deadlocks as well. The protocol won't be
> > > >>> complicated: a checking read from a backup will just wait for the commit
> > > >>> if it's in progress.
> > > >>>
> > > >>> Best Regards,
> > > >>> Ivan Rakov
> > > >>>
> > > >>> On 12.07.2019 9:47, Anton Vinogradov wrote:
> > > >>>> Igniters,
> > > >>>>
> > > >>>> Let me explain the problem in detail.
> > > >>>> Read Repair at a pessimistic tx (locks acquired on primary, full sync, 2PC)
> > > >>>> is able to see a consistency violation because backups are not updated yet.
> > > >>>> It seems to be not a good idea to "fix" the code to unlock the primary only
> > > >>>> when backups are updated; this would definitely cause a performance drop.
> > > >>>> Currently, there is no explicit sync feature that allows waiting until
> > > >>>> backups are updated by the previous tx.
> > > >>>> The previous tx just sends GridNearTxFinishResponse to the originating node.
> > > >>>> Bad ideas on how to handle this:
> > > >>>> - retry some number of times (a false positive is still possible)
> > > >>>> - lock the tx entry on backups (will definitely break failover logic)
> > > >>>> - wait for the same entry version on backups during some timeout (will
> > > >>>> require huge changes to the "get" logic, and a false positive is still possible)
> > > >>>>
> > > >>>> Is there any simple fix for this issue?
> > > >>>> Thanks in advance for any tips.
> > > >>>>
> > > >>>> Ivan,
> > > >>>> thanks for your interest.
> > > >>>>
> > > >>>>>> 4. Very fast and lucky txB writes a value 2 for the key on primary and backup.
> > > >>>> AFAIK, reordering is not possible since backups are "prepared" before the
> > > >>>> primary releases the lock.
> > > >>>> So, consistency is guaranteed by failover and by the "prepare" phase of 2PC.
> > > >>>> It seems the problem is NOT with consistency at AI, but with the consistency
> > > >>>> detection implementation (RR) and possible "false positive" results.
> > > >>>> BTW, I checked the 1PC case (only one data node at the test) and gained no
> > > >>>> issues.
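Here is a minimal sketch of the three-step check quoted above (tx lock on primary, data gained from each owner, data compared). It is a toy model, not the actual Read Repair code; the Owner interface and the method names are hypothetical, and it assumes the caller already holds the tx lock on the primary (step 1).

    import java.util.List;
    import java.util.Objects;

    class ConsistencyCheckSketch {
        interface Owner {            // a node owning a copy of the key
            Object get(Object key);
        }

        // Returns true if all owners agree on the value for the key.
        static boolean consistent(Object key, Owner primary, List<Owner> backups) {
            Object expected = primary.get(key);       // step 2: read from primary
            for (Owner backup : backups) {
                Object actual = backup.get(key);      // step 2: read from each backup
                if (!Objects.equals(expected, actual))
                    return false;                     // step 3: mismatch -> repair candidate
            }
            return true;                              // step 3: all copies match
        }
    }

With the scenario discussed in the thread, a backup that is still committing the previous tx can return the old value here, which is exactly the false positive under discussion.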
> > > >>>>
> > > >>>> On Fri, Jul 12, 2019 at 9:26 AM Павлухин Иван <vololo...@gmail.com> wrote:
> > > >>>>> Anton,
> > > >>>>>
> > > >>>>> Is such behavior observed for 2PC or for the 1PC optimization? Doesn't it
> > > >>>>> mean that things can be even worse and an inconsistent write is
> > > >>>>> possible on a backup? E.g. in this scenario:
> > > >>>>> 1. txA writes a value 1 for the key on primary.
> > > >>>>> 2. txA unlocks the key on primary.
> > > >>>>> 3. txA freezes before updating backup.
> > > >>>>> 4. Very fast and lucky txB writes a value 2 for the key on primary and backup.
> > > >>>>> 5. txA wakes up and writes 1 for the key on backup.
> > > >>>>> 6. As a result, there is 2 on primary and 1 on backup.
> > > >>>>>
> > > >>>>> Naively, it seems that locks should be released after all replicas are
> > > >>>>> updated.
> > > >>>>>
> > > >>>>> Wed, Jul 10, 2019 at 16:36, Anton Vinogradov <a...@apache.org>:
> > > >>>>>> Folks,
> > > >>>>>>
> > > >>>>>> I'm now investigating unexpected repairs [1] in case of ReadRepair usage
> > > >>>>>> at testAccountTxNodeRestart.
> > > >>>>>> I updated [2] the test to check whether any repairs happen.
> > > >>>>>> The test's name is now "testAccountTxNodeRestartWithReadRepair".
> > > >>>>>>
> > > >>>>>> Each get method now checks the consistency.
> > > >>>>>> Check means:
> > > >>>>>> 1) tx lock acquired on primary
> > > >>>>>> 2) gained data from each owner (primary and backups)
> > > >>>>>> 3) data compared
> > > >>>>>>
> > > >>>>>> Sometimes, a backup may have an obsolete value during such a check.
> > > >>>>>>
> > > >>>>>> It seems this happens because tx commit on the primary goes the
> > > >>>>>> following way (check the code [3] for details):
> > > >>>>>> 1) performing localFinish (releases the tx lock)
> > > >>>>>> 2) performing dhtFinish (commits on backups)
> > > >>>>>> 3) transferring control back to the caller
> > > >>>>>>
> > > >>>>>> So, it seems the problem here is that "tx lock released on primary" does
> > > >>>>>> not mean that backups are updated, but "commit() method finished at the
> > > >>>>>> caller's thread" does.
> > > >>>>>> This means that, currently, there is no happens-before between
> > > >>>>>> 1) thread 1 committed data on primary and the tx lock can be reobtained
> > > >>>>>> 2) thread 2 reads from backup,
> > > >>>>>> but there is still a strong HB between "commit() finished" and "backup
> > > >>>>>> updated".
> > > >>>>>>
> > > >>>>>> So, it seems to be possible, for example, to gain a notification by a
> > > >>>>>> continuous query, then read from a backup and gain an obsolete value.
> > > >>>>>>
> > > >>>>>> Is this "partial happens-before" behavior expected?
> > > >>>>>>
> > > >>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-11973
> > > >>>>>> [2] https://github.com/apache/ignite/pull/6679/files
> > > >>>>>> [3] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal#finishTx
> > > >>>>>
> > > >>>>> --
> > > >>>>> Best regards,
> > > >>>>> Ivan Pavlukhin
> > > >>
> > > >> --
> > > >> Best regards,
> > > >> Ivan Pavlukhin
> >
> > --
> > Best regards,
> > Ivan Pavlukhin

--
Best regards,
Ivan Pavlukhin
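To illustrate the commit ordering described in the Jul 10 message, below is a toy Java model of the window. Only the step names (localFinish, dhtFinish) come from the thread; the class and its members are hypothetical stand-ins, not Ignite code.

    class CommitOrderingSketch {
        volatile int primaryValue = 1;
        volatile int backupValue = 1;
        final Object primaryTxLock = new Object();

        void commit(int newValue) {
            synchronized (primaryTxLock) {
                primaryValue = newValue;  // 1) localFinish: commit on primary...
            }                             //    ...and release the tx lock
            backupValue = newValue;       // 2) dhtFinish: backups are committed later
            // 3) control returns to the caller only here, so there IS a strong HB
            //    between "commit() finished" and "backup updated"
        }

        int[] readAfterRelockingPrimary() {
            synchronized (primaryTxLock) { // the tx lock can be reobtained between 1) and 2)...
                return new int[] { primaryValue, backupValue }; // ...so {2, 1} is observable
            }
        }
    }

A reader that enters readAfterRelockingPrimary() between steps 1) and 2) sees the new primary value next to the obsolete backup value, which is the "partial happens-before" the message asks about; a reader that waits for commit() to return cannot observe it.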