[
https://issues.apache.org/jira/browse/IGNITE-26020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047302#comment-18047302
]
Denis Chudov commented on IGNITE-26020:
---------------------------------------
some thoughts:
{code:java}
ro:
has read ts
txn that created WI may be pending and remain pending after WI resolution
rw:
final tx state is needed; WI resolution is required because:
- WI is not under lock
- primary changed and we cannot commit this transaction (and there was no
graceful primary switch because WI is not under lock)
so, if rw txn performs WI resolution then the txn is either already finished
or should be aborted (fixed in
https://issues.apache.org/jira/browse/IGNITE-27255 )
if commit partition has volatile non-final state, and there is no coordinator,
commit partition aborts the txn.
if commit partition has neither volatile nor persistent state, this
means one of two things:
- txn is not finished, volatile state is lost
- txn was finished, state was vacuumized
both mean primary resolution path.
assuming that the resolution request comes from node N to
commit partition leaseholder C,
before sending the request to the primary, C gets current primary P and compares its
consistency token with the consistency token from the request from node N.
- if request consistency token is null, the request is not from primary
replica, proceed,
- if request consistency token is the same as current primary consistency
token (probably the primary was moved to N after it did the request or
whatever)
if request has read timestamp (initiated by RO txn) - we can proceed with
doing request to N (this is assumed to be too rare to justify micro optimizations)
if request was initiated by RW txn - this means N is current primary, it
has the most recent state of the row, and there is WI so it was not cleaned up
on group majority - this means the txn was never finished and can be aborted
- if request consistency token is NOT equal to current primary consistency
token:
if request has read timestamp (initiated by RO txn) - proceed
if request was initiated by RW txn - respond with error, N must abort its
operation and respond with error to client (primary changed, the current
transaction will not be able to be committed)
cases after C sends request to P:
case 0 (WI is finished on primary):
- tx2 reads WI on node N created by tx0, node N is backup of group G with
primary replica P
- state of tx0 (including persistent state) is absent everywhere
- node N tries to resolve the state of tx0 from commit partition leaseholder C
- C doesn't have state of tx0 and makes request to P
- on P, the state of WI is finished
- C doesn't write persistent state (because it was written before), caches
volatile state, responds with COMMITTED/ABORTED to N
case 1 (WI is NOT finished on primary, 4 first steps are same as case 0):
- tx2 reads WI on node N created by tx0, node N is backup of group G with
primary replica P
- state of tx0 (including persistent state) is absent everywhere
- node N tries to resolve the state of tx0 from commit partition leaseholder C
- C doesn't have state of tx0 and makes request to P
- on P, the state of WI is NOT finished, this means the volatile state was lost
on commit partition, and persistent state was never written, because it can be
vacuumized only after WI is finished on majority (and primary) of G
- C finishes tx0, responds to N with ABORTED state (regular recovery)
case 2 (WI0 is finished but already WI1 is written on top)
- tx2 reads WI0 on node N created by tx0, node N is backup of group G with
primary replica P
- state of tx0 (including persistent state) is absent everywhere
- node N tries to resolve the state of tx0 from commit partition leaseholder C
- C doesn't have state of tx0 and makes request to P (with read timestamp, if
tx2 is RO)
- on P, not only WI0 is finalized, but the storage has completely different
state. Further actions:
if request should have read timestamp because tx2 is RO:
P selects version corresponding to this timestamp,
if later record is present and matches the read timestamp, then responds
with this later record,
if WI0 is committed and matches the read timestamp, then responds to C
with COMMITTED state,
if absent (aborted) but earlier record is present and matches the read
timestamp, then responds with ABORTED state,
if absent (aborted) but another WI1 is written, then advances its hybrid
time after read timestamp and responds with ABORTED state (if WI1 is later
finalized, it will have commit timestamp greater than given read timestamp)
if absent (aborted) and no records correspond to the timestamp, then checks read
ts:
if read ts is greater than now - dataAvailabilityTimeout, responds with
ABORTED state,
if read ts is less than or equal to now - dataAvailabilityTimeout,
respond with error (outdated ro txn), this means the version was collected by
GC and state is unknown
the case when request does not have read timestamp (meaning that tx2 is RW
txn) is not possible.
- C gets from P either tx state or more recent row, responds to N (resolution
response should be extended with row, its commit timestamp, etc)
+ check that write intents switch on primary replica (i.e. they must be
switched before cleanup is considered as completed)
{code}
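The token comparison that C performs before contacting the primary, and the final read-timestamp check from case 2, can be sketched roughly as below. This is only an illustration of the notes above, not existing Ignite code: the class, enum, and method names are hypothetical, the consistency token is assumed to be a long, and timestamps are simplified to plain milliseconds instead of hybrid timestamps.

```java
/** Hypothetical sketch; none of these names are actual Ignite classes or methods. */
final class WiResolutionSketch {
    /** What the commit partition leaseholder C does with a resolution request from node N. */
    enum Decision { PROCEED_TO_PRIMARY, ABORT_TXN, REJECT_PRIMARY_CHANGED }

    /**
     * Token check that C performs before asking the current primary P.
     *
     * @param requestToken consistency token sent by N, or null if N is not a primary replica
     * @param currentPrimaryToken consistency token of the current primary P (assumed to be a long)
     * @param hasReadTimestamp true if the request was initiated by an RO txn
     */
    static Decision decideAtCommitPartition(Long requestToken, long currentPrimaryToken,
            boolean hasReadTimestamp) {
        if (requestToken == null || hasReadTimestamp) {
            // Not from a primary replica, or an RO read: always go on to the primary.
            return Decision.PROCEED_TO_PRIMARY;
        }
        if (requestToken.longValue() == currentPrimaryToken) {
            // RW request from the current primary that still sees the WI:
            // the txn was never finished and can be aborted.
            return Decision.ABORT_TXN;
        }
        // RW request from a stale primary: the txn can no longer be committed.
        return Decision.REJECT_PRIMARY_CHANGED;
    }

    /**
     * Case 2, last branch: no version on P corresponds to the read timestamp.
     *
     * @return true if ABORTED may be returned, false if the RO txn is outdated
     *         (the version may have been collected by GC, so the state is unknown)
     */
    static boolean canAnswerAborted(long readTs, long now, long dataAvailabilityTimeout) {
        return readTs > now - dataAvailabilityTimeout;
    }

    public static void main(String[] args) {
        System.out.println(decideAtCommitPartition(null, 7L, false));
        System.out.println(decideAtCommitPartition(7L, 7L, false));
        System.out.println(decideAtCommitPartition(6L, 7L, false));
        System.out.println(canAnswerAborted(900L, 1_000L, 200L));
    }
}
```

The RO branch is collapsed into a single check because the notes say an RO request proceeds whether or not the tokens match; only RW requests depend on the comparison.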
> Investigate what is needed for write intent handling after making pending
> rows persistent
> -----------------------------------------------------------------------------------------
>
> Key: IGNITE-26020
> URL: https://issues.apache.org/jira/browse/IGNITE-26020
> Project: Ignite
> Issue Type: Task
> Components: rw transactions ai3, transactions ai3
> Reporter: Vladimir Pligin
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
>
> *Summary*
> After the implementation of
> https://issues.apache.org/jira/browse/IGNITE-25665 there are going to be a
> few potential changes in this epic. The goal is to understand the guarantees that
> are provided after pending rows are persistent and refine the tickets in this
> epic to reflect the needed scope.
> *DoD*
> Tickets in this epic are brought up to date.