[ 
https://issues.apache.org/jira/browse/IGNITE-17578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-17578:
-------------------------------------
    Description: 
h3. Motivation

Within the RW transaction commit process, the tx design requires returning 
control to the outer logic right after:
 # COMMITED/ABORTED txn state replication
 # Locks release.

The follow-up cleanup process, which applies or removes write intents, should 
be asynchronous. Currently it is not. It is also worth mentioning that locks 
are currently released after write intent application; that order should be 
inverted. With these changes, not only RO but also RW transactions may 
encounter a writeIntent and thus have to perform writeIntentResolution. That is 
covered by a separate ticket, https://issues.apache.org/jira/browse/IGNITE-19570, 
which should be implemented prior to this one. 
h3. Definition of Done
 * Write intent application or removal should be implemented asynchronously.
 * The order of write intent application and lock release should be inverted, 
so that locks are released first.

h3. Implementation Notes

Generally, it's only required to change the following code in 
org.apache.ignite.internal.table.distributed.replicator.PartitionReplicaListener#processTxCleanupAction:

 
{code:java}
    return allOffFuturesExceptionIgnored(txUpdateFutures, request).thenCompose(v -> {
        TxCleanupCommand txCleanupCmd = MSG_FACTORY.txCleanupCommand()
                .txId(request.txId())
                .commit(request.commit())
                .commitTimestampLong(request.commitTimestampLong())
                .safeTimeLong(hybridClock.nowLong())
                .build();

        return raftClient
                .run(txCleanupCmd)
                .thenCompose(ignored -> allOffFuturesExceptionIgnored(txReadFutures, request)
                        .thenRun(() -> releaseTxLocks(request.txId())));
    });
}
{code}
 
 * Call releaseTxLocks prior to sending txCleanupCmd.
 * Send txCleanupCmd asynchronously, meaning that processTxCleanupAction 
should return the result as soon as the locks are released.
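
The two notes above can be illustrated with a minimal, self-contained sketch. It is not the actual PartitionReplicaListener code: the class AsyncCleanupSketch, the helper awaitInFlightOperations (a stand-in for the allOffFuturesExceptionIgnored calls), the manually completed cleanupCommandFuture (a stand-in for raftClient.run(txCleanupCmd)) and the event log are all hypothetical names introduced only to make the ordering observable.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for the reordered cleanup flow, not the real listener. */
class AsyncCleanupSketch {
    /** Records the order of events so the sequencing can be inspected. */
    final List<String> events = new CopyOnWriteArrayList<>();

    /** Assumption: simulates the replicated cleanup command; completed by the caller. */
    final CompletableFuture<Void> cleanupCommandFuture = new CompletableFuture<>();

    /** Assumption: in-flight reads and updates of the transaction are already finished. */
    CompletableFuture<Void> awaitInFlightOperations() {
        return CompletableFuture.completedFuture(null);
    }

    void releaseTxLocks() {
        events.add("locks released");
    }

    /** Sends the cleanup command; write intents are applied once it completes. */
    CompletableFuture<Void> sendCleanupCommand() {
        events.add("cleanup command sent");
        return cleanupCommandFuture.thenRun(() -> events.add("write intents applied"));
    }

    /** Releases locks first, then starts the cleanup without waiting for it. */
    CompletableFuture<Void> processTxCleanupAction() {
        return awaitInFlightOperations().thenRun(() -> {
            releaseTxLocks();

            // Not chained into the returned future: the caller gets control
            // back as soon as the locks are released.
            sendCleanupCommand().whenComplete((res, err) -> {
                if (err != null) {
                    events.add("cleanup failed: " + err);
                }
            });
        });
    }
}
```

In this sketch the future returned by processTxCleanupAction completes while the cleanup command is still pending, which is the behaviour the second note asks for. How failures of the detached command are recovered is outside the scope of the sketch.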

It seems that writeIntentApplication, the process triggered by sending 
txCleanupCmd, will be durable thanks to the raftClient inner implementation 
(apparently the new topologyAwareRaftClient) and the special recovery 
procedures on primary re-election, which will be covered by a separate ticket, 
so there is nothing to do here.

  was:
h3. Motivation

According to the tx commit process design, it's required to return control to 
the outer logic right after COMMITED/ABORTED txn state replication. The 
follow-up cleanup process, which sends replica cleanup requests to all enlisted 
replication groups, should be asynchronous.

Currently that is not the case:
{code:java}
/**
 * Process transaction finish request:
 * <ol>
 *     <li>Evaluate commit timestamp.</li>
 *     <li>Run specific raft {@code FinishTxCommand} command, that will apply
 *     txn state to corresponding txStateStorage.</li>
 *     <li>Send cleanup requests to all enlisted primary replicas.</li>
 * </ol>
 * This operation is NOT idempotent, because of commit timestamp evaluation.
 *
 * @param request Transaction finish request.
 * @return future result of the operation.
 */
private CompletableFuture<Object> processTxFinishAction(TxFinishRequest request) {
    HybridTimestamp commitTimestamp = hybridClock.now();

    List<String> aggregatedGroupIds =
            request.groups().values().stream().flatMap(List::stream).collect(Collectors.toList());

    UUID txId = request.txId();

    boolean commit = request.commit();

    CompletableFuture<Object> changeStateFuture = raftClient.run(
            new FinishTxCommand(
                    txId,
                    commit,
                    commitTimestamp,
                    aggregatedGroupIds
            )
    );

    // TODO: https://issues.apache.org/jira/browse/IGNITE-17578
    changeStateFuture.thenRun(
            () -> request.groups().forEach(
                    (recipientNode, replicationGroupIds) -> txManager.cleanup(
                            recipientNode,
                            replicationGroupIds,
                            txId,
                            commit,
                            commitTimestamp
                    )
            )
    );

    return changeStateFuture;
}
{code}
Besides the above, the cleanup process (which is guaranteed to be idempotent) 
is expected to be retried until it succeeds.
h3. Definition of Done
 * Sending cleanup requests should be implemented asynchronously.
 * Cleanup failures, including timeouts, should trigger another cleanup attempt 
until success. There's no failure handler currently, so it's the only option.

h3. Implementation Notes

It seems that a cleanup executor, properly shared between replicas, will suit 
us. The executor is needed to schedule the next cleanup attempt after a 
failure, so that the attempt is performed not right after the failure but after 
a successful rehashing of replicas, when their state allows the cleanup to 
succeed with high probability.
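
A retry loop of this kind could be sketched as follows. This is only an illustration under stated assumptions: the class CleanupRetrySketch, the method cleanupUntilSuccess and the fixed retryDelayMillis are hypothetical, and the fixed delay stands in for "wait until the replicas are ready again", which the ticket does not specify. Because the cleanup is idempotent, repeating an attempt is safe.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical retry helper built on a shared scheduled executor. */
class CleanupRetrySketch {
    /** Repeats the (idempotent) cleanup attempt until one attempt succeeds. */
    static CompletableFuture<Void> cleanupUntilSuccess(
            Supplier<CompletableFuture<Void>> cleanupAttempt,
            ScheduledExecutorService executor,
            long retryDelayMillis
    ) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        attempt(cleanupAttempt, executor, retryDelayMillis, result);
        return result;
    }

    private static void attempt(
            Supplier<CompletableFuture<Void>> cleanupAttempt,
            ScheduledExecutorService executor,
            long retryDelayMillis,
            CompletableFuture<Void> result
    ) {
        CompletableFuture<Void> attemptFuture;
        try {
            attemptFuture = cleanupAttempt.get();
        } catch (RuntimeException e) {
            // A synchronous failure is treated like a failed attempt.
            attemptFuture = CompletableFuture.failedFuture(e);
        }

        attemptFuture.whenComplete((res, err) -> {
            if (err == null) {
                result.complete(null);
                return;
            }
            try {
                // Assumption: a fixed delay approximates waiting for the
                // replicas to become ready for another attempt.
                executor.schedule(
                        () -> attempt(cleanupAttempt, executor, retryDelayMillis, result),
                        retryDelayMillis,
                        TimeUnit.MILLISECONDS
                );
            } catch (RejectedExecutionException e) {
                // The executor was shut down, so no further retries are possible.
                result.completeExceptionally(e);
            }
        });
    }
}
```

A caller would pass a supplier that sends one cleanup request and returns its future. The returned future completes once any attempt succeeds, or exceptionally if the executor is stopped first.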

 

 


> Transactions: async cleanup processing on tx commit
> ---------------------------------------------------
>
>                 Key: IGNITE-17578
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17578
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3, transaction3_rw
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
