[ 
https://issues.apache.org/jira/browse/IGNITE-24436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Shergalis updated IGNITE-24436:
---------------------------------------
    Description: 
Original problem is described in IGNITE-24368.

*What was done:* if the catalog version that was relevant at the time of the 
transaction is not available, indexes from the latest catalog are used instead.
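
The fallback described above might look roughly like the sketch below. It is 
only an illustration: the class, the method name indexIdsAtOrLatest() and the 
map-based catalog are hypothetical stand-ins, not the actual Ignite code, and 
the lookup is keyed by a plain version number rather than by the transaction 
begin timestamp.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names are real Ignite classes or methods. */
public class CatalogFallbackSketch {
    /**
     * Returns the index IDs recorded for the catalog version the transaction saw,
     * or the index IDs of the latest known version if that version is gone.
     * The map (catalog version -> index IDs of the table) is an assumption made
     * for illustration only.
     */
    static List<Integer> indexIdsAtOrLatest(NavigableMap<Integer, List<Integer>> catalog, int txVersion) {
        List<Integer> ids = catalog.get(txVersion);
        if (ids != null) {
            return ids;
        }
        // The version relevant to the transaction is not available: fall back to the latest one.
        return catalog.isEmpty() ? List.of() : catalog.lastEntry().getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Integer, List<Integer>> catalog = new TreeMap<>();
        catalog.put(5, List.of(10));
        catalog.put(6, List.of(10, 12));

        // Version 5 is still present, so its own indexes are returned.
        System.out.println(indexIdsAtOrLatest(catalog, 5));
        // Version 3 is not present, so the indexes of version 6 (the latest) are returned.
        System.out.println(indexIdsAtOrLatest(catalog, 3));
    }
}
```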

 

The original scenario could not be reproduced quickly enough, so a simple unit 
test was created instead. A test for the original scenario could be written as 
part of the ticket mentioned above.

  was:
Original problem is described in IGNITE-24368.

A couple of observations:
 # For write intents (WIs) of committed transactions, indexes are actually not 
relevant for cleanup (they are not used), so we should not obtain them at all
 # For WIs of aborted transactions, not removing the tuple from the index does 
not cause any consistency problems; only some garbage remains in the index. 
The garbage consumes disk space and slows down reads via the index, but that 
is all

So a quick and dirty fix would be to do the following when performing a write 
intent switch in PartitionReplicaListener and PartitionListener:
 # In StorageUpdateHandler, make the last parameter (which accepts a list of 
indexes) a Supplier
 # If the transaction is committed, don't call the supplier
 # In the supplier, instead of the existing TableUtils#indexIdsAtRwTxBeginTs(), 
call a variant of it (it might be called indexIdsAtRwTxBeginTsOrEmpty()) which 
does not fail if there is no catalog version or if it doesn't contain indexes 
for the table, and just returns an empty list in that case
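
The three steps above could be sketched as follows. This is a minimal 
illustration under stated assumptions: the class, the methods indexIdsOrEmpty() 
and indexesToClean(), and the map-based catalog are hypothetical and do not 
reproduce the real StorageUpdateHandler or TableUtils signatures.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are real Ignite classes or methods. */
public class WriteIntentSwitchSketch {
    /**
     * Stand-in for the proposed "OrEmpty" lookup variant. The map
     * (catalog version -> index IDs of the table) is an assumption for illustration.
     */
    static List<Integer> indexIdsOrEmpty(Map<Integer, List<Integer>> catalog, int version) {
        // Missing version or no indexes for the table: return an empty list instead of failing.
        List<Integer> ids = catalog.get(version);
        return ids == null ? List.of() : ids;
    }

    /** Returns the index IDs that would be cleaned up for one write intent. */
    static List<Integer> indexesToClean(boolean committed, Supplier<List<Integer>> indexIds) {
        // Committed transactions do not use indexes during cleanup, so the supplier is never called.
        if (committed) {
            return List.of();
        }
        return indexIds.get();
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> catalog = Map.of(2, List.of(10, 11));

        // Aborted transaction, catalog version still known.
        System.out.println(indexesToClean(false, () -> indexIdsOrEmpty(catalog, 2)));
        // Aborted transaction, catalog version no longer known: nothing is cleaned, nothing fails.
        System.out.println(indexesToClean(false, () -> indexIdsOrEmpty(catalog, 1)));
        // Committed transaction: the supplier must not be invoked at all.
        System.out.println(indexesToClean(true, () -> {
            throw new IllegalStateException("supplier must not be called");
        }));
    }
}
```

Passing a Supplier instead of a ready list keeps the catalog lookup off the 
commit path entirely, which is the point of the second step.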

But first, an integration test has to be written to make sure that the analysis 
made in IGNITE-24368 was correct. The scenario (parameterized by whether the 
transaction is committed or rolled back) is:
 # Start a cluster of 3 nodes with dataAvailabilityTime set to 1 second
 # Make sure transaction cleanups do not happen (the interval between cleanups 
can probably be configured)
 # Create a zone with 1 partition (not strictly required, the default of 25 
would also work, but it could be easier to debug with just 1) and 3 replicas
 # Create table A in the zone
 # Start an *explicit* transaction, make a put in it, commit/rollback
 # Create table B in the zone (so that the catalog version in which A was 
created is no longer the latest version)
 # Stop all nodes
 # Wait for dataAvailabilityTime to pass (1 second)
 # Start all 3 nodes and expect the start to fail

After the fix is made, it would be great to also add a test that makes sure 
that, if this happens for a rolled-back transaction, a put to the same key as 
the one that was rolled back still succeeds after a restart.


> Do not clean index on tx cleanup if no index info is available
> --------------------------------------------------------------
>
>                 Key: IGNITE-24436
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24436
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Assignee: Philipp Shergalis
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Original problem is described in IGNITE-24368.
> *What was done:* if the catalog version that was relevant at the time of the 
> transaction is not available, indexes from the latest catalog are used instead.
>  
> The original scenario could not be reproduced quickly enough, so a simple unit 
> test was created instead. A test for the original scenario could be written as 
> part of the ticket mentioned above.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
