[ https://issues.apache.org/jira/browse/IGNITE-24436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-24436:
---------------------------------------
    Description:
Original problem is described in IGNITE-24368.

A couple of observations:
# For write intents (WIs) of committed transactions, indexes are actually not relevant for cleanup (they are not used), so we should not obtain them at all
# For WIs of aborted transactions, if we don't remove the tuple from the index, this will not cause any consistency problems; some garbage will just remain in the index. The garbage will consume disk space and slow down reads via the index, but that's it

So the idea of a quick and dirty fix is to do the following when doing a write intent switch in PartitionReplicaListener and PartitionListener:
# In StorageUpdateHandler, make the last parameter (accepting a list of indexes) a Supplier
# If the transaction is committed, don't call the supplier
# In the supplier, call not the existing TableUtils#indexIdsAtRwTxBeginTs(), but a variant of it (it might be called indexIdsAtRwTxBeginTsOrEmpty()) which will not fail if there is no catalog version or it doesn't contain indexes for the table, but will just return an empty list

But first, an integration test has to be written to make sure that the analysis made in IGNITE-24368 was correct. The scenario is:
# Start a cluster of 3 nodes with dataAvailabilityTime set to 1 second
# Make sure transaction cleanups do not happen (the interval between cleanups can probably be configured)
# Create a zone with 1 partition (not necessarily, might be the default 25, but it could be easier to debug with just 1) and 3 replicas
# Create table A in the zone
# Start an *explicit* transaction, make a put in it, commit the transaction
# Create table B in the zone (so that the catalog version in which A was created is not the freshest version)
# Stop all nodes
# Wait for dataAvailabilityTime to pass (1 second)
# Start all 3 nodes and expect the start to fail

  was:
Original problem is described in IGNITE-24368.

A couple of observations:
# For write intents (WIs) of committed transactions, indexes are actually not relevant for cleanup (they are not used), so we should not obtain them at all
# For WIs of aborted transactions, if we don't remove the tuple from the index, this will not cause any consistency problems; some garbage will just remain in the index. The garbage will consume disk space and slow down reads via the index, but that's it

So the idea of a quick and dirty fix is to do the following when doing a write intent switch in PartitionReplicaListener and PartitionListener:
# In StorageUpdateHandler, make the last parameter (accepting a list of indexes) a Supplier
# If the transaction is committed, don't call the supplier
# In the supplier, call not the existing TableUtils#indexIdsAtRwTxBeginTs(), but a variant of it (it might be called indexIdsAtRwTxBeginTsOrEmpty()) which will not fail if there is no catalog version or it doesn't contain indexes for the table, but will just return an empty list

But first, an integration test has to be written to make sure that the analysis made in IGNITE-24368 was correct.
The scenario is:
# Start a cluster of 3 nodes with dataAvailabilityTime set to 1 second
# Make sure transaction cleanups do not happen (the interval between cleanups can probably be configured)
# Create a zone with 1 partition (not necessarily, might be the default 25, but it could be easier to debug with just 1) and 3 replicas
# Create table A in the zone
# Start an *explicit* transaction, make a put in it, commit the transaction
# Stop all nodes
# Wait for dataAvailabilityTime to pass (1 second)
# Start all 3 nodes and expect the start to fail


> Do not clean index on tx cleanup if no index info is available
> --------------------------------------------------------------
>
>                 Key: IGNITE-24436
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24436
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>
> Original problem is described in IGNITE-24368.
> A couple of observations:
> # For write intents (WIs) of committed transactions, indexes are actually
> not relevant for cleanup (they are not used), so we should not obtain them at
> all
> # For WIs of aborted transactions, if we don't remove the tuple from the
> index, this will not cause any consistency problems; some garbage will just
> remain in the index. The garbage will consume disk space and slow
> down reads via the index, but that's it
> So the idea of a quick and dirty fix is to do the following when doing a write
> intent switch in PartitionReplicaListener and PartitionListener:
> # In StorageUpdateHandler, make the last parameter (accepting a list of
> indexes) a Supplier
> # If the transaction is committed, don't call the supplier
> # In the supplier, call not the existing TableUtils#indexIdsAtRwTxBeginTs(),
> but a variant of it (it might be called indexIdsAtRwTxBeginTsOrEmpty()) which
> will not fail if there is no catalog version or it doesn't contain indexes
> for the table, but will just return an empty list
> But first, an integration test has to be written to make sure that the
> analysis made in IGNITE-24368 was correct. The scenario is:
> # Start a cluster of 3 nodes with dataAvailabilityTime set to 1 second
> # Make sure transaction cleanups do not happen (the interval between
> cleanups can probably be configured)
> # Create a zone with 1 partition (not necessarily, might be the default 25,
> but it could be easier to debug with just 1) and 3 replicas
> # Create table A in the zone
> # Start an *explicit* transaction, make a put in it, commit the transaction
> # Create table B in the zone (so that the catalog version in which A was
> created is not the freshest version)
> # Stop all nodes
> # Wait for dataAvailabilityTime to pass (1 second)
> # Start all 3 nodes and expect the start to fail



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
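
For illustration only, the supplier-based idea from the description could be sketched roughly as below. This is not the Ignite code: the class WriteIntentSwitchSketch, both method signatures, and the map standing in for the catalog are hypothetical. Only the name indexIdsAtRwTxBeginTsOrEmpty comes from the description, and its real parameters are not specified there.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these signatures are taken from the Ignite code base. */
final class WriteIntentSwitchSketch {
    /**
     * Stand-in for the proposed "OrEmpty" variant. The catalog is modelled as a plain map
     * from catalog version to the table's index IDs (an assumption made for this sketch).
     * A missing version yields an empty list instead of a failure.
     */
    static List<Integer> indexIdsAtRwTxBeginTsOrEmpty(Map<Integer, List<Integer>> catalog, int version) {
        List<Integer> ids = catalog.get(version);
        return ids == null ? List.of() : ids;
    }

    /**
     * Returns the indexes to clean during a write intent switch.
     * For a committed transaction the supplier is never invoked, so no catalog lookup happens.
     * For an aborted transaction the supplier is invoked; an empty result means that
     * some garbage stays in the index, which the description considers acceptable.
     */
    static List<Integer> indexesToClean(boolean committed, Supplier<List<Integer>> indexIds) {
        if (committed) {
            return List.of();
        }
        return indexIds.get();
    }
}
```

Passing a Supplier rather than a ready list means the lookup that may fail is deferred until it is known to be needed, which is the point of steps 1 and 2 of the proposed fix.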