[ https://issues.apache.org/jira/browse/IGNITE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-24346:
---------------------------------------
    Description: 
Tx finish records stored in the tx state storage have to be retained until all write intents belonging to the finish record's transaction are cleaned up (either converted to normal tuple versions or removed). In the current implementation, we might erase such records too early when destroying a dropped table.

The suspected scenario is:
# tx1 is started, writes to tables A and B, and then gets committed; its commit partition is a partition of A
# Cleanup of tx1 is deferred for some reason (for example, the cleanup component has a large backlog of work)
# A is dropped
# After the moment of A's drop falls below the LWM, the storages of A's partitions get destroyed (including the commit partition of tx1)
# Cleanup of tx1 is attempted, but the tx state storage is already destroyed, so the cleanup cannot proceed -> the write intents of tx1 remain unresolved forever

We need to check whether this scenario is possible and, if it is, make sure that tx state storages stay available for cleanup activities until all their transactions have been cleaned up, even if this means deferring table partition destruction.

For the per-zone tx state storages to which we are currently switching (see IGNITE-22621), the problem stays the same: a transaction might span multiple zones, and the zone hosting the commit partition might have been dropped and destroyed by the time cleanup runs (IGNITE-24345).
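To make the proposed deferral concrete, here is a minimal sketch of one way it could work, assuming the storage can track which finished transactions still await cleanup. The class `TxStateStorageGuard` and all of its methods are hypothetical names invented for this illustration; they are not part of the Ignite codebase, and the actual fix may look quite different.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical sketch, not an Ignite API: defers storage destruction until cleanup is done. */
class TxStateStorageGuard {
    /** Transactions whose finish record lives here and whose write intents are not yet resolved. */
    private final Set<UUID> pendingCleanup = new HashSet<>();

    private boolean destroyRequested;

    private boolean destroyed;

    /** Called when a finish record for the transaction is written to this storage. */
    synchronized void onTxFinished(UUID txId) {
        if (destroyed) {
            throw new IllegalStateException("Storage is already destroyed");
        }

        pendingCleanup.add(txId);
    }

    /** Called when all write intents of the transaction have been resolved. */
    synchronized void onTxCleanedUp(UUID txId) {
        pendingCleanup.remove(txId);

        destroyIfPossible();
    }

    /** Called when the owning table (or zone) drop has fallen below the LWM. */
    synchronized void requestDestroy() {
        destroyRequested = true;

        destroyIfPossible();
    }

    synchronized boolean isDestroyed() {
        return destroyed;
    }

    /** Destroys the storage only if destruction was requested and no cleanup is pending. */
    private void destroyIfPossible() {
        if (destroyRequested && pendingCleanup.isEmpty()) {
            destroyed = true; // A real implementation would release the underlying storage here.
        }
    }
}
```

With this sketch, step 4 of the scenario would only record the destroy request; the storage would actually go away once the cleanup of tx1 in step 5 completes.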
> Do not destroy tx state storage while its content might be needed for write intent resolution
> ---------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-24346
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24346
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3