[ https://issues.apache.org/jira/browse/IGNITE-24926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-24926:
---------------------------------------
    Description:
We initiate index registration in the Metastore notification chain; this triggers creation of index storages (asynchronously). Also, when a partition is evicted from a node (also triggered by an event coming from the Metastore notification chain), this triggers destruction of the table partition storage; this also happens asynchronously because it requires I/O. These two kinds of activities must be linearized, and the order must be defined by the order of the triggering Metastore events.

The problem is that this linearization is not guaranteed. The following might happen:
# Index A is created and registered, and its storages start being created for every partition of the corresponding table that the current node hosts
# Partition P of the corresponding table is evicted, and its storage is destroyed
# Only now is index storage creation for partition P executed, but the partition storage is already destroyed

We need to introduce a linearization here, keeping in mind that the following must still be maintained:
# Creation of the table storage of partition P must precede its destruction (currently maintained)
# Creation of the index storage of partition P must precede its destruction (currently seems to be maintained)
# Creation of the index storage (and its registration in internal data structures) must precede writes to the index storage (currently seems to be maintained)

Also, the following aspects affect performance and liveness:
# Partition replica lifecycle management should not block the Metastore notification chain (that is, async processes caused by partition replica lifecycle events should not hold Metastore revision completion; they need to be asynchronous with respect to Metastore), otherwise deadlocks will occur (this is currently maintained)
# Index storage registration should not hold Metastore revision completion (this is currently NOT maintained; this negatively affects Metastore performance, including Metastore SafeTime propagation, which might affect transaction processing)

This seems to require a thorough design.

  was:
We initiate index registration in the Metastore notification chain; this triggers creation of index storages (asynchronously). Also, when a partition is evicted from a node (also triggered by an event coming from the Metastore notification chain), this triggers destruction of the table partition storage; this also happens asynchronously because it requires I/O. These two kinds of activities must be linearized, and the order must be defined by the order of the triggering Metastore events.

The problem is that this linearization is not guaranteed. The following might happen:
# Index A is created and registered, and its storages start being created for every partition of the corresponding table that the current node hosts
# Partition P of the corresponding table is evicted, and its storage is destroyed
# Only now is index storage creation for partition P executed, but the partition storage is already destroyed

We need to introduce a linearization here, keeping in mind that the following must still be maintained:
# Creation of the table storage of partition P must precede its destruction (currently maintained)
# Creation of the index storage of partition P must precede its destruction (currently seems to be maintained)
# Creation of the index storage (and its registration in internal data structures) must precede writes to the index storage (currently seems to be maintained)

Also, the following aspects affect performance and liveness:
# Partition replica lifecycle management should not block the Metastore notification chain (that is, async processes caused by partition replica lifecycle events should not hold Metastore revision completion; they need to be asynchronous with respect to Metastore), otherwise deadlocks will occur (this is currently maintained)
# Index storage registration should not hold Metastore revision completion (this is currently NOT maintained; this negatively affects Metastore performance, including Metastore SafeTime propagation, which might affect transaction processing)
# Index storages should be


> Race between index registration and replica destruction
> -------------------------------------------------------
>
>                 Key: IGNITE-24926
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24926
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
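
For illustration only, here is a minimal Java sketch of one way to get the per-partition linearization the description asks for. It is not the Ignite implementation and not the design the issue calls for. The class `PartitionOperationSequencer`, its `enqueue` method, and the `demo` scenario are hypothetical names introduced here. It assumes operations are enqueued from a single thread in Metastore event order, and it chains each partition's asynchronous actions so that a later one starts only after the earlier one has finished, without blocking the enqueuing thread.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical helper (not an Ignite class): runs async actions per partition in enqueue order. */
final class PartitionOperationSequencer {
    /** Tail of the action chain for each partition ID. Never cleaned up in this sketch. */
    private final Map<Integer, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    private final Executor executor;

    PartitionOperationSequencer(Executor executor) {
        this.executor = executor;
    }

    /**
     * Appends an action to the partition's chain and returns immediately.
     * Assumption: callers invoke this in the order of the triggering events.
     */
    CompletableFuture<Void> enqueue(int partitionId, Supplier<CompletableFuture<Void>> action) {
        return tails.compute(partitionId, (id, tail) -> {
            CompletableFuture<Void> prev =
                    tail == null ? CompletableFuture.<Void>completedFuture(null) : tail;
            // Swallow the previous outcome so that a failed action does not block later ones,
            // then start the next action on the executor, never on the caller's thread.
            return prev.handle((res, err) -> (Void) null)
                    .thenComposeAsync(ignored -> action.get(), executor);
        });
    }

    /** Replays the scenario from the description: slow index storage creation, then eviction. */
    static List<String> demo() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            PartitionOperationSequencer sequencer = new PartitionOperationSequencer(pool);
            List<String> log = new CopyOnWriteArrayList<>();

            // Event 1: the index is registered; creating its storage for partition 0 is slow.
            sequencer.enqueue(0, () -> CompletableFuture.runAsync(() -> {
                sleepQuietly(100);
                log.add("create index storage");
            }, pool));

            // Event 2: partition 0 is evicted; destruction must wait for the creation above.
            CompletableFuture<Void> destroyed = sequencer.enqueue(0,
                    () -> CompletableFuture.runAsync(() -> log.add("destroy partition storage"), pool));

            destroyed.join();
            return log;
        } finally {
            pool.shutdown();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the destruction is recorded after the index storage creation even though the creation is slower, because both actions go through the same per-partition chain. Actions for different partitions stay independent, and `enqueue` never waits for I/O, which is the property the liveness requirements above ask of anything running in the notification chain. A real design would also have to decide how to handle failures and when to drop chains of destroyed partitions.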