[ 
https://issues.apache.org/jira/browse/IGNITE-24926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy updated IGNITE-24926:
---------------------------------------
    Description: 
We initiate index registration in the Metastore notification chain; this triggers 
index storage creation (asynchronously). Also, when a partition is evicted 
from a node (also triggered by an event coming from the Metastore notification 
chain), this triggers destruction of the table partition storage; this also 
happens asynchronously as it requires I/O. These two kinds of activities must be 
linearized, and the order must be defined by the order of the triggering 
Metastore events.

The problem is that this linearization is not guaranteed. The following might 
happen:
 # Index A is created and registered; creation of its storages starts for 
every partition of the corresponding table that the current node hosts
 # Partition P of the corresponding table is evicted, and its storage is destroyed
 # Only now is index storage creation for partition P executed, but the 
partition storage is already destroyed
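
The interleaving above can be replayed deterministically with a toy model. 
This is only an illustrative sketch: the class, the two task queues and the 
log messages are hypothetical and do not correspond to Ignite code. The two 
queues stand in for the two independent asynchronous activities; because 
nothing ties them together, the destruction task may run first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical toy model of the race; none of these names come from Ignite. */
final class IndexEvictionRaceSketch {
    /** Replays the three steps and returns what happened to partition P, in execution order. */
    static List<String> replay() {
        List<String> log = new ArrayList<>();
        // Each queue stands in for one independent asynchronous activity.
        Queue<Runnable> indexTasks = new ArrayDeque<>();
        Queue<Runnable> destroyTasks = new ArrayDeque<>();
        boolean[] partitionStorageExists = {true};

        // Metastore event 1: index A is registered, storage creation for P is scheduled.
        indexTasks.add(() -> log.add(partitionStorageExists[0]
                ? "index storage created for P"
                : "index storage creation for P found no partition storage"));

        // Metastore event 2: partition P is evicted, storage destruction is scheduled.
        destroyTasks.add(() -> {
            partitionStorageExists[0] = false;
            log.add("partition storage destroyed");
        });

        // The queues are drained independently, so event order is not preserved:
        // here the destruction task happens to run before the index task.
        destroyTasks.remove().run();
        indexTasks.remove().run();
        return log;
    }

    public static void main(String[] args) {
        replay().forEach(System.out::println);
    }
}
```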

We need to introduce a linearization here, keeping in mind that the following 
must still be maintained:
 # Creation of table storage of partition P must precede its destruction 
(currently maintained)
 # Creation of index storage of partition P must precede its destruction 
(currently seems to be maintained)
 # Creation of index storage (and its registration in internal data structures) 
must precede writes to the index storage (currently seems to be maintained)
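
The three ordering requirements above can be read as a small state machine 
per storage. The following is a minimal sketch under that assumption; the 
class and its method names are hypothetical and are not taken from Ignite. 
It only makes ordering violations visible, it does not by itself enforce 
the order in which asynchronous tasks run.

```java
/** Hypothetical lifecycle guard for a single (partition or index) storage. */
final class StorageLifecycleSketch {
    enum State { ABSENT, CREATED, DESTROYED }

    private State state = State.ABSENT;

    /** Creation is only legal once, before anything else. */
    synchronized void create() {
        requireState(State.ABSENT, "create");
        state = State.CREATED;
    }

    /** Writes are only legal after creation and before destruction. */
    synchronized void write() {
        requireState(State.CREATED, "write");
    }

    /** Destruction must be preceded by creation. */
    synchronized void destroy() {
        requireState(State.CREATED, "destroy");
        state = State.DESTROYED;
    }

    synchronized State state() {
        return state;
    }

    private void requireState(State expected, String op) {
        if (state != expected) {
            throw new IllegalStateException(op + " is not allowed in state " + state);
        }
    }

    public static void main(String[] args) {
        StorageLifecycleSketch storage = new StorageLifecycleSketch();
        storage.create();
        storage.write();
        storage.destroy();
        System.out.println(storage.state());
    }
}
```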

Also, the following aspects affect performance and liveness:
 # Partition replica lifecycle management should not block the Metastore 
notification chain (that is, async processes caused by partition replica 
lifecycle events should not hold up Metastore revision completion; they need 
to be asynchronous with respect to the Metastore), otherwise deadlocks will 
occur (this is currently maintained)
 # Index storage registration should not hold up Metastore revision completion 
(this is currently NOT maintained; it negatively affects Metastore 
performance, including Metastore SafeTime propagation, which might affect 
transaction processing)
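
One possible direction, shown purely as a sketch and not as a proposed 
design, is a per-partition chain of futures: each operation is appended in 
the notification thread (so the chain order equals the Metastore event 
order), but it runs on another executor, and the caller gets a future back 
immediately instead of waiting. PartitionOperationChain and enqueue() are 
hypothetical names; only java.util.concurrent classes are real here. The 
sketch never removes map entries and assumes the executor is truly 
asynchronous.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical per-partition linearizer; not an Ignite class. */
final class PartitionOperationChain {
    /** Tail of the operation chain for each partition ID. */
    private final Map<Integer, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    /**
     * Appends an operation to the chain of the given partition and returns at once.
     * Must be called in the order of the triggering events; the operation itself
     * runs on the executor after all previously enqueued operations have finished.
     */
    CompletableFuture<Void> enqueue(int partitionId, Runnable op, Executor executor) {
        return tails.compute(partitionId, (id, tail) -> {
            CompletableFuture<Void> prev =
                    tail == null ? CompletableFuture.completedFuture(null) : tail;
            // A failure of an earlier operation must not break the chain for later ones.
            return prev.handle((res, err) -> (Void) null).thenRunAsync(op, executor);
        });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        PartitionOperationChain chain = new PartitionOperationChain();
        chain.enqueue(0, () -> System.out.println("create index storage for partition 0"), pool);
        chain.enqueue(0, () -> System.out.println("destroy storage of partition 0"), pool).join();
        pool.shutdown();
    }
}
```

With such a chain, index storage creation for partition P that was enqueued 
before the eviction of P completes before the destruction starts, even on a 
multi-threaded pool, while the enqueuing thread is never blocked.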

This seems to require a thorough design.

  was:
We initiate index registration in Metastore notification chain; this triggers 
index storages creation (asynchronously). Also, when a partition is evicted 
from a node (also triggered by an event coming from Metastore notification 
chain), this triggers table partition storage destruction; this also happens 
asynchronously as it requires I/O. These 2 kinds of activities must be 
linearized, and the order must be defined by the order of the triggering 
Metastore events.

The problem is that this linearization is not guaranteed. The following might 
happen:
 # Index A is created and registered, its storages start to be created for 
every partition current node hosts (of the corresponding table)
 # Partition P of the corresponding table is evicted, its storage is destroyed
 # Only now index storage creation for partition P is executed, but the 
partition storage is already destroyed

We need to introduce a linearization here, keeping in mind that the following 
must still be maintained:
 # Creation of table storage of partition P must precede its destruction 
(currently maintained)
 # Creation of index storage of partition P must precede its destruction 
(currently seems to be maintained)
 # Creation of index storage (and its registration in internal data structures) 
must precede writes to the index storage (currently seems to be maintained)

Also, the following aspects affect performance and liveness:
 # Partition replica lifecycle management should not block Metastore 
notification chain (that is, async processes caused by partition replica 
lifecycle events should not hold Metastore revision completion; they need to be 
asynchronous wrt Metastore), otherwise deadlocks will occur (this is currently 
maintained)
 # Index storage registration should not hold Metastore revision completion 
(this is currently NOT maintained; this negatively affects Metastore 
performance, including Metastore SafeTime propagation, which might affect 
transaction processing)
 # Index storages should be


> Race between index registration and replica destruction
> -------------------------------------------------------
>
>                 Key: IGNITE-24926
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24926
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>
> We initiate index registration in the Metastore notification chain; this 
> triggers index storage creation (asynchronously). Also, when a partition is 
> evicted from a node (also triggered by an event coming from the Metastore 
> notification chain), this triggers destruction of the table partition 
> storage; this also happens asynchronously as it requires I/O. These two kinds 
> of activities must be linearized, and the order must be defined by the order 
> of the triggering Metastore events.
> The problem is that this linearization is not guaranteed. The following might 
> happen:
>  # Index A is created and registered; creation of its storages starts for 
> every partition of the corresponding table that the current node hosts
>  # Partition P of the corresponding table is evicted, and its storage is 
> destroyed
>  # Only now is index storage creation for partition P executed, but the 
> partition storage is already destroyed
> We need to introduce a linearization here, keeping in mind that the following 
> must still be maintained:
>  # Creation of table storage of partition P must precede its destruction 
> (currently maintained)
>  # Creation of index storage of partition P must precede its destruction 
> (currently seems to be maintained)
>  # Creation of index storage (and its registration in internal data 
> structures) must precede writes to the index storage (currently seems to be 
> maintained)
> Also, the following aspects affect performance and liveness:
>  # Partition replica lifecycle management should not block the Metastore 
> notification chain (that is, async processes caused by partition replica 
> lifecycle events should not hold up Metastore revision completion; they need 
> to be asynchronous with respect to the Metastore), otherwise deadlocks will 
> occur (this is currently maintained)
>  # Index storage registration should not hold up Metastore revision 
> completion (this is currently NOT maintained; it negatively affects Metastore 
> performance, including Metastore SafeTime propagation, which might affect 
> transaction processing)
> This seems to require a thorough design.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
