[ 
https://issues.apache.org/jira/browse/IGNITE-22516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Puchkovskiy reopened IGNITE-22516:
----------------------------------------

> Shorten waiting out clock skew on DDL execution
> -----------------------------------------------
>
>                 Key: IGNITE-22516
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22516
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>
> *UPDATE: this proposal seems to be flawed, as there doesn't seem to be a 
> guarantee that a node validated after item 1 actually sees the new schema 
> activation: its physical clock might be lagging.*
> After IGNITE-20378 is fixed, the wait after a DDL will be 
> DelayDuration+MaxClockSkew. The second component is needed to make sure that 
> the new schema activates on each node of the cluster, even if that node's 
> clock is skewed. Instead of pessimistically waiting out MaxClockSkew, we can 
> get an explicit ack about the new schema activation from each cluster node. 
> This will allow us to wait less.
> This could look like the following. After we submit the new schema update to 
> the Metastorage (and this write gets acked by a majority of Metastorage 
> nodes), we do the following:
>  # Take the combined set of validated nodes and the logical topology from the 
> CMG leader (let it be S)
>  # Send a WaitForCatalogVersionActivationRequest(createdCatalogVersion) to 
> each node in S
>  # (A node getting such a request waits until the given catalog version 
> activates on the node and then responds with an ack)
>  # Complete the user's DDL future when for each node in S one of the 
> following happens:
>  ## An ack is received
>  ## The node leaves the logical topology
>  # As is already done, still wait for DelayDuration+MaxClockSkew; if this 
> wait completes before the wait described in items 1-4, it completes the 
> user's DDL future
> If the logical topology is stable, this will guarantee that either each node 
> acks the activation or DelayDuration+MaxClockSkew passes (which will 
> guarantee activation on the whole cluster, given that local clock skews are 
> bounded by MaxClockSkew).
> If a node gets validated after we execute item 1, then its validation happens 
> after the new schema update is written to the Metastorage. The node performs 
> Metastorage recovery after validation, hence after the new schema update is 
> written to the Metastorage; therefore the node will apply the new schema 
> update during its recovery, and it will surely see the new schema update 
> before becoming operational.
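The completion rule in items 1-5 (finish the DDL future once every node in S has either acked or left the logical topology, or once the DelayDuration+MaxClockSkew fallback elapses, whichever comes first) can be modeled roughly as below. This is only a minimal sketch under stated assumptions: it is written in Python for brevity rather than in Ignite's own code, it models the waiting logic in a single process with simulated delays instead of real network messages, and every name in it (DdlCompletionTracker, on_ack, on_node_left, simulate) is hypothetical. It also does not address the concern in the UPDATE note about nodes validated after item 1.

```python
import asyncio
from typing import Dict, Iterable, Set


class DdlCompletionTracker:
    """Hypothetical model of items 1-5: resolves when every node in S has
    acked (item 4.1) or left the logical topology (item 4.2), or when the
    fallback wait (DelayDuration + MaxClockSkew, item 5) elapses."""

    def __init__(self, nodes: Iterable[str], fallback_wait_s: float):
        self._pending: Set[str] = set(nodes)      # S from item 1
        self._fallback_wait_s = fallback_wait_s   # DelayDuration + MaxClockSkew
        self._all_resolved = asyncio.Event()
        if not self._pending:
            self._all_resolved.set()

    def _resolve(self, node: str) -> None:
        self._pending.discard(node)
        if not self._pending:
            self._all_resolved.set()

    def on_ack(self, node: str) -> None:
        # Item 4.1: the node reported that the catalog version is active locally.
        self._resolve(node)

    def on_node_left(self, node: str) -> None:
        # Item 4.2: the node left the logical topology, so we stop waiting for it.
        self._resolve(node)

    async def wait(self) -> str:
        """Returns which path completed the user's DDL future."""
        try:
            await asyncio.wait_for(self._all_resolved.wait(), self._fallback_wait_s)
            return "all_resolved"
        except asyncio.TimeoutError:
            return "fallback"


async def simulate(ack_delays: Dict[str, float], left: Set[str],
                   fallback_wait_s: float) -> str:
    # The tracker is created inside the running event loop on purpose.
    tracker = DdlCompletionTracker(ack_delays.keys() | left, fallback_wait_s)

    async def node(name: str, delay: float) -> None:
        # Item 3: the node waits until the catalog version activates locally
        # (modeled as a sleep), then acks.
        await asyncio.sleep(delay)
        tracker.on_ack(name)

    for name in left:
        tracker.on_node_left(name)
    tasks = [asyncio.create_task(node(n, d)) for n, d in ack_delays.items()]
    result = await tracker.wait()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return result


if __name__ == "__main__":
    # n1 and n2 ack quickly and n3 has left: completes via acks, before the fallback.
    print(asyncio.run(simulate({"n1": 0.01, "n2": 0.02}, {"n3"}, fallback_wait_s=0.5)))
    # n2 is slow (e.g. a lagging clock): the fallback wait completes the future first.
    print(asyncio.run(simulate({"n1": 0.01, "n2": 1.0}, set(), fallback_wait_s=0.1)))
```

Running it prints `all_resolved` and then `fallback`. The design point the sketch illustrates is that the ack path can only shorten the wait: in the worst case the DDL future still completes after the same DelayDuration+MaxClockSkew as today.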



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
