[
https://issues.apache.org/jira/browse/IGNITE-22516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy resolved IGNITE-22516.
----------------------------------------
Resolution: Invalid
The proposal seems invalid as it doesn't solve the 'lagging time node' problem
for nodes joining after we take a snapshot of the cluster nodes for getting
acks.
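To make the objection concrete, here is a toy model, not Ignite code. It assumes that a node treats a catalog version as active once its own physical clock reaches the activation timestamp, and that the late joiner was not in the snapshot, so nobody waited for its ack. The class name, method names and millisecond values are all made up for illustration.

```java
public class LaggingClockExample {
    /** Real time at which a node whose clock lags by lagMs considers the version active. */
    static long realActivationTime(long activationTs, long lagMs) {
        // The node's clock reads (realTime - lagMs), so it reaches activationTs
        // only when realTime == activationTs + lagMs.
        return activationTs + lagMs;
    }

    /** True if the node has activated the version by real time realNow. */
    static boolean activatedBy(long realNow, long activationTs, long lagMs) {
        return realNow >= realActivationTime(activationTs, lagMs);
    }

    public static void main(String[] args) {
        long activationTs = 1_000;   // hypothetical activation timestamp, ms
        long maxClockSkew = 500;     // hypothetical MaxClockSkew, ms
        long ddlCompletedAt = 1_100; // all snapshot nodes acked early, DDL future completed
        long lateJoinerLag = 400;    // clock lag of a node that joined after the snapshot

        // The late joiner has not activated the schema when the DDL future completes.
        System.out.println(activatedBy(ddlCompletedAt, activationTs, lateJoinerLag)); // false

        // Waiting out MaxClockSkew covers any lag bounded by it.
        System.out.println(activatedBy(activationTs + maxClockSkew, activationTs, lateJoinerLag)); // true
    }
}
```

In this model, completing the DDL on acks alone is only safe if every node that can serve requests was asked for an ack, which is exactly what a late joiner breaks.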
> Shorten waiting out clock skew on DDL execution
> -----------------------------------------------
>
> Key: IGNITE-22516
> URL: https://issues.apache.org/jira/browse/IGNITE-22516
> Project: Ignite
> Issue Type: Improvement
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
>
> *UPDATE: this proposal seems to be flawed as there doesn't seem to be a
> guarantee that a node validated after item 1 actually sees new schema
> activation as its physical clock might be lagging.*
> After IGNITE-20378 is fixed, a wait after a DDL will be
> DelayDuration+MaxClockSkew. The second component is needed to make sure that
> the new schema activates on each node of the cluster, even if its clock is
> skewed. We can get an explicit ack about new schema activation from each
> cluster node instead of pessimistically waiting out MaxClockSkew. This
> will allow us to wait less.
> This could look like the following. After we submit the new schema update to
> the Metastorage (and this write gets acked by its majority), we do the
> following:
> # Take the combined set of validated nodes and the logical topology from the
> CMG leader (let it be S)
> # Send a WaitForCatalogVersionActivationRequest(createdCatalogVersion) to
> each node in S
> # (A node getting such a request waits till the given catalog version
> activates on the node and then responds with an ack)
> # Complete the user's DDL future when for each node in S one of the
> following happens:
> ## An ack is received
> ## The node leaves the logical topology
> # As is already done, still wait for DelayDuration+MaxClockSkew; if this
> wait completes faster than the wait described in items 1-4, it completes the
> user's DDL future
> If the logical topology is stable, this will guarantee that either each node
> acks the activation or DelayDuration+MaxClockSkew passes (which will
> guarantee activation on the whole cluster, given that local clock skews are
> bounded by MaxClockSkew).
> If a node gets validated after we execute item 1, then its validation happens
> after the new schema update is written to the Metastorage; the node does
> Metastorage recovery after validation, hence after the new schema update is
> written to the Metastorage; hence the node will apply the new schema update
> during its recovery, and it will surely see the new schema update before
> becoming operational.
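The completion rule in items 2-5 of the quoted description can be sketched with plain CompletableFuture composition. This is only an illustration under stated assumptions, not Ignite code: `DdlAckWaitSketch`, `ddlFuture`, `ackFrom` and `leftTopology` are hypothetical names, the two functions stand in for whatever messaging and topology-event machinery would supply the per-node futures, and error handling is omitted. It requires Java 9+ for `completeOnTimeout`.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class DdlAckWaitSketch {
    /**
     * Completes when every node in S has either acked or left the logical
     * topology (items 2-4), or when DelayDuration+MaxClockSkew elapses
     * (item 5), whichever happens first.
     */
    static <N> CompletableFuture<Void> ddlFuture(
            List<N> nodesInS,
            Function<N, CompletableFuture<Void>> ackFrom,      // hypothetical: ack per node
            Function<N, CompletableFuture<Void>> leftTopology, // hypothetical: node-left event
            Duration delayDuration,
            Duration maxClockSkew) {
        // Per node: an ack OR the node leaving the topology is enough.
        CompletableFuture<?>[] perNode = nodesInS.stream()
                .map(n -> CompletableFuture.anyOf(ackFrom.apply(n), leftTopology.apply(n)))
                .toArray(CompletableFuture[]::new);
        CompletableFuture<Void> allNodesDone = CompletableFuture.allOf(perNode);

        // Fallback: the pessimistic wait that is performed anyway.
        long fallbackMs = delayDuration.plus(maxClockSkew).toMillis();
        CompletableFuture<Void> fallback = new CompletableFuture<Void>()
                .completeOnTimeout(null, fallbackMs, TimeUnit.MILLISECONDS);

        // Whichever wait finishes first completes the user's DDL future.
        return CompletableFuture.anyOf(allNodesDone, fallback).thenAccept(v -> { });
    }

    public static void main(String[] args) {
        Map<String, CompletableFuture<Void>> acks =
                Map.of("a", new CompletableFuture<Void>(), "b", new CompletableFuture<Void>());
        Map<String, CompletableFuture<Void>> left =
                Map.of("a", new CompletableFuture<Void>(), "b", new CompletableFuture<Void>());

        CompletableFuture<Void> ddl = ddlFuture(
                List.of("a", "b"), acks::get, left::get,
                Duration.ofMinutes(1), Duration.ofMinutes(1));

        acks.get("a").complete(null); // node a acks
        left.get("b").complete(null); // node b leaves the logical topology
        ddl.join();
        System.out.println("DDL future completed before the fallback wait");
    }
}
```

Note that this sketch only covers the nodes captured in S. It does nothing for a node that joins later with a lagging clock, which is the gap the resolution comment points out.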