[
https://issues.apache.org/jira/browse/IGNITE-28199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Maksim Zhuravkov updated IGNITE-28199:
--------------------------------------
Description:
The current implementation of the partition pruning (PP) metadata collection
algorithm does not collect metadata for DML statements that reference multiple
sources (see examples) or have nested queries. This limitation is a result of
the current design, which has two separate paths for traversing rel node
trees: a path for queries (PartitionPruningMetadataExtractor is also a
visitor) and a path for DMLs (ModifyNodeVisitor). The path for DMLs is very
conservative and rejects many valid cases.
{noformat}
-- These statements have two sources each: a source for ModifyNode and another
-- source for ScanNode. FunctionScan `breaks` the traversal that collects
-- metadata, so the resulting metadata is absent:
--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}
UPDATE t SET c1=100 WHERE id=1 AND c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- expected: t(DELETE)={id=1}, t(SELECT)={id=1}
DELETE FROM t WHERE id=1 AND c2 IN (SELECT * FROM SYSTEM_RANGE(1, 100))
--- Does not capture metadata because it has a nested query:
--- expected: t(UPDATE)={id=1}, t(SELECT)={id=1}, t2={id=42}
UPDATE t SET c1=100 WHERE id=1 AND c2 IN (SELECT * FROM t2 WHERE id = 42)
{noformat}
*Proposed solution*:
Implement a unified bottom-up traversal that, in addition to collecting
metadata from ScanNodes, propagates the collected metadata up to ModifyNodes.
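A minimal sketch of what such a single bottom-up pass could look like is given
here for illustration only. Every name in it (PruningSketch, Node, Entry,
collect) is hypothetical and the tree model is heavily simplified. It is not
the actual Ignite code and does not use the real
PartitionPruningMetadataExtractor or ModifyNodeVisitor APIs. Predicates are
plain strings, and a modify node simply reuses the first predicate collected
from a scan of the same table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model; none of these types are Ignite's actual classes. */
final class PruningSketch {
    /** Minimal rel node: a kind, an optional table, an optional pruning predicate, and inputs. */
    static final class Node {
        final String kind;      // "SCAN", "FUNCTION_SCAN", "JOIN", "UPDATE" or "DELETE"
        final String table;     // null for nodes that do not reference a table
        final String predicate; // e.g. "id=1"; null when nothing can be pruned
        final List<Node> inputs;

        Node(String kind, String table, String predicate, Node... inputs) {
            this.kind = kind;
            this.table = table;
            this.predicate = predicate;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** One piece of collected metadata: which table, in which role, pruned by what. */
    static final class Entry {
        final String table;
        final String role;
        final String predicate;

        Entry(String table, String role, String predicate) {
            this.table = table;
            this.role = role;
            this.predicate = predicate;
        }

        @Override
        public String toString() {
            return table + "(" + role + ")={" + predicate + "}";
        }
    }

    /** Single bottom-up pass: visit the inputs first, then the node itself. */
    static List<Entry> collect(Node node) {
        List<Entry> metadata = new ArrayList<>();
        for (Node input : node.inputs) {
            metadata.addAll(collect(input));
        }

        if (node.kind.equals("SCAN")) {
            if (node.predicate != null) {
                metadata.add(new Entry(node.table, "SCAN", node.predicate));
            }
        } else if (node.kind.equals("UPDATE") || node.kind.equals("DELETE")) {
            // Propagate upwards: reuse the metadata already collected from a
            // scan of the table being modified (first match only, a simplification).
            for (Entry e : new ArrayList<>(metadata)) {
                if (e.role.equals("SCAN") && e.table.equals(node.table)) {
                    metadata.add(new Entry(node.table, node.kind, e.predicate));
                    break;
                }
            }
        }
        // Other node kinds (JOIN, FUNCTION_SCAN) add nothing, but they also
        // do not abort the traversal, so metadata from sibling inputs survives.
        return metadata;
    }
}
```

In this toy model, an UPDATE over a join of a scan of t (id=1) and a function
scan yields both a SCAN entry and an UPDATE entry for t, and replacing the
function scan with a scan of t2 (id=42) additionally yields an entry for t2.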
> Sql. Partition pruning. Single bottom-up traversal for both queries and DMLs
> ----------------------------------------------------------------------------
>
> Key: IGNITE-28199
> URL: https://issues.apache.org/jira/browse/IGNITE-28199
> Project: Ignite
> Issue Type: Improvement
> Components: sql ai3
> Reporter: Maksim Zhuravkov
> Priority: Major
> Labels: ignite-3
--
This message was sent by Atlassian Jira
(v8.20.10#820010)