Maksim Zhuravkov created IGNITE-21277: -----------------------------------------
Summary: Sql. Partition pruning. Extract partition pruning information from scan operations Key: IGNITE-21277 URL: https://issues.apache.org/jira/browse/IGNITE-21277 Project: Ignite Issue Type: Improvement Components: sql Reporter: Maksim Zhuravkov In order to prune unnecessary partitions we need to obtain information that includes possible "values" of colocation key columns from filter expressions for every scan operators (such as IgniteTableScan, IgniteIndexScan and IgniteSystemViewScan - or simply subclasses of all ProjectableFilterableTableScan) prior to statement execution. This can be accomplished by traversing an expression tree of scan's filter and collecting expressions with colocation key columns (This data is called partition pruning metadata for simplicity). 1. Implement a component that takes a physical plan and analyses filter expressions of every scan operator and creates (if possible) an expression that includes all colocated columns. (The PartitionExtractor from patch can be used a reference implementation). Basic example: {code:java} Statement: SELECT * FROM t WHERE pk = 7 OR pk = 42 Partition metadata: t's source_id = [pk=7 || pk = 42] // Assuming colocation key is equal to primary key, || denotes OR operation {code} If some colocation key columns are missing from then filter, then partition pruning is not possible for such operation. Expression types to analyze: * AND * EQUALS * IS_FALSE * IS_NOT_DISTINCT_FROM * IS_NOT_FALSE * IS_NOT_TRUE * IS_TRUE * NOT * OR * SEARCH (operation that tests whether a value is included in a certain range) 2. Update QueryPlan to include partition pruning metadata for every scan operator (source_id = <expression>). — *Additional examples - partition pruning is possible* Dynamic parameters: {code:java} SELECT * FROM t WHERE pk = ?1 Partition pruning metadata: t = [ pk = ?1 ] {code} Colocation columns reside inside a nested expression: {code:java} SELECT * FROM t WHERE col1 = col2 AND (col2 = 100 AND pk = 2) Partition pruning metadata: t = [ pk = 2 ] {code} Multiple keys: {code:java} SELECT * FROM t WHERE col_c1 = 1 AND col_c2 = 2 Partition pruning metadata: t = [ (col_c1 = 1, col_c2 = 2) ] {code} Complex expression with multiple keys: {code:java} SELECT * FROM t WHERE (col_col1 = 100 and col_col2 = 4) OR (col_col1 = 4 and col_col2 = 100) Partition pruning metadata: t = [ (col_col1 = 100, col_col2 = 4) || (col_col1 = 4, col_col2 = 100) ] {code} Multiple tables, assuming that filter b_id = 42 is pushed into scan b, because a_id = b_id: {code:java} SELECT * FROM a JOIN b WHERE a_id = b_id AND a_id = 42 Partition pruning metadata: a= [ a_id=42 ], b=[ b_id=42 ] {code} --- *Partition pruning is not possible* Columns named col* are not part of colocation key: {code:java} SELECT * FROM t WHERE col1 = 10 Partition pruning metadata: [] // (empty) because filter does not use colocation key columns. {code} {code:java} SELECT * FROM t WHERE col1 = col2 OR pk = 42 // Pruning is not possible because we need to scan all partitions to figure out which tuples have ‘col1 = col2’ Partition pruning metadata: [] {code} {code:java} SELECT * FROM t WHERE col_col1 = 10 AND col_col2 OR col_col1 = 42 // Although first expression uses all colocation key columns the second one only uses some. Partition pruning metadata: [] {code} {code:java} SELECT * FROM t WHERE col_c1 = 1 OR col_c2 = 2 // Empty partition pruning metadata: need to scan all partitions to figure out which tuples have col_c1 = 1 OR col_c2 = 2. Partition pruning metadata: [] {code} {code:java} SELECT * FROM t WHERE col_col1 = col_col2 OR col_col2 = 42 // Empty partition pruning metadata: need to scan all partitions to figure out which tuples have ‘col_col1 = col_col2’ Partition pruning metadata: [] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)