[jira] [Created] (IGNITE-21277) Sql. Partition pruning. Extract partition pruning information from scan operations

Maksim Zhuravkov (Jira) Tue, 16 Jan 2024 23:21:06 -0800

Maksim Zhuravkov created IGNITE-21277:
-----------------------------------------


             Summary: Sql. Partition pruning. Extract partition pruning 
information from scan operations
                 Key: IGNITE-21277
                 URL: https://issues.apache.org/jira/browse/IGNITE-21277
             Project: Ignite
          Issue Type: Improvement
          Components: sql
            Reporter: Maksim Zhuravkov


In order to prune unnecessary partitions we need to obtain information that 
includes possible "values" of colocation key columns from filter expressions 
for every scan operators (such as IgniteTableScan, IgniteIndexScan and 
IgniteSystemViewScan - or simply subclasses of all 
ProjectableFilterableTableScan) prior to statement execution. This can be 
accomplished by traversing an expression tree of scan's filter and collecting 
expressions with colocation key columns (This data is called partition pruning 
metadata for simplicity).

1. Implement a component that takes a physical plan and analyses filter 
expressions of every scan operator and creates (if possible) an expression that 
includes all colocated columns. (The PartitionExtractor from patch can be used 
a reference implementation).

Basic example:

 
{code:java}
Statement: 
SELECT * FROM t WHERE pk = 7 OR pk = 42

Partition metadata: 
t's source_id = [pk=7 || pk = 42] // Assuming colocation key is equal to 
primary key, || denotes OR operation 
{code}
 

If some colocation key columns are missing from then filter, then partition 
pruning is not possible for such operation.

Expression types to analyze:
 * AND
 * EQUALS
 * IS_FALSE
 * IS_NOT_DISTINCT_FROM
 * IS_NOT_FALSE
 * IS_NOT_TRUE
 * IS_TRUE
 * NOT
 * OR
 * SEARCH (operation that tests whether a value is included in a certain range)

2. Update QueryPlan to include partition pruning metadata for every scan 
operator (source_id = <expression>).

—

*Additional examples - partition pruning is possible*

Dynamic parameters:

{code:java}
SELECT * FROM t WHERE pk = ?1 
Partition pruning metadata: t = [ pk = ?1 ]
{code}

Colocation columns reside inside a nested expression:

{code:java}
SELECT * FROM t WHERE col1 = col2 AND (col2 = 100 AND pk = 2) 
Partition pruning metadata: t = [ pk = 2 ]
{code}

Multiple keys:

{code:java}
SELECT * FROM t WHERE col_c1 = 1 AND col_c2 = 2 
Partition pruning metadata:  t = [ (col_c1 = 1, col_c2 = 2) ]
{code}

Complex expression with multiple keys:

{code:java}
SELECT * FROM t WHERE (col_col1 = 100 and col_col2 = 4) OR (col_col1 = 4 and 
col_col2 = 100)
Partition pruning metadata: t = [ (col_col1 = 100, col_col2 = 4) || (col_col1 = 
4, col_col2 = 100) ]
{code}

Multiple tables, assuming that filter b_id = 42 is pushed into scan b, because 
a_id = b_id:

{code:java}
SELECT * FROM a JOIN b WHERE a_id = b_id AND a_id = 42 
Partition pruning metadata: a= [ a_id=42 ], b=[ b_id=42 ]
{code}

---

*Partition pruning is not possible*

Columns named col* are not part of colocation key:

{code:java}
SELECT * FROM t WHERE col1 = 10 
Partition pruning metadata: [] // (empty) because filter does not use 
colocation key columns.
{code}

{code:java}
SELECT * FROM t WHERE col1 = col2 OR pk = 42 
// Pruning is not possible because we need to scan all partitions to figure out 
which tuples have ‘col1 = col2’
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_col1 = 10 AND col_col2 OR col_col1 = 42
// Although first expression uses all colocation key columns the second one 
only uses some.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_c1 = 1 OR col_c2 = 2 
// Empty partition pruning metadata:  need to scan all partitions to figure out 
which tuples have col_c1 = 1 OR col_c2 = 2.
Partition pruning metadata: [] 
{code}

{code:java}
SELECT * FROM t WHERE col_col1 = col_col2 OR col_col2 = 42 
// Empty partition pruning metadata: need to scan all partitions to figure out 
which tuples have ‘col_col1 = col_col2’
Partition pruning metadata: [] 
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (IGNITE-21277) Sql. Partition pruning. Extract partition pruning information from scan operations

Reply via email to