Sevenannn opened a new pull request, #13120:
URL: https://github.com/apache/datafusion/pull/13120

   ## Which issue does this PR close?
   
   
   
   ## Rationale for this change
   
   In `ParquetExec`, when `filter_pushdown` is not enabled, predicates are simply ignored. This causes incorrect results for queries whose filters are pushed down into `TableScan`.
   
   For example, the following query is supposed to return an empty result:
   ```SQL
   with tmp as (
    select ss_quantity, 's' sale_type from store_sales)
    select * from tmp where sale_type = 'w';
   ```
   The `predicate=false` in the physical plan is simply ignored by the `ParquetExec` implementation, causing wrong results.
   ```
   logical_plan:
     SubqueryAlias: tmp
       Projection: store_sales.ss_quantity, Utf8("s") AS sale_type
         BytesProcessedNode
           TableScan: store_sales projection=[ss_quantity], full_filters=[Boolean(false) AS Utf8("s") = Utf8("w")]

   physical_plan:
     ProjectionExec: expr=[ss_quantity@0 as ss_quantity, s as sale_type]
       BytesProcessedExec
         ParquetExec: file_groups={10 groups:
           [[tpcds/store_sales/store_sales.parquet:0..15420768],
           [tpcds/store_sales/store_sales.parquet:15420768..30841536],
           [tpcds/store_sales/store_sales.parquet:30841536..46262304],
           [tpcds/store_sales/store_sales.parquet:46262304..61683072],
           [tpcds/store_sales/store_sales.parquet:61683072..77103840], ...]},
           projection=[ss_quantity], predicate=false
   ```
   
   ## What changes are included in this PR?
   
   The changes in this PR ensure that predicates are not dropped when `filter_pushdown` is not enabled. They include:
   - When `filter_pushdown` is not enabled, use the predicates to evaluate and filter each `RecordBatch` returned from Parquet.
   - A function that recursively updates the column indexes of the filter to match the schema of the `RecordBatch`.
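
The two changes listed above can be illustrated with a minimal, standard-library-only sketch. It is not the code in this PR and does not use DataFusion or Arrow types: `Expr`, `Batch`, `Value`, `remap_columns`, `eval`, and `filter_batch` are hypothetical stand-ins invented for illustration, and the toy batch only holds `i64` columns. The sketch assumes columns are matched to the batch schema by name.

```rust
/// A tiny expression tree (hypothetical stand-in for a physical expression).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference: a name plus its position in some schema.
    Column { name: String, index: usize },
    IntLit(i64),
    BoolLit(bool),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// A toy columnar batch: `columns[i]` holds the values of field `schema[i]`.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Vec<String>,
    columns: Vec<Vec<i64>>,
}

/// Result of evaluating an expression for a single row.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
}

/// Recursively rewrite every column index so that it matches `schema`.
/// Assumption: columns are looked up by name in the batch schema.
fn remap_columns(expr: &Expr, schema: &[String]) -> Result<Expr, String> {
    Ok(match expr {
        Expr::Column { name, .. } => {
            let index = schema
                .iter()
                .position(|field| field == name)
                .ok_or_else(|| format!("column '{}' not found in batch schema", name))?;
            Expr::Column { name: name.clone(), index }
        }
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(remap_columns(l, schema)?),
            Box::new(remap_columns(r, schema)?),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(remap_columns(l, schema)?),
            Box::new(remap_columns(r, schema)?),
        ),
        // Literals carry no column index, so they are copied unchanged.
        other => other.clone(),
    })
}

/// Evaluate `expr` for one row of `batch`.
fn eval(expr: &Expr, batch: &Batch, row: usize) -> Result<Value, String> {
    match expr {
        Expr::Column { index, .. } => Ok(Value::Int(batch.columns[*index][row])),
        Expr::IntLit(v) => Ok(Value::Int(*v)),
        Expr::BoolLit(b) => Ok(Value::Bool(*b)),
        Expr::Eq(l, r) => Ok(Value::Bool(eval(l, batch, row)? == eval(r, batch, row)?)),
        Expr::And(l, r) => match (eval(l, batch, row)?, eval(r, batch, row)?) {
            (Value::Bool(a), Value::Bool(b)) => Ok(Value::Bool(a && b)),
            _ => Err("AND expects boolean operands".to_string()),
        },
    }
}

/// Apply `predicate` to a batch after the scan, keeping only matching rows.
fn filter_batch(batch: &Batch, predicate: &Expr) -> Result<Batch, String> {
    // Align column indexes with this batch before evaluating.
    let predicate = remap_columns(predicate, &batch.schema)?;
    let num_rows = batch.columns.first().map_or(0, |c| c.len());
    let mut keep = Vec::with_capacity(num_rows);
    for row in 0..num_rows {
        match eval(&predicate, batch, row)? {
            Value::Bool(b) => keep.push(b),
            Value::Int(_) => return Err("predicate must evaluate to a boolean".to_string()),
        }
    }
    // Keep only the rows whose mask entry is true, column by column.
    let columns: Vec<Vec<i64>> = batch
        .columns
        .iter()
        .map(|col| {
            col.iter()
                .zip(&keep)
                .filter(|(_, k)| **k)
                .map(|(v, _)| *v)
                .collect::<Vec<i64>>()
        })
        .collect();
    Ok(Batch { schema: batch.schema.clone(), columns })
}
```

In this model, calling `filter_batch` with `Expr::BoolLit(false)` yields a batch with no rows, which mirrors the `predicate=false` case in the plan above. A predicate whose column index was assigned against a different schema is re-pointed to the right column before evaluation.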
   
   ## Are these changes tested?
   
   Yes
   
   ## Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

