UBarney commented on PR #14567:
URL: https://github.com/apache/datafusion/pull/14567#issuecomment-2656543100

   @alamb 
   This transformation is correct in this case.
   
   According to this [doc](https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html#contains-analysis-and-minmax-rewrite), when the rewritten predicate evaluates to true, the container is **kept**.
   > When the min/max values are actually substituted in to this expression and evaluated, the result means
   > true: there MAY be rows that pass the predicate, KEEPS the container
   
   Here's an end-to-end test (done manually 😓):
   
   
   ```
   > explain select * from 't.parquet' where c1 not like 'ac%';
   
   
   +---------------+------------------------------------------------------------+
   | plan_type     | plan
   +---------------+------------------------------------------------------------+
   | logical_plan  | Filter: t.parquet.c1 NOT LIKE Utf8View("ac%")
   |               |   TableScan: t.parquet projection=[c1], partial_filters=[t.parquet.c1 NOT LIKE Utf8View("ac%")]
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192
   |               |   FilterExec: c1@0 NOT LIKE ac%
   |               |     RepartitionExec: partitioning=RoundRobinBatch(24), input_partitions=1
   |               |       DataSourceExec: file_groups={1 group: [[home/lv/code/datafusion/datafusion-cli/t.parquet]]}, projection=[c1], file_type=parquet, predicate=c1@0 NOT LIKE ac%, pruning_predicate=c1_null_count@2 != row_count@3 AND (c1_min@0 NOT LIKE ac% OR c1_max@1 NOT LIKE ac%), required_guarantees=[]
   |               |
   +---------------+------------------------------------------------------------+
   2 row(s) fetched. 
   Elapsed 0.012 seconds.
   > select * from 't.parquet' where c1 not like 'ac%';
   +----+
   | c1 |
   +----+
   | aa |
   | ab |
   +----+
   2 row(s) fetched. 
   Elapsed 0.013 seconds.
   ```
   `t.parquet` is in this [zip file](https://github.com/user-attachments/files/18784196/t.zip) (GitHub doesn't allow uploading Parquet files 😓)
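
As a rough illustration of how the `pruning_predicate` in the plan above decides whether to keep a container, here is a minimal Rust sketch that evaluates the same boolean shape against hand-written statistics. `ColumnStats`, `keep_container`, and the sample min/max values are hypothetical: they are not DataFusion APIs and not the actual statistics of `t.parquet`. The sketch also assumes the pattern is a plain prefix followed by a single trailing `%`.

```rust
/// Hypothetical per-container statistics for a string column
/// (illustration only, not a DataFusion type).
struct ColumnStats<'a> {
    min: &'a str,
    max: &'a str,
    null_count: u64,
    row_count: u64,
}

/// Evaluates the same shape as the pruning predicate shown in the plan for
/// `c1 NOT LIKE '<prefix>%'`:
///
///   c1_null_count != row_count
///   AND (c1_min NOT LIKE '<prefix>%' OR c1_max NOT LIKE '<prefix>%')
///
/// Returns true when the container MAY hold matching rows and must be kept.
/// Assumes `prefix` contains no LIKE wildcards or escape characters.
fn keep_container(stats: &ColumnStats, prefix: &str) -> bool {
    // A container holding only NULLs can never satisfy NOT LIKE.
    let not_all_null = stats.null_count != stats.row_count;
    // `x NOT LIKE 'p%'` is modelled here as "x does not start with p".
    let min_not_like = !stats.min.starts_with(prefix);
    let max_not_like = !stats.max.starts_with(prefix);
    not_all_null && (min_not_like || max_not_like)
}

fn main() {
    // Hypothetical statistics: some values fall outside the 'ac' prefix.
    let mixed = ColumnStats { min: "aa", max: "ac", null_count: 0, row_count: 3 };
    // Hypothetical statistics: every value starts with 'ac'.
    let all_ac = ColumnStats { min: "ac", max: "acz", null_count: 0, row_count: 2 };

    println!("mixed kept:  {}", keep_container(&mixed, "ac"));
    println!("all_ac kept: {}", keep_container(&all_ac, "ac"));
}
```

In this sketch a container is pruned only when both bounds start with the prefix, because every string that sorts between two strings sharing a prefix must share that prefix too. If either bound falls outside the prefix, the predicate is true and the container is kept.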
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


