alamb commented on PR #14119:
URL: https://github.com/apache/datafusion/pull/14119#issuecomment-2608310203

   This is really sweet. You can see it working to prune parquet files here:
   
   ```
   > copy (values ('foo'), ('bar'), ('baz')) to '/tmp/foo.parquet' STORED AS parquet;
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row(s) fetched.
   Elapsed 0.010 seconds.
   
   > explain select * from '/tmp/foo.parquet' where starts_with(column1, 'f');
   
   +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                                                                                                                  |
   +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Filter: /tmp/foo.parquet.column1 LIKE Utf8View("f%")                                                                                                                                                                                                  |
   |               |   TableScan: /tmp/foo.parquet projection=[column1], partial_filters=[/tmp/foo.parquet.column1 LIKE Utf8View("f%")]                                                                                                                                    |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                           |
   |               |   FilterExec: column1@0 LIKE f%                                                                                                                                                                                                                       |
   |               |     RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1                                                                                                                                                                             |
   |               |       ParquetExec: file_groups={1 group: [[tmp/foo.parquet]]}, projection=[column1], predicate=column1@0 LIKE f%, pruning_predicate=column1_null_count@2 != column1_row_count@3 AND column1_min@0 <= g AND f <= column1_max@1, required_guarantees=[] |
   |               |                                                                                                                                                                                                                                                       |
   +---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   2 row(s) fetched.
   Elapsed 0.019 seconds.
   ```
   
   Specifically, the predicate `column1_min@0 <= g AND f <= column1_max@1` shows that it has translated the `LIKE` into a min/max range on `column1` 🤯 
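
As a rough illustration of the pruning predicate above, here is a minimal Python sketch of the range check. It is not DataFusion's actual Rust implementation; the function names `prefix_upper_bound` and `may_contain_prefix` are hypothetical, and the sketch assumes plain code point ordering and ignores edge cases such as surrogate code points.

```python
from typing import Optional


def prefix_upper_bound(prefix: str) -> Optional[str]:
    """Return the prefix with its last character incremented ('f' -> 'g').

    Every string that starts with `prefix` sorts below this value.
    Returns None when no such bound exists (for example, an empty prefix).
    Hypothetical helper for illustration only.
    """
    chars = list(prefix)
    while chars:
        code = ord(chars[-1])
        if code < 0x10FFFF:
            chars[-1] = chr(code + 1)
            return "".join(chars)
        # Last character is already the maximum code point: drop it and
        # try to increment the previous one instead.
        chars.pop()
    return None


def may_contain_prefix(col_min: str, col_max: str,
                       null_count: int, row_count: int,
                       prefix: str) -> bool:
    """Conservative check mirroring the shape of the pruning predicate:

    null_count != row_count AND col_min <= upper AND prefix <= col_max

    False means the file can be skipped; True means it must be scanned.
    """
    if null_count == row_count:
        # Every value is NULL, so no row can match the LIKE.
        return False
    upper = prefix_upper_bound(prefix)
    if upper is None:
        # No upper bound available: only the lower bound can prune.
        return prefix <= col_max
    return col_min <= upper and prefix <= col_max


# Statistics for the example file holding 'foo', 'bar', 'baz':
# min = 'bar', max = 'foo', no nulls, 3 rows.
print(may_contain_prefix("bar", "foo", 0, 3, "f"))  # file must be scanned
print(may_contain_prefix("bar", "foo", 0, 3, "x"))  # file can be pruned
```

With the statistics of the example file, the prefix `f` keeps the file, while a prefix such as `x` falls outside the min/max range, so the file can be skipped without reading any rows.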
   
   I will also add a test to this PR demonstrating this.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

