alamb commented on PR #14119: URL: https://github.com/apache/datafusion/pull/14119#issuecomment-2608310203
This is really sweet. You can see it working to prune parquet files here:

```
> copy (values ('foo'), ('bar'), ('baz')) to '/tmp/foo.parquet' STORED AS parquet;
+-------+
| count |
+-------+
| 3     |
+-------+
1 row(s) fetched.
Elapsed 0.010 seconds.

> explain select * from '/tmp/foo.parquet' where starts_with(column1, 'f');
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                  |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: /tmp/foo.parquet.column1 LIKE Utf8View("f%")                                                                                                                                                                                                  |
|               |   TableScan: /tmp/foo.parquet projection=[column1], partial_filters=[/tmp/foo.parquet.column1 LIKE Utf8View("f%")]                                                                                                                                    |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                           |
|               |   FilterExec: column1@0 LIKE f%                                                                                                                                                                                                                       |
|               |     RepartitionExec: partitioning=RoundRobinBatch(16), input_partitions=1                                                                                                                                                                             |
|               |       ParquetExec: file_groups={1 group: [[tmp/foo.parquet]]}, projection=[column1], predicate=column1@0 LIKE f%, pruning_predicate=column1_null_count@2 != column1_row_count@3 AND column1_min@0 <= g AND f <= column1_max@1, required_guarantees=[] |
|               |                                                                                                                                                                                                                                                       |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.019 seconds.
```

Specifically, the predicate `column1_min@0 <= g AND f <= column1_max@1` shows it has translated the `LIKE` into a min/max range on `column1` 🤯

I will also add a test to this PR demonstrating this too
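
For anyone wondering how a prefix match can turn into a range check at all, here is a rough standalone sketch of the idea. It is not the code in this PR: `prefix_upper_bound` and `may_match_prefix` are hypothetical names made up for illustration, and the null-count part of the pruning predicate is left out. The assumption is that every string starting with `f` sorts at or above `f` and below `g`, so a file whose min/max statistics do not overlap that range can be skipped.

```rust
/// Hypothetical helper (not a DataFusion API): returns a string that sorts
/// above every string starting with `prefix`, by incrementing the last
/// character that can be incremented. Returns `None` if no bound exists
/// (for example, for an empty prefix).
fn prefix_upper_bound(prefix: &str) -> Option<String> {
    let mut chars: Vec<char> = prefix.chars().collect();
    while let Some(last) = chars.pop() {
        // Try the next Unicode scalar value; this fails at char::MAX or
        // just below the surrogate range, in which case we drop the
        // character and try to increment the one before it.
        if let Some(next) = char::from_u32(last as u32 + 1) {
            chars.push(next);
            return Some(chars.into_iter().collect());
        }
    }
    None
}

/// Hypothetical pruning check: `true` means the file *may* contain a value
/// starting with `prefix` and must be read; `false` means it can be skipped.
/// Mirrors the shape `min <= upper AND prefix <= max` seen in the plan.
fn may_match_prefix(min: &str, max: &str, prefix: &str) -> bool {
    let lower_ok = prefix <= max;
    let upper_ok = match prefix_upper_bound(prefix) {
        Some(upper) => min <= upper.as_str(),
        // No usable upper bound: stay conservative and keep the file.
        None => true,
    };
    lower_ok && upper_ok
}
```

With the file from the example above (min `bar`, max `foo`), `may_match_prefix("bar", "foo", "f")` keeps the file, while a file whose values all sort below `f` (say min `bar`, max `baz`) fails the `f <= max` half and would be pruned.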