UBarney commented on PR #14567: URL: https://github.com/apache/datafusion/pull/14567#issuecomment-2656543100
@alamb This transformation is correct in this case. According to this [doc](https://docs.rs/datafusion/latest/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html#contains-analysis-and-minmax-rewrite), a rewritten predicate that evaluates to true **keeps** the container:

> When the min/max values are actually substituted in to this expression and evaluated, the result means true: there MAY be rows that pass the predicate, KEEPS the container

Here's an end-to-end test (done manually 😓):

```
> explain select * from 't.parquet' where c1 not like 'ac%';
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                              |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Filter: t.parquet.c1 NOT LIKE Utf8View("ac%")                                                                                                                                                                                                                                                     |
|               |   TableScan: t.parquet projection=[c1], partial_filters=[t.parquet.c1 NOT LIKE Utf8View("ac%")]                                                                                                                                                                                                   |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                                                                                                                                                                                                                       |
|               |   FilterExec: c1@0 NOT LIKE ac%                                                                                                                                                                                                                                                                   |
|               |     RepartitionExec: partitioning=RoundRobinBatch(24), input_partitions=1                                                                                                                                                                                                                         |
|               |       DataSourceExec: file_groups={1 group: [[home/lv/code/datafusion/datafusion-cli/t.parquet]]}, projection=[c1], file_type=parquet, predicate=c1@0 NOT LIKE ac%, pruning_predicate=c1_null_count@2 != row_count@3 AND (c1_min@0 NOT LIKE ac% OR c1_max@1 NOT LIKE ac%), required_guarantees=[] |
|               |                                                                                                                                                                                                                                                                                                   |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.012 seconds.

> select * from 't.parquet' where c1 not like 'ac%';
+----+
| c1 |
+----+
| aa |
| ab |
+----+
2 row(s) fetched.
Elapsed 0.013 seconds.
```

`t.parquet` is in this [zip file](https://github.com/user-attachments/files/18784196/t.zip) (GitHub doesn't allow uploading Parquet files 😓).
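To illustrate why keeping the container on true is safe for `NOT LIKE 'ac%'`, here is a small Python model of the `pruning_predicate` shown in the plan above. This is only a sketch, not DataFusion's code: `ContainerStats`, `stats_of`, and `keep_container` are hypothetical names, the pattern is assumed to be a plain prefix followed by `%`, the min/max statistics are assumed to be exact (not truncated), and the sample containers are made up rather than read from `t.parquet`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContainerStats:
    """Hypothetical per-container statistics (illustrative names only)."""
    c1_min: str
    c1_max: str
    c1_null_count: int
    row_count: int


def stats_of(values: List[Optional[str]]) -> ContainerStats:
    """Compute exact min/max/null-count statistics for a list of values."""
    non_null = [v for v in values if v is not None]
    return ContainerStats(
        c1_min=min(non_null, default=""),
        c1_max=max(non_null, default=""),
        c1_null_count=len(values) - len(non_null),
        row_count=len(values),
    )


def keep_container(stats: ContainerStats, prefix: str) -> bool:
    """Model of:
    c1_null_count != row_count
    AND (c1_min NOT LIKE '<prefix>%' OR c1_max NOT LIKE '<prefix>%')
    True means rows MAY pass the filter, so the container is kept.
    """
    has_non_null = stats.c1_null_count != stats.row_count
    min_outside = not stats.c1_min.startswith(prefix)
    max_outside = not stats.c1_max.startswith(prefix)
    return has_non_null and (min_outside or max_outside)


# Made-up containers, each a list of c1 values.
containers = [
    ["aa", "ab"],          # nothing matches 'ac%': kept
    ["ac", "acb", "acz"],  # min and max both match 'ac%': pruned
    ["ab", "ac", "ad"],    # range straddles the prefix: kept
    [None, None],          # all NULL: pruned
]

for values in containers:
    keep = keep_container(stats_of(values), "ac")
    # Rows that would actually pass `c1 NOT LIKE 'ac%'` (NULLs never pass).
    passing = [v for v in values if v is not None and not v.startswith("ac")]
    # Soundness: a pruned container must not contain any passing row.
    assert keep or not passing
    print(values, "keep" if keep else "prune")
```

A container is pruned only when both the min and the max start with the prefix. Every string that sorts between two strings sharing a prefix also has that prefix, so no row in such a container can satisfy `NOT LIKE 'ac%'`.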