xudong963 commented on code in PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#discussion_r2671091634
##########
datafusion/datasource-parquet/src/row_group_filter.rs:
##########
@@ -153,6 +218,68 @@ impl RowGroupAccessPlanFilter {
}
}
+ /// Identifies row groups that are fully matched by the predicate.
+ ///
+ /// This optimization checks whether all rows in a row group satisfy the predicate
+ /// by inverting the predicate and checking if it prunes the row group. If the
+ /// inverted predicate prunes a row group, it means no rows match the inverted
+ /// predicate, which implies all rows match the original predicate.
+ ///
+ /// Note: This optimization is relatively inexpensive for a limited number of row groups.
+ fn identify_fully_matched_row_groups(
+ &mut self,
+ candidate_row_group_indices: &[usize],
+ arrow_schema: &Schema,
+ parquet_schema: &SchemaDescriptor,
+ groups: &[RowGroupMetaData],
+ predicate: &PruningPredicate,
+ metrics: &ParquetFileMetrics,
+ ) {
+ if candidate_row_group_indices.is_empty() {
+ return;
+ }
+
+ // Use NotExpr to create the inverted predicate
+ let inverted_expr = Arc::new(NotExpr::new(Arc::clone(predicate.orig_expr())));
+
+ // Simplify the NOT expression (e.g., NOT(c1 = 0) -> c1 != 0)
+ // before building the pruning predicate
+ let simplifier = PhysicalExprSimplifier::new(arrow_schema);
Review Comment:
@adriangb The `predicate_expr` inside `PruningPredicate` is already
**rewritten** in terms of min/max statistics, not the original expression. For
example:
- Original expression: `col > 5`
- Rewritten predicate_expr: `col_max > 5`
If we simply negate the rewritten expression:
- Wrong negation: `NOT(col_max > 5)` → `col_max <= 5`
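These semantics are incorrect: the inverted predicate `col <= 5` has to be rewritten to `col_min <= 5`, so the negation must be applied to the original expression before the min/max rewrite.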
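
To make the difference concrete, here is a minimal sketch in plain Rust. It does not use the DataFusion API: `ColStats` and the helper functions are hypothetical names introduced only for illustration, and the predicate is hard-wired to the `col > v` example above. It assumes the usual pruning rewrites, `col > v` to `col_max > v` and `col <= v` to `col_min <= v`.

```rust
/// Min/max statistics for one column of one row group.
/// Hypothetical type for illustration, not a DataFusion type.
#[derive(Debug, Clone, Copy)]
struct ColStats {
    min: i64,
    max: i64,
}

/// Pruning form of the original predicate `col > v`:
/// the row group can contain a match only if `col_max > v`.
fn may_match_gt(s: ColStats, v: i64) -> bool {
    s.max > v
}

/// Pruning form of the inverted predicate `col <= v`, derived from the
/// ORIGINAL expression: the row group can contain a match only if `col_min <= v`.
fn may_match_le(s: ColStats, v: i64) -> bool {
    s.min <= v
}

/// Correct check: the row group is fully matched by `col > v` when the
/// inverted predicate `col <= v` cannot match any row, i.e. `col_min > v`.
fn fully_matched_correct(s: ColStats, v: i64) -> bool {
    !may_match_le(s, v)
}

/// Incorrect check: negate the already rewritten predicate (`col_max <= v`)
/// and treat it as the pruning predicate for the inverted expression.
/// The double negation collapses back to `col_max > v`, which only says
/// that SOME row may match, not that ALL rows match.
fn fully_matched_wrong(s: ColStats, v: i64) -> bool {
    let negated_rewritten = !may_match_gt(s, v); // col_max <= v
    !negated_rewritten
}
```

For a row group with `min = 0, max = 10` and the predicate `col > 5`, the correct check reports "not fully matched" (the value 0 fails the predicate), while the incorrect check reports "fully matched". A row group with `min = 6, max = 10` is reported as fully matched by the correct check.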
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]