alamb commented on code in PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#discussion_r2600089947


##########
datafusion/optimizer/src/push_down_limit.rs:
##########
@@ -32,12 +33,17 @@ use datafusion_expr::{lit, FetchType, SkipType};
 /// Optimization rule that tries to push down `LIMIT`.
 //. It will push down through projection, limits (taking the smaller limit)
 #[derive(Default, Debug)]
-pub struct PushDownLimit {}
+pub struct PushDownLimit {
+    /// Flag to track whether we're currently under a Sort node that requires 
order preservation

Review Comment:
   I think there is only one `PushDownLimit` instance per `SessionContext`, so 
using `AtomicBool` here i think means that planning multiple concurrent queries 
will interfere with each other (can set this bool back / forth)
   
   I looked around and it seems like all the other logical optimizer rules 
don't push state down either so I don't have a great explanation for how to 
change this



##########
datafusion/core/tests/parquet/mod.rs:
##########
@@ -140,9 +140,10 @@ impl TestOutput {
             })
     }
 
-    fn pruning_metric(&self, metric_name: &str) -> Option<(usize, usize)> {
+    fn pruning_metric(&self, metric_name: &str) -> Option<(usize, usize, 
usize)> {

Review Comment:
   I think it would help to document what these returned values were.
   
   Or maybe return a struct like
   ```rust
   struct PruningMetric {
     total_pruned: usize,
     total_matched: usize, 
     total_fully_matched: usize,
   }
   ```



##########
datafusion/datasource-parquet/src/row_group_filter.rs:
##########
@@ -46,13 +48,19 @@ use parquet::{
 pub struct RowGroupAccessPlanFilter {
     /// which row groups should be accessed
     access_plan: ParquetAccessPlan,
+    /// which row groups are fully contained within the pruning predicate
+    is_fully_matched: Vec<bool>,

Review Comment:
   Sorry I meant add a comment like
   
   ```rust
       /// row groups where ALL rows are known to match the pruning predicate
   ```



##########
datafusion/optimizer/src/push_down_limit.rs:
##########
@@ -32,12 +33,17 @@ use datafusion_expr::{lit, FetchType, SkipType};
 /// Optimization rule that tries to push down `LIMIT`.
 //. It will push down through projection, limits (taking the smaller limit)
 #[derive(Default, Debug)]
-pub struct PushDownLimit {}
+pub struct PushDownLimit {
+    /// Flag to track whether we're currently under a Sort node that requires 
order preservation

Review Comment:
   Looking around more, it seems like maybe @zhuqi-lucas 's new physical 
optimizer rule in
   -  https://github.com/apache/datafusion/pull/19064 
   
   might be a more straightforward way to go
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to