berkaysynnada commented on code in PR #14813:
URL: https://github.com/apache/datafusion/pull/14813#discussion_r2330386653


##########
datafusion/physical-plan/src/windows/mod.rs:
##########
@@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties(
     input: &Arc<dyn ExecutionPlan>,
     window_exprs: &[Arc<dyn WindowExpr>],
 ) -> EquivalenceProperties {
-    // We need to update the schema, so we can not directly use
-    // `input.equivalence_properties()`.
+    // We need to update the schema, so we can't directly use input's 
equivalence
+    // properties.
     let mut window_eq_properties = 
EquivalenceProperties::new(Arc::clone(schema))
         .extend(input.equivalence_properties().clone());
 
-    let schema_len = schema.fields.len();
-    let window_expr_indices =
-        ((schema_len - window_exprs.len())..schema_len).collect::<Vec<_>>();
+    let window_schema_len = schema.fields.len();
+    let input_schema_len = window_schema_len - window_exprs.len();
+    let window_expr_indices = 
(input_schema_len..window_schema_len).collect::<Vec<_>>();
+
     for (i, expr) in window_exprs.iter().enumerate() {
-        if let Some(udf_window_expr) = 
expr.as_any().downcast_ref::<StandardWindowExpr>()
+        let partitioning_exprs = expr.partition_by();
+        let no_partitioning = partitioning_exprs.is_empty();
+        // Collect columns defining partitioning, and construct all 
`SortOptions`
+        // variations for them. Then, we will check each one whether it 
satisfies
+        // the existing ordering provided by the input plan.
+        let partition_by_orders = partitioning_exprs
+            .iter()
+            .map(|pb_order| 
sort_options_resolving_constant(Arc::clone(pb_order)));
+        let all_satisfied_lexs = partition_by_orders
+            .multi_cartesian_product()

Review Comment:
   > I understand the desire, but exponential planning time is hardly 
acceptable in our use-case. For a real production query, DF 45 works in a snap, 
and DF 46+ never exists the planner. I had to chop off a bunch of columns from 
a window to get it to completion.
   
   The chance of skipping this complex part can be detected earlier before (for 
example, if there is no order requirement coming from downstream), and there 
wouldn't be any order calculation logic specific to window expressions. 
   
   > > we need to compare the existing ordering against every possible ordering
   > 
   > the `sort_options_resolving_constant` returns only 2 options out of 4 
possible. is this correctness problem, or a missed optimization 'problem'?
   > 
   > define "need"
   
   I checked the code, and I believe one of the three usages of 
`sort_options_resolving_constant` should be updated to generate all 4 
possibilities (where it is used over partitioning expressions, not 
window/aggregate functions). The reason for generating only 2 of them is that 
set monotonicity is broken if the data has an increasing order but nulls come 
first, and vice versa, if the data has a decreasing order but nulls come last. 
So, it's not a correctness problem but a missed optimization



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to