findepi commented on code in PR #14813:
URL: https://github.com/apache/datafusion/pull/14813#discussion_r2334013940


##########
datafusion/physical-plan/src/windows/mod.rs:
##########
@@ -337,30 +342,151 @@ pub(crate) fn window_equivalence_properties(
     input: &Arc<dyn ExecutionPlan>,
     window_exprs: &[Arc<dyn WindowExpr>],
 ) -> EquivalenceProperties {
-    // We need to update the schema, so we can not directly use
-    // `input.equivalence_properties()`.
+    // We need to update the schema, so we can't directly use input's equivalence
+    // properties.
     let mut window_eq_properties = EquivalenceProperties::new(Arc::clone(schema))
         .extend(input.equivalence_properties().clone());
 
-    let schema_len = schema.fields.len();
-    let window_expr_indices =
-        ((schema_len - window_exprs.len())..schema_len).collect::<Vec<_>>();
+    let window_schema_len = schema.fields.len();
+    let input_schema_len = window_schema_len - window_exprs.len();
+    let window_expr_indices = (input_schema_len..window_schema_len).collect::<Vec<_>>();
+
     for (i, expr) in window_exprs.iter().enumerate() {
-        if let Some(udf_window_expr) = expr.as_any().downcast_ref::<StandardWindowExpr>()
+        let partitioning_exprs = expr.partition_by();
+        let no_partitioning = partitioning_exprs.is_empty();
+        // Collect columns defining partitioning, and construct all `SortOptions`
+        // variations for them. Then, we will check each one whether it satisfies
+        // the existing ordering provided by the input plan.
+        let partition_by_orders = partitioning_exprs
+            .iter()
+            .map(|pb_order| sort_options_resolving_constant(Arc::clone(pb_order)));
+        let all_satisfied_lexs = partition_by_orders
+            .multi_cartesian_product()

Review Comment:
   > The chance of skipping this complex part can be detected earlier before (for example, if there is no order requirement coming from downstream),
   
   That is simple to do.
   However, it is likely to help with simple queries only. That is, it will help with test queries, but more complex production workloads will still end up with exponential (multi-minute) planning.
   
   We need an approach that's better than `O(n²)` (and obviously the current `O(2ⁿ)` is much, much worse).
   
   From a query execution perspective, those _minutes_ spent in planning are minutes wasted if the query can be executed in _seconds_.
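
To make the growth concrete, here is a minimal, standalone sketch of why a Cartesian product over per-column sort variations blows up. It does not use DataFusion or itertools: `cartesian_product` and `candidate_count` are hypothetical helpers written for illustration, and the assumption of two candidate orderings per partition column is taken only from the `O(2ⁿ)` figure above, not from the actual behavior of `sort_options_resolving_constant`.

```rust
/// Hand-rolled Cartesian product (illustrative stand-in for what
/// `multi_cartesian_product` yields). Given one list of choices per
/// column, returns every way of picking one choice from each column.
pub fn cartesian_product<T: Clone>(per_column: &[Vec<T>]) -> Vec<Vec<T>> {
    // Start with a single empty combination and extend it column by column.
    let mut acc: Vec<Vec<T>> = vec![Vec::new()];
    for choices in per_column {
        let mut next = Vec::with_capacity(acc.len() * choices.len());
        for prefix in &acc {
            for choice in choices {
                let mut candidate = prefix.clone();
                candidate.push(choice.clone());
                next.push(candidate);
            }
        }
        acc = next;
    }
    acc
}

/// Number of candidate orderings for `n_columns` partition columns when
/// each column has `options_per_column` variations (assumed: 2).
/// Returns `None` if the count does not fit in a `u64`.
pub fn candidate_count(n_columns: u32, options_per_column: u64) -> Option<u64> {
    options_per_column.checked_pow(n_columns)
}
```

With two variations per column, 3 partition columns give 8 candidates, 20 columns give 1,048,576, and each candidate still has to be checked against the input ordering. Skipping the enumeration when no downstream ordering is required avoids the cost in that case, but it leaves the exponential path in place for every other query.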
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

