gokselk commented on code in PR #13805: URL: https://github.com/apache/datafusion/pull/13805#discussion_r1889249673
########## datafusion/physical-expr/src/equivalence/properties.rs: ########## @@ -3651,4 +3661,38 @@ mod tests { sort_expr } + + #[test] + fn test_union_constant_value_preservation() -> Result<()> { + let schema = Arc::new(Schema::new(vec![ + Field::new("a", DataType::Int32, true), + Field::new("b", DataType::Int32, true), + ])); + + let col_a = col("a", &schema)?; + let literal_10 = ScalarValue::Int32(Some(10)); + + // Create first input with a=10 + let const_expr1 = + ConstExpr::new(Arc::clone(&col_a)).with_value(literal_10.clone()); + let input1 = EquivalenceProperties::new(Arc::clone(&schema)) + .with_constants(vec![const_expr1]); + + // Create second input with a=10 + let const_expr2 = + ConstExpr::new(Arc::clone(&col_a)).with_value(literal_10.clone()); + let input2 = EquivalenceProperties::new(Arc::clone(&schema)) + .with_constants(vec![const_expr2]); + + // Calculate union properties + let union_props = calculate_union(vec![input1, input2], schema)?; + + // Verify column 'a' remains constant with value 10 + let const_a = &union_props.constants()[0]; + assert!(const_a.expr().eq(&col_a)); + assert!(const_a.across_partitions()); + assert_eq!(const_a.value(), Some(&literal_10)); + + Ok(()) + } } Review Comment: > Can you take a look at the first check @gokselk? It should take a few line changes in plan_with_order_preserving_variants() function. It should first look the order requirements, and if they are matched, then it would try to convert CoalescePartitionExec to SortPreservingMergeExec. But before that conversion, you can check across_partitions flag of the input constants, and if it is true, you can left the CoalescePartitionsExec as is. The issue is deeper than just checking `across_partitions`. Looking at `calculate_union()`, we're currently discarding all equivalence properties at UNION, including constants. We should preserve constants that are common across all UNION inputs. For example, in: ```sql (SELECT * FROM t WHERE x='a') UNION ALL (SELECT * FROM t WHERE x='a') ``` 'x' remains constant='a' across the entire UNION, but we're losing this information. We should modify `calculate_union()` to preserve such common constants. Would you like me to work on implementing this first? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org