crepererum opened a new issue, #17372: URL: https://github.com/apache/datafusion/issues/17372
### Describe the bug

The sanity checker for physical plans fails with `does not satisfy order requirements` under these very specific conditions (also see the reproducer below):

- you have a node that requires input ordering (e.g. a `SortPreservingMergeExec` (SPM)) over a `UnionExec`
- the `UnionExec` has at least 2 children
- each child has one column that is actual data to be sorted (let's call this column `a`) and two constant columns (called `const_1` and `const_2`)
- the two constants have identical values in the first child, but the second constant differs in the second child (e.g. `[const_1=foo, const_2=foo], [const_1=foo, const_2=bar]`)
- the children are sorted (by `SortExec`), but only on column `a` (because sorting on the constants is not required)

The resulting error is (newlines added for clarity):

```text
does not satisfy order requirements [const_1@0 ASC NULLS LAST, const_2@1 ASC NULLS LAST, a@2 ASC NULLS LAST].
Child-0 order: [[a@2 ASC NULLS LAST]]
```

### To Reproduce

You may add this to `datafusion/core/tests/physical_optimizer/enforce_sorting.rs` (or a better file; I originally thought this was related to the enforce-sorting pass, but it is not):

```rust
#[tokio::test]
async fn test_kaputt() -> Result<()> {
    let schema_in = create_test_schema3().unwrap();

    let proj_exprs_1 = vec![
        (
            Arc::new(Literal::new(ScalarValue::Utf8(Some("foo".to_owned())))) as _,
            "const_1".to_owned(),
        ),
        (
            Arc::new(Literal::new(ScalarValue::Utf8(Some("foo".to_owned())))) as _,
            "const_2".to_owned(),
        ),
        (col("a", &schema_in).unwrap(), "a".to_owned()),
    ];
    let proj_exprs_2 = vec![
        (
            Arc::new(Literal::new(ScalarValue::Utf8(Some("foo".to_owned())))) as _,
            "const_1".to_owned(),
        ),
        (
            Arc::new(Literal::new(ScalarValue::Utf8(Some("bar".to_owned())))) as _,
            "const_2".to_owned(),
        ),
        (col("a", &schema_in).unwrap(), "a".to_owned()),
    ];

    let source_1 = memory_exec(&schema_in);
    let source_1 = projection_exec(proj_exprs_1.clone(), source_1).unwrap();
    let schema_sources = source_1.schema();
    let ordering_sources: LexOrdering =
        [sort_expr("a", &schema_sources).nulls_last()].into();
    let source_1 = sort_exec(ordering_sources.clone(), source_1);

    let source_2 = memory_exec(&schema_in);
    let source_2 = projection_exec(proj_exprs_2, source_2).unwrap();
    let source_2 = sort_exec(ordering_sources.clone(), source_2);

    let plan = union_exec(vec![source_1, source_2]);

    let schema_out = plan.schema();
    let ordering_out: LexOrdering = [
        sort_expr("const_1", &schema_out).nulls_last(),
        sort_expr("const_2", &schema_out).nulls_last(),
        sort_expr("a", &schema_out).nulls_last(),
    ]
    .into();

    let plan = sort_preserving_merge_exec(ordering_out, plan);

    println!("{}", get_plan_string(&plan).join("\n"));

    SanityCheckPlan::new()
        .optimize(plan.clone(), &Default::default())
        .unwrap();

    Ok(())
}
```

This produces the following plan:

```text
SortPreservingMergeExec: [const_1@0 ASC NULLS LAST, const_2@1 ASC NULLS LAST, a@2 ASC NULLS LAST]
  UnionExec
    SortExec: expr=[a@2 ASC NULLS LAST], preserve_partitioning=[false]
      ProjectionExec: expr=[foo as const_1, foo as const_2, a@0 as a]
        DataSourceExec: partitions=1, partition_sizes=[0]
    SortExec: expr=[a@2 ASC NULLS LAST], preserve_partitioning=[false]
      ProjectionExec: expr=[foo as const_1, bar as const_2, a@0 as a]
        DataSourceExec: partitions=1, partition_sizes=[0]
```

which fails the sanity checker as described above.

### Expected behavior

From what I can tell, this plan is sound.

### Additional context

Tested on da2f9e130a8754829558cd66809608a89c51316f.
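The soundness claim can be illustrated without DataFusion at all. The following is a minimal sketch in plain Rust, not the library's implementation: `Row`, `child`, `is_sorted_lex`, and `merge` are hypothetical names introduced only for this illustration, and nulls are ignored. It models each union child as rows of `(const_1, const_2, a)` sorted on `a` only, and checks that such a child is already ordered by the full key `(const_1, const_2, a)`, because the two leading columns are constant within the child.

```rust
/// One output row: (const_1, const_2, a). Hypothetical model of the plan's schema.
type Row = (&'static str, &'static str, i64);

/// Builds one child of the union: two constant columns plus `a`,
/// sorted on `a` only (standing in for `SortExec: expr=[a ASC]`).
fn child(const_1: &'static str, const_2: &'static str, mut a: Vec<i64>) -> Vec<Row> {
    a.sort();
    a.into_iter().map(|v| (const_1, const_2, v)).collect()
}

/// True if `rows` is ordered lexicographically by (const_1, const_2, a).
/// Tuples compare element by element, left to right.
fn is_sorted_lex(rows: &[Row]) -> bool {
    rows.windows(2).all(|w| w[0] <= w[1])
}

/// Two-way merge on the full key, as a simplified stand-in for a
/// sort-preserving merge over two already sorted inputs.
fn merge(left: &[Row], right: &[Row]) -> Vec<Row> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    // At most one of these two tails is non-empty.
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}
```

Building `child("foo", "foo", ...)` and `child("foo", "bar", ...)` and passing each to `is_sorted_lex` shows that both inputs satisfy the three-column ordering on their own, so merging them on that key yields a correctly ordered result. This suggests that the requirement should be checked against each child's own constants rather than only the constants shared by all children of the union, although the sketch says nothing about where in the checker that goes wrong.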
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org