Omega359 opened a new issue, #17261: URL: https://github.com/apache/datafusion/issues/17261
### Describe the bug This is a followup to #13748 which was largely addressed by #16217 however I've noticed in my testing that with large enough # of columns planning time is still growing exponentially. Below is the output from a modified version of the [sql_planner](https://gist.github.com/Omega359/0ad26d183ba7689b6c21b48182d61864) using a branch from DF 49: ``` physical_sorted_union_order_by_10 time: [12.089 ms 12.105 ms 12.125 ms] change: [+6.4475% +6.6567% +6.8895%] (p = 0.00 < 0.05) Performance has regressed. physical_sorted_union_order_by_50 time: [323.92 ms 324.75 ms 325.53 ms] change: [+6.5657% +6.8547% +7.1319%] (p = 0.00 < 0.05) Performance has regressed. Benchmarking physical_sorted_union_order_by_100: Warming up for 30.000 s Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 18.5s. physical_sorted_union_order_by_100 time: [1.7079 s 1.7143 s 1.7221 s] change: [+6.5417% +6.9412% +7.5811%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high mild Benchmarking physical_sorted_union_order_by_200: Warming up for 30.000 s Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 118.6s. physical_sorted_union_order_by_200 time: [10.271 s 10.321 s 10.400 s] change: [+4.3930% +4.9196% +5.7641%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 10 measurements (30.00%) 2 (20.00%) low mild 1 (10.00%) high severe Benchmarking physical_sorted_union_order_by_500: Warming up for 30.000 s Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 1350.0s. physical_sorted_union_order_by_500 time: [129.75 s 131.00 s 132.82 s] ``` Ignored the regressed messages - tests are +-10% on my machine between runs. Note the time for the 500 columns. Below is the output from one of the benchmark runs with 500 columns where I instrumented the logical plan optimize and physical plan optimizer to output timings: ``` Benchmarking physical_sorted_union_order_by_500: Collecting 10 samples in estimated 1202.6 s (10 iterations)[2025-08-20T14:03:18Z INFO datafusion_optimizer::analyzer] Analyzer::execute_and_check rule resolve_grouping_function took 46 ms [2025-08-20T14:03:20Z INFO datafusion_optimizer::analyzer] Analyzer::execute_and_check rule type_coercion took 2007 ms [2025-08-20T14:03:20Z INFO datafusion_optimizer::analyzer] Analyzer::execute_and_check rule processing took 2054 ms [2025-08-20T14:03:20Z INFO datafusion_optimizer::analyzer] Analyzer::execute_and_check took 2129 ms [2025-08-20T14:03:21Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_nested_union took > 50ms: 1544 ms [2025-08-20T14:03:22Z INFO datafusion_optimizer::optimizer] Optimization for rule simplify_expressions took > 50ms: 823 ms [2025-08-20T14:03:22Z INFO datafusion_optimizer::optimizer] Optimization for rule replace_distinct_aggregate took > 50ms: 66 ms [2025-08-20T14:03:22Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_join took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule decorrelate_predicate_subquery took > 50ms: 130 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule scalar_subquery_to_join took > 50ms: 86 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule decorrelate_lateral_join took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule extract_equijoin_predicate took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_duplicated_expr took > 50ms: 68 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_filter took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_limit took > 50ms: 66 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule propagate_empty_relation took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_one_union took > 50ms: 68 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule filter_null_join_keys took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_outer_join took > 50ms: 68 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule push_down_limit took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule push_down_filter took > 50ms: 67 ms [2025-08-20T14:03:23Z INFO datafusion_optimizer::optimizer] Optimization for rule single_distinct_aggregation_to_group_by took > 50ms: 67 ms [2025-08-20T14:03:24Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_group_by_constant took > 50ms: 68 ms [2025-08-20T14:03:24Z INFO datafusion_optimizer::optimizer] Optimization for rule common_sub_expression_eliminate took > 50ms: 138 ms [2025-08-20T14:03:25Z INFO datafusion_optimizer::optimizer] Optimization for rule optimize_projections took > 50ms: 1721 ms [2025-08-20T14:03:26Z INFO datafusion_optimizer::optimizer] Processing time for pass 0 was 5561ms [2025-08-20T14:03:27Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_nested_union took > 50ms: 1648 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule simplify_expressions took > 50ms: 804 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule replace_distinct_aggregate took > 50ms: 72 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_join took > 50ms: 75 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule decorrelate_predicate_subquery took > 50ms: 135 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule scalar_subquery_to_join took > 50ms: 91 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule decorrelate_lateral_join took > 50ms: 76 ms [2025-08-20T14:03:28Z INFO datafusion_optimizer::optimizer] Optimization for rule extract_equijoin_predicate took > 50ms: 72 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_duplicated_expr took > 50ms: 75 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_filter took > 50ms: 72 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_limit took > 50ms: 74 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule propagate_empty_relation took > 50ms: 72 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_one_union took > 50ms: 74 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule filter_null_join_keys took > 50ms: 72 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_outer_join took > 50ms: 74 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule push_down_limit took > 50ms: 71 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule push_down_filter took > 50ms: 75 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule single_distinct_aggregation_to_group_by took > 50ms: 73 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule eliminate_group_by_constant took > 50ms: 75 ms [2025-08-20T14:03:29Z INFO datafusion_optimizer::optimizer] Optimization for rule common_sub_expression_eliminate took > 50ms: 175 ms [2025-08-20T14:03:31Z INFO datafusion_optimizer::optimizer] Optimization for rule optimize_projections took > 50ms: 1745 ms [2025-08-20T14:03:31Z INFO datafusion_optimizer::optimizer] optimizer pass 1 did not make changes. Processing time: 5826 ms [2025-08-20T14:03:31Z INFO datafusion_optimizer::optimizer] Processing time for 3 passes was 11473ms [2025-08-20T14:03:31Z INFO datafusion_optimizer::optimizer] Optimizer took 11559 ms [2025-08-20T14:03:47Z INFO datafusion::physical_planner] Creating initial physical plan took 15814 ms [2025-08-20T14:03:47Z INFO datafusion::physical_planner] Physical plan optimizer OutputRequirements took 0 ms [2025-08-20T14:03:47Z INFO datafusion::physical_planner] Physical plan optimizer aggregate_statistics took 0 ms [2025-08-20T14:03:47Z INFO datafusion::physical_planner] Physical plan optimizer join_selection took 0 ms [2025-08-20T14:03:47Z INFO datafusion::physical_planner] Physical plan optimizer LimitedDistinctAggregation took 0 ms [2025-08-20T14:03:47Z INFO datafusion::physical_planner] Physical plan optimizer FilterPushdown took 18 ms [2025-08-20T14:04:52Z INFO datafusion::physical_planner] Physical plan optimizer EnforceDistribution took 64265 ms [2025-08-20T14:04:52Z INFO datafusion::physical_planner] Physical plan optimizer CombinePartialFinalAggregate took 0 ms [2025-08-20T14:05:02Z INFO datafusion::physical_planner] Physical plan optimizer EnforceSorting took 10527 ms [2025-08-20T14:05:02Z INFO datafusion::physical_planner] Physical plan optimizer OptimizeAggregateOrder took 0 ms [2025-08-20T14:05:04Z INFO datafusion::physical_planner] Physical plan optimizer ProjectionPushdown took 2373 ms [2025-08-20T14:05:04Z INFO datafusion::physical_planner] Physical plan optimizer coalesce_batches took 3 ms [2025-08-20T14:05:04Z INFO datafusion::physical_planner] Physical plan optimizer coalesce_async_exec_input took 0 ms [2025-08-20T14:05:04Z INFO datafusion::physical_planner] Physical plan optimizer OutputRequirements took 0 ms [2025-08-20T14:05:04Z INFO datafusion::physical_planner] Physical plan optimizer LimitAggregation took 0 ms [2025-08-20T14:05:07Z INFO datafusion::physical_planner] Physical plan optimizer LimitPushdown took 2122 ms [2025-08-20T14:05:07Z INFO datafusion::physical_planner] Physical plan optimizer ProjectionPushdown took 5 ms [2025-08-20T14:05:07Z INFO datafusion::physical_planner] Physical plan optimizer EnsureCooperative took 0 ms [2025-08-20T14:05:07Z INFO datafusion::physical_planner] Physical plan optimizer FilterPushdown(Post) took 79 ms [2025-08-20T14:05:07Z INFO datafusion::physical_planner] Physical plan optimizer SanityCheckPlan took 63 ms [2025-08-20T14:05:07Z INFO datafusion::physical_planner] Optimize physical plan took 79736 ms ``` Note the times for the logical plan optimizer `eliminate_nested_union ` and the physical plan optimizers `EnforceDistribution`, `EnforceSorting`, `ProjectionPushdown` and `LimitPushdown`. ### To Reproduce _No response_ ### Expected behavior Query planning should be much faster. ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
