Omega359 opened a new issue, #17261:
URL: https://github.com/apache/datafusion/issues/17261

   ### Describe the bug
   
   This is a followup to #13748 which was largely addressed by #16217 however 
I've noticed in my testing that with large enough # of columns planning time is 
still growing exponentially. Below is the output from a modified version of the 
[sql_planner](https://gist.github.com/Omega359/0ad26d183ba7689b6c21b48182d61864)
 using a branch from DF 49:
   
   ```
   physical_sorted_union_order_by_10
                           time:   [12.089 ms 12.105 ms 12.125 ms]
                           change: [+6.4475% +6.6567% +6.8895%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   physical_sorted_union_order_by_50
                           time:   [323.92 ms 324.75 ms 325.53 ms]
                           change: [+6.5657% +6.8547% +7.1319%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   
   Benchmarking physical_sorted_union_order_by_100: Warming up for 30.000 s
   Warning: Unable to complete 10 samples in 5.0s. You may wish to increase 
target time to 18.5s.
   physical_sorted_union_order_by_100
                           time:   [1.7079 s 1.7143 s 1.7221 s]
                           change: [+6.5417% +6.9412% +7.5811%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 1 outliers among 10 measurements (10.00%)
     1 (10.00%) high mild
   
   Benchmarking physical_sorted_union_order_by_200: Warming up for 30.000 s
   Warning: Unable to complete 10 samples in 5.0s. You may wish to increase 
target time to 118.6s.
   physical_sorted_union_order_by_200
                           time:   [10.271 s 10.321 s 10.400 s]
                           change: [+4.3930% +4.9196% +5.7641%] (p = 0.00 < 
0.05)
                           Performance has regressed.
   Found 3 outliers among 10 measurements (30.00%)
     2 (20.00%) low mild
     1 (10.00%) high severe
   
   Benchmarking physical_sorted_union_order_by_500: Warming up for 30.000 s
   Warning: Unable to complete 10 samples in 5.0s. You may wish to increase 
target time to 1350.0s.
   physical_sorted_union_order_by_500
                           time:   [129.75 s 131.00 s 132.82 s]
   ```
   Ignored the regressed messages - tests are +-10% on my machine between runs. 
Note the time for the 500 columns.
   
   Below is the output from one of the benchmark runs with 500 columns where I 
instrumented the logical plan optimize and physical plan optimizer to output 
timings:
   
   ```
   Benchmarking physical_sorted_union_order_by_500: Collecting 10 samples in 
estimated 1202.6 s (10 iterations)[2025-08-20T14:03:18Z INFO  
datafusion_optimizer::analyzer] Analyzer::execute_and_check rule 
resolve_grouping_function took 46 ms
   [2025-08-20T14:03:20Z INFO  datafusion_optimizer::analyzer] 
Analyzer::execute_and_check rule type_coercion took 2007 ms
   [2025-08-20T14:03:20Z INFO  datafusion_optimizer::analyzer] 
Analyzer::execute_and_check rule processing took 2054 ms
   [2025-08-20T14:03:20Z INFO  datafusion_optimizer::analyzer] 
Analyzer::execute_and_check took 2129 ms
   [2025-08-20T14:03:21Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_nested_union took > 50ms: 1544 ms
   [2025-08-20T14:03:22Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule simplify_expressions took > 50ms: 823 ms
   [2025-08-20T14:03:22Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule replace_distinct_aggregate took > 50ms: 66 ms
   [2025-08-20T14:03:22Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_join took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule decorrelate_predicate_subquery took > 50ms: 130 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule scalar_subquery_to_join took > 50ms: 86 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule decorrelate_lateral_join took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule extract_equijoin_predicate took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_duplicated_expr took > 50ms: 68 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_filter took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_limit took > 50ms: 66 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule propagate_empty_relation took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_one_union took > 50ms: 68 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule filter_null_join_keys took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_outer_join took > 50ms: 68 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule push_down_limit took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule push_down_filter took > 50ms: 67 ms
   [2025-08-20T14:03:23Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule single_distinct_aggregation_to_group_by took > 50ms: 67 ms
   [2025-08-20T14:03:24Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_group_by_constant took > 50ms: 68 ms
   [2025-08-20T14:03:24Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule common_sub_expression_eliminate took > 50ms: 138 ms
   [2025-08-20T14:03:25Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule optimize_projections took > 50ms: 1721 ms
   [2025-08-20T14:03:26Z INFO  datafusion_optimizer::optimizer] Processing time 
for pass 0 was 5561ms
   [2025-08-20T14:03:27Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_nested_union took > 50ms: 1648 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule simplify_expressions took > 50ms: 804 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule replace_distinct_aggregate took > 50ms: 72 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_join took > 50ms: 75 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule decorrelate_predicate_subquery took > 50ms: 135 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule scalar_subquery_to_join took > 50ms: 91 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule decorrelate_lateral_join took > 50ms: 76 ms
   [2025-08-20T14:03:28Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule extract_equijoin_predicate took > 50ms: 72 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_duplicated_expr took > 50ms: 75 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_filter took > 50ms: 72 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_limit took > 50ms: 74 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule propagate_empty_relation took > 50ms: 72 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_one_union took > 50ms: 74 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule filter_null_join_keys took > 50ms: 72 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_outer_join took > 50ms: 74 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule push_down_limit took > 50ms: 71 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule push_down_filter took > 50ms: 75 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule single_distinct_aggregation_to_group_by took > 50ms: 73 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule eliminate_group_by_constant took > 50ms: 75 ms
   [2025-08-20T14:03:29Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule common_sub_expression_eliminate took > 50ms: 175 ms
   [2025-08-20T14:03:31Z INFO  datafusion_optimizer::optimizer] Optimization 
for rule optimize_projections took > 50ms: 1745 ms
   [2025-08-20T14:03:31Z INFO  datafusion_optimizer::optimizer] optimizer pass 
1 did not make changes. Processing time: 5826 ms
   [2025-08-20T14:03:31Z INFO  datafusion_optimizer::optimizer] Processing time 
for 3 passes was 11473ms
   [2025-08-20T14:03:31Z INFO  datafusion_optimizer::optimizer] Optimizer took 
11559 ms
   [2025-08-20T14:03:47Z INFO  datafusion::physical_planner] Creating initial 
physical plan took 15814 ms
   [2025-08-20T14:03:47Z INFO  datafusion::physical_planner] Physical plan 
optimizer OutputRequirements took 0 ms
   [2025-08-20T14:03:47Z INFO  datafusion::physical_planner] Physical plan 
optimizer aggregate_statistics took 0 ms
   [2025-08-20T14:03:47Z INFO  datafusion::physical_planner] Physical plan 
optimizer join_selection took 0 ms
   [2025-08-20T14:03:47Z INFO  datafusion::physical_planner] Physical plan 
optimizer LimitedDistinctAggregation took 0 ms
   [2025-08-20T14:03:47Z INFO  datafusion::physical_planner] Physical plan 
optimizer FilterPushdown took 18 ms
   [2025-08-20T14:04:52Z INFO  datafusion::physical_planner] Physical plan 
optimizer EnforceDistribution took 64265 ms
   [2025-08-20T14:04:52Z INFO  datafusion::physical_planner] Physical plan 
optimizer CombinePartialFinalAggregate took 0 ms
   [2025-08-20T14:05:02Z INFO  datafusion::physical_planner] Physical plan 
optimizer EnforceSorting took 10527 ms
   [2025-08-20T14:05:02Z INFO  datafusion::physical_planner] Physical plan 
optimizer OptimizeAggregateOrder took 0 ms
   [2025-08-20T14:05:04Z INFO  datafusion::physical_planner] Physical plan 
optimizer ProjectionPushdown took 2373 ms
   [2025-08-20T14:05:04Z INFO  datafusion::physical_planner] Physical plan 
optimizer coalesce_batches took 3 ms
   [2025-08-20T14:05:04Z INFO  datafusion::physical_planner] Physical plan 
optimizer coalesce_async_exec_input took 0 ms
   [2025-08-20T14:05:04Z INFO  datafusion::physical_planner] Physical plan 
optimizer OutputRequirements took 0 ms
   [2025-08-20T14:05:04Z INFO  datafusion::physical_planner] Physical plan 
optimizer LimitAggregation took 0 ms
   [2025-08-20T14:05:07Z INFO  datafusion::physical_planner] Physical plan 
optimizer LimitPushdown took 2122 ms
   [2025-08-20T14:05:07Z INFO  datafusion::physical_planner] Physical plan 
optimizer ProjectionPushdown took 5 ms
   [2025-08-20T14:05:07Z INFO  datafusion::physical_planner] Physical plan 
optimizer EnsureCooperative took 0 ms
   [2025-08-20T14:05:07Z INFO  datafusion::physical_planner] Physical plan 
optimizer FilterPushdown(Post) took 79 ms
   [2025-08-20T14:05:07Z INFO  datafusion::physical_planner] Physical plan 
optimizer SanityCheckPlan took 63 ms
   [2025-08-20T14:05:07Z INFO  datafusion::physical_planner] Optimize physical 
plan took 79736 ms
   ```
   Note the times for the logical plan optimizer `eliminate_nested_union ` and 
the physical plan optimizers  `EnforceDistribution`, `EnforceSorting`, 
`ProjectionPushdown` and `LimitPushdown`.
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   Query planning should be much faster.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to