acking-you commented on PR #16647: URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027160766
Here are the benchmark results with the default, unmodified test data (multiple large string columns):

```sh
Benchmarking bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 50.2s.
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [5.0435 s 5.0615 s 5.0813 s]
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low mild
  2 (20.00%) high severe
Benchmarking bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s or enable flat sampling.
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [157.82 ms 160.78 ms 163.05 ms]
➜  arrow-datafusion git:(main) git checkout reuse_rows
root@VM-250-221-tencentos arrow-datafusion #
branch 'reuse_rows' set up to track 'origin/reuse_rows'.
Switched to a new branch 'reuse_rows'
➜  arrow-datafusion git:(reuse_rows) cargo bench --bench sort_preserving_merge -- --sample-size=10
Benchmarking bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 51.2s.
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [5.0404 s 5.0613 s 5.0831 s]
                        change: [-0.5635% -0.0039% +0.5493%] (p = 0.99 > 0.05)
                        No change in performance detected.
Benchmarking bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s or enable flat sampling.
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [155.99 ms 157.30 ms 159.18 ms]
                        change: [-3.1635% -1.4444% +0.3068%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
```

With the default test data, the improvement appears to be minimal: Criterion detects no statistically significant change in either benchmark. I suspect this is because the test strings are so long that the memory allocation overhead is negligible by comparison. So I made the strings shorter, and the results are as follows:

```sh
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [757.06 ms 760.87 ms 764.68 ms]
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [209.89 ms 210.70 ms 211.52 ms]
➜  arrow-datafusion git:(main) git checkout reuse_rows
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
                        time:   [755.94 ms 758.84 ms 762.58 ms]
                        change: [-0.9202% -0.2676% +0.4455%] (p = 0.47 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
                        time:   [209.22 ms 210.43 ms 212.07 ms]
                        change: [-0.8397% -0.1278% +0.7042%] (p = 0.78 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
```

Compared with before, the performance improvement is indeed more noticeable.
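The hypothesis about string length can be illustrated with a minimal, hypothetical Rust sketch. Reusing a buffer only removes the per-batch (re)allocations. The bytes that have to be copied are the same either way. So with long strings, each avoided allocation is spread over far more copied bytes and matters less. `encode_batches`, the batch sizes, and the string lengths are illustrative assumptions, not DataFusion's or arrow-row's code. Counting changes in `Vec::capacity()` is only a proxy for real allocator calls.

```rust
// Hypothetical sketch: models the intuition only. This is NOT the arrow-row
// or DataFusion API; `encode_batches` is an illustrative helper.

/// Copy `batches` identical copies of `batch` into a byte buffer, either
/// reusing one buffer (`reuse = true`) or starting from a fresh, empty
/// buffer for every batch. Returns (buffer growths, total bytes copied).
/// A change in `capacity()` is used as a proxy for a (re)allocation.
fn encode_batches(batch: &[&str], batches: usize, reuse: bool) -> (usize, usize) {
    let mut buf: Vec<u8> = Vec::new();
    let mut growths = 0;
    let mut bytes = 0;
    for _ in 0..batches {
        if reuse {
            buf.clear(); // drop the contents but keep the allocated capacity
        } else {
            buf = Vec::new(); // throw the old allocation away and start empty
        }
        for s in batch {
            let before = buf.capacity();
            buf.extend_from_slice(s.as_bytes());
            bytes += s.len();
            if buf.capacity() != before {
                growths += 1; // the buffer had to grow to fit the new bytes
            }
        }
    }
    (growths, bytes)
}

fn main() {
    let short: Vec<&str> = vec!["abc"; 1_000];
    let long_value = "x".repeat(10_000);
    let long: Vec<&str> = vec![long_value.as_str(); 1_000];

    for (name, batch) in [("short strings", &short), ("long strings", &long)] {
        let (fresh, bytes) = encode_batches(batch, 100, false);
        let (reused, _) = encode_batches(batch, 100, true);
        // Reuse avoids the same kind of per-batch work in both cases, but with
        // long strings that work is amortized over far more copied bytes.
        let bytes_per_avoided_growth = bytes as f64 / (fresh - reused) as f64;
        println!(
            "{name}: {bytes} bytes copied, {fresh} growths without reuse, \
             {reused} with reuse, {bytes_per_avoided_growth:.1} bytes per avoided growth"
        );
    }
}
```

In this model, only the first batch grows the buffer when it is reused. The number of bytes copied per avoided growth is much larger for the long strings, which matches the suspicion that allocation overhead is drowned out there.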