acking-you commented on PR #16647:
URL: https://github.com/apache/datafusion/pull/16647#issuecomment-3027160766
These are the benchmark results with the default, unmodified test data
(multiple large string columns):
```sh
Benchmarking bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 50.2s.
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
    time: [5.0435 s 5.0615 s 5.0813 s]
Found 3 outliers among 10 measurements (30.00%)
    1 (10.00%) low mild
    2 (20.00%) high severe
Benchmarking bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s or enable flat sampling.
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
    time: [157.82 ms 160.78 ms 163.05 ms]
➜ arrow-datafusion git:(main) git checkout reuse_rows
root@VM-250-221-tencentos arrow-datafusion #
branch 'reuse_rows' set up to track 'origin/reuse_rows'.
Switched to a new branch 'reuse_rows'
➜ arrow-datafusion git:(reuse_rows) cargo bench --bench sort_preserving_merge -- --sample-size=10
Benchmarking bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 51.2s.
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
    time: [5.0404 s 5.0613 s 5.0831 s]
    change: [-0.5635% -0.0039% +0.5493%] (p = 0.99 > 0.05)
    No change in performance detected.
Benchmarking bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 8.6s or enable flat sampling.
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
    time: [155.99 ms 157.30 ms 159.18 ms]
    change: [-3.1635% -1.4444% +0.3068%] (p = 0.15 > 0.05)
    No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
    1 (10.00%) high mild
```
The performance improvement in the results above appears to be minimal. I
suspect the strings used for testing are too long, so the cost of handling
the string data dominates and the memory allocation overhead becomes
negligible by comparison.
So I shortened the strings and reran the benchmark. The results are as follows:
```sh
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
    time: [757.06 ms 760.87 ms 764.68 ms]
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
    time: [209.89 ms 210.70 ms 211.52 ms]
➜ arrow-datafusion git:(main) git checkout reuse_rows
bench_merge_sorted_preserving/multiple_large_string_columns_with_1m_rows
    time: [755.94 ms 758.84 ms 762.58 ms]
    change: [-0.9202% -0.2676% +0.4455%] (p = 0.47 > 0.05)
    No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
    1 (10.00%) high mild
bench_merge_sorted_preserving/multiple_u64_columns_with_1m_rows
    time: [209.22 ms 210.43 ms 212.07 ms]
    change: [-0.8397% -0.1278% +0.7042%] (p = 0.78 > 0.05)
    No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
    1 (10.00%) high severe
```
The performance improvement compared to before is indeed more noticeable.
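
To illustrate the allocation-overhead reasoning above, here is a minimal sketch of the general buffer-reuse pattern. It is not the code from this PR: `encode_into` and `encode_fresh` are hypothetical helpers, and the length-prefixed encoding is only a stand-in for real row encoding. The point is that the allocation cost is paid per call while the copy cost grows with the string length, so reuse should matter less as the strings get longer.

```rust
/// Hypothetical stand-in for row encoding: writes each value as a
/// 4-byte little-endian length followed by its bytes.
/// The caller-owned buffer is cleared but keeps its capacity, so
/// repeated calls can avoid reallocating.
fn encode_into(values: &[&str], out: &mut Vec<u8>) {
    out.clear(); // drops the contents, retains the allocation
    for v in values {
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v.as_bytes());
    }
}

/// Same encoding, but allocates a new buffer on every call.
fn encode_fresh(values: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    encode_into(values, &mut out);
    out
}
```

Both functions produce identical bytes; the only difference is whether the `Vec<u8>` allocation is carried over between batches.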