jesspav opened a new pull request, #446:
URL: https://github.com/apache/sedona-db/pull/446
Flamegraphs for PR #430 revealed that CRS serialization had some
opportunities for improvement that will be essential for raster/geo per-item
CRS functions. This is going to be especially important for CRS comparisons
in joins.
Added simple benchmarks for lnglat and auth code equality that were run
repeatedly with different performance optimizations. At the start it took
about 62ms for auth codes and 38ms for lnglat. By the end auth codes have
improved by roughly 18x and lnglat by about 11x, with both running in 3.4ms.
There are basically two changes: an LRU cache and avoiding unnecessary
string allocations.
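
To illustrate the caching half of this, here is a minimal sketch only, not the code in this PR. `ParsedCrs`, `parse_crs`, and `CrsCache` are hypothetical names, and the cache is a small hand-rolled LRU on top of the standard library so the example stays self-contained. The idea is that a repeated CRS string is parsed once and then served from the cache.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a parsed CRS; the real type may differ.
#[derive(Debug, Clone, PartialEq)]
struct ParsedCrs {
    authority: String,
    code: String,
}

/// Hypothetical "expensive" parse step that the cache avoids repeating.
fn parse_crs(s: &str) -> Option<ParsedCrs> {
    let (authority, code) = s.split_once(':')?;
    Some(ParsedCrs {
        authority: authority.to_string(),
        code: code.to_string(),
    })
}

/// Tiny LRU cache: `order` holds keys from least to most recently used.
struct CrsCache {
    capacity: usize,
    map: HashMap<String, ParsedCrs>,
    order: VecDeque<String>,
    misses: usize,
}

impl CrsCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self {
            capacity,
            map: HashMap::new(),
            order: VecDeque::new(),
            misses: 0,
        }
    }

    /// Returns the cached value if present, otherwise parses and caches it.
    fn get_or_parse(&mut self, key: &str) -> Option<ParsedCrs> {
        if let Some(hit) = self.map.get(key) {
            let hit = hit.clone();
            // Mark the key as most recently used.
            if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
                if let Some(k) = self.order.remove(pos) {
                    self.order.push_back(k);
                }
            }
            return Some(hit);
        }
        self.misses += 1;
        let parsed = parse_crs(key)?;
        // Evict the least recently used entry when the cache is full.
        if self.map.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key.to_string(), parsed.clone());
        self.order.push_back(key.to_string());
        Some(parsed)
    }
}
```

The linear scan in `order` keeps the sketch short. A production cache would more likely use a dedicated LRU structure with constant-time updates.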
Breakdown of performance changes:
```
Starting point:
Running benches/crs_benchmarks.rs (target/release/deps/crs_benchmarks-c62a15af9667b709)
equality_lnglat_crs time: [37.442 ms 37.712 ms 37.960 ms]
change: [+2.5665% +4.0155% +5.4781%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
6 (6.00%) low severe
9 (9.00%) low mild
Benchmarking equality_different_crs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, or reduce sample count to 70.
equality_different_crs time: [61.892 ms 62.119 ms 62.362 ms]
change: [+2.1679% +2.6956% +3.2470%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
1 (1.00%) high mild
5 (5.00%) high severe

After cache:
equality_lnglat_crs time: [28.265 ms 28.361 ms 28.461 ms]
change: [−25.338% −24.795% −24.189%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
equality_different_crs time: [27.473 ms 27.530 ms 27.586 ms]
change: [−55.878% −55.683% −55.494%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild

Time after cache + switch value to str + colon rather than split:
equality_lnglat_crs time: [11.324 ms 11.354 ms 11.385 ms]
change: [−60.148% −59.967% −59.798%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
equality_different_crs time: [11.655 ms 11.701 ms 11.746 ms]
change: [−57.674% −57.497% −57.308%] (p = 0.00 < 0.05)
Performance has improved.

Time after cache + switch value to str + colon rather than split + pre-alloc instead of format for eq:
Benchmarking equality_lnglat_crs: Collecting 100 samples in estimated 5.4946 s (800 iterat
equality_lnglat_crs time: [6.6983 ms 6.7258 ms 6.7557 ms]
change: [+0.7120% +1.2645% +1.8552%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
Benchmarking equality_different_crs: Collecting 100 samples in estimated 5.4518 s (800 ite
equality_different_crs time: [6.7116 ms 6.7429 ms 6.7758 ms]
change: [+0.7410% +1.5266% +2.2864%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild

All of the above + not splitting up the auth code:
equality_lnglat_crs time: [3.3993 ms 3.4098 ms 3.4200 ms]
change: [−49.633% −49.425% −49.229%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low mild
equality_different_crs time: [3.4080 ms 3.4170 ms 3.4258 ms]
change: [−49.183% −48.996% −48.819%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
```
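
The string-allocation stages in the breakdown ("pre-alloc instead of format for eq", "not splitting up the auth code") can be pictured roughly as follows. This is a sketch under assumptions, not the code in this PR: it assumes a CRS is identified by an `authority:code` string such as `EPSG:4326`, and both function names are hypothetical. The first version builds a new `String` on every comparison, while the second checks the pieces in place against borrowed slices.

```rust
/// Allocating version (hypothetical): builds a fresh String via `format!`
/// for every comparison.
fn eq_with_format(authority: &str, code: &str, other: &str) -> bool {
    format!("{authority}:{code}") == other
}

/// Allocation-free version (hypothetical): walks `other` in place by
/// stripping the authority, then the colon, and comparing what is left
/// with the code.
fn eq_without_alloc(authority: &str, code: &str, other: &str) -> bool {
    other
        .strip_prefix(authority)
        .and_then(|rest| rest.strip_prefix(':'))
        == Some(code)
}
```

Both functions return the same answer for any input. The difference is only that the second never touches the heap, which matters when the comparison runs once per row.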