Kontinuation opened a new pull request, #285:
URL: https://github.com/apache/sedona-db/pull/285

   The idea is pretty simple: we use a 32-bit bitset to represent geometry types 
and dimensions. This replaces usages of `HashSet<GeometryTypeAndDimensions>` in 
`GeoStatistics` and reduces the overhead of updating geo statistics in 
`AnalyzeAccumulator`. We care about the performance of the geo statistics analyzer 
because it is applied to all geometries on the indexed side when running a 
spatial join. `AnalyzeAccumulator` and `ST_Analyze_Aggr` can be useful in some 
other places as well, so we'd like them to have minimal performance overhead.
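
   As a rough illustration of the bitset idea, here is a minimal sketch. The type names (`GeomType`, `Dims`, `TypeDimSet`) and the bit layout (dimension index times 8 plus geometry type id) are assumptions made for this example; they are not taken from the actual `GeometryTypeAndDimensionsSet` in this patch.

```rust
// Hypothetical sketch: names and bit layout are assumptions, not the sedona-db code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GeomType {
    Geometry = 0,
    Point = 1,
    LineString = 2,
    Polygon = 3,
    MultiPoint = 4,
    MultiLineString = 5,
    MultiPolygon = 6,
    GeometryCollection = 7,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Dims {
    Xy = 0,
    Xyz = 1,
    Xym = 2,
    Xyzm = 3,
}

/// A set of (geometry type, dimensions) pairs packed into one u32.
/// 8 geometry types x 4 dimension kinds = 32 possible members.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct TypeDimSet(u32);

impl TypeDimSet {
    /// Mask with the single bit assigned to this (type, dims) pair.
    fn bit(t: GeomType, d: Dims) -> u32 {
        1u32 << ((d as u32) * 8 + (t as u32))
    }

    /// Add one member; inserting the same pair twice is a no-op.
    fn insert(&mut self, t: GeomType, d: Dims) {
        self.0 |= Self::bit(t, d);
    }

    /// Membership test is a single AND.
    fn contains(self, t: GeomType, d: Dims) -> bool {
        self.0 & Self::bit(t, d) != 0
    }

    /// Set union is a single OR, with no allocation.
    fn merge(&mut self, other: TypeDimSet) {
        self.0 |= other.0;
    }

    /// Number of distinct members in the set.
    fn len(self) -> u32 {
        self.0.count_ones()
    }
}
```

   With a layout like this, insert, membership test and union are each a single bitwise operation, and copying the set is copying one integer.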
   
   One of the reasons why `AnalyzeAccumulator` is slow is that `GeoStatistics` 
has immutable interfaces. We cannot update a `GeoStatistics` object in place; we 
can only obtain a new `GeoStatistics` object that incorporates the change. This 
introduces lots of clones when updating statistics for batches of geometry 
objects. The `HashSet<GeometryTypeAndDimensions>` inside `GeoStatistics` is the 
main cost of each clone, as it involves memory allocations and 
deallocations. Switching to `GeometryTypeAndDimensionsSet` makes the clone more 
lightweight and does not change the original immutable interfaces.
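
   To show why the immutable interface becomes cheap once the set is a plain integer, here is a small sketch. `StatsSketch`, `with_geometry` and `analyze` are hypothetical stand-ins invented for this example; the real `GeoStatistics` has more fields and a different API.

```rust
// Hypothetical stand-in for an immutable statistics object.
// Every field is a plain integer, so cloning never touches the heap.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct StatsSketch {
    count: u64,
    type_bits: u32, // bitset of observed (type, dimensions) members
}

impl StatsSketch {
    /// Immutable-style update: returns a new value and leaves `self` untouched.
    /// `bit_index` is the position assigned to a (type, dimensions) pair.
    fn with_geometry(&self, bit_index: u32) -> StatsSketch {
        debug_assert!(bit_index < 32);
        StatsSketch {
            count: self.count + 1,
            type_bits: self.type_bits | (1u32 << bit_index),
        }
    }
}

/// Fold a batch of geometries (represented here only by their bit index)
/// into a statistics value, creating a fresh value per element.
fn analyze(bit_indices: &[u32]) -> StatsSketch {
    bit_indices
        .iter()
        .fold(StatsSketch::default(), |stats, &i| stats.with_geometry(i))
}
```

   Each step still produces a new value, as the immutable interface requires, but producing it is a fixed-size copy rather than a `HashSet` clone with its allocation and deallocation.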
   
   Here are the benchmark results for `st_analyze_aggr` after applying this patch:
   
   ```
   Gnuplot not found, using plotters backend
   native-st_analyze_aggr-Array(Point)
                           time:   [4.1267 ms 4.2026 ms 4.3423 ms]
                           change: [-87.458% -87.216% -86.808%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   native-st_analyze_aggr-Array(LineString(10))
                           time:   [5.6607 ms 5.6728 ms 5.6868 ms]
                           change: [-83.578% -83.529% -83.482%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   ```
   

