2010YOUY01 opened a new pull request, #90:
URL: https://github.com/apache/sedona-db/pull/90
Hi 👋🏼, I'm new to the project and still learning my way around. `sedona-db`
looks great, and I'd really appreciate any feedback.
## Rationale
Previously, the execution logic for the `st_geometrytype()` function was, for
each row, to first parse the `WKB` binary into a `WKB` object and then extract
the base type from that object. This approach parses fields of the `WKB`
binary that are never used, since only the geometry type is needed.
This PR lets it iterate over the raw `WKB` bytes and parse the geometry type
directly from them.
## Implementation
1. Extend `GenericExecutor` with a new API, `execute_wkb_bytes_void()`, to
iterate over raw `WKB` bytes.
2. Implement a utility to parse the geometry type from the `WKB` binary
according to the spec.
3. Update `st_geometrytype()` to use 1 and 2.

I think it would be better to move `2` to the `wkb` crate; it doesn't have
such a public interface yet 🤔
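
For readers unfamiliar with the `WKB` layout, here is a rough sketch of the idea behind step 2. It is not the code in this PR: the function name `wkb_geometry_type_name` is hypothetical, and the returned strings are placeholders rather than the exact output of `st_geometrytype()`. It assumes the standard header (one byte-order byte followed by a 4-byte type code), ISO-style Z/M offsets (+1000, +2000, +3000), and EWKB-style flag bits in the top three bits of the type code.

```rust
/// Hypothetical helper (not part of this PR): reads only the 5-byte WKB
/// header and returns the base geometry type name, ignoring everything after.
fn wkb_geometry_type_name(wkb: &[u8]) -> Option<&'static str> {
    // Need at least 1 byte-order byte + 4 type-code bytes.
    let header = wkb.get(..5)?;
    let raw = [header[1], header[2], header[3], header[4]];

    // Byte 0 is the byte-order flag: 0 = big-endian, 1 = little-endian.
    let code = match header[0] {
        0 => u32::from_be_bytes(raw),
        1 => u32::from_le_bytes(raw),
        _ => return None,
    };

    // Clear the EWKB flag bits (Z, M, SRID), then strip the ISO dimension
    // offsets (+1000 for Z, +2000 for M, +3000 for ZM).
    let base = (code & 0x1FFF_FFFF) % 1000;

    match base {
        1 => Some("Point"),
        2 => Some("LineString"),
        3 => Some("Polygon"),
        4 => Some("MultiPoint"),
        5 => Some("MultiLineString"),
        6 => Some("MultiPolygon"),
        7 => Some("GeometryCollection"),
        _ => None, // unknown or unsupported type code
    }
}
```

Because only the first five bytes are read, the cost per row does not depend on how many coordinates or nested geometries the value contains, which would explain why the gain is largest for the complex collections.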
## Benchmark
### Command
```
pytest --benchmark-group-by=param:table \
  --benchmark-columns=median,mean,stddev \
  test_functions.py::TestBenchFunctions::test_st_geometrytype
```
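### Result
About 5x faster for complex collections and 30% faster for simple collections.
The first block below is the baseline (times in ms); the second is with this
PR (times in us):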
```sh
------------------------------- benchmark 'table=collections_complex': 3 tests --------------------------------
Name (time in ms)                                         Median                  Mean              StdDev
-----------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_complex-SedonaDB]       2.3656 (1.0)          2.4929 (1.0)        0.3857 (1.0)
test_st_geometrytype[collections_complex-DuckDB]        34.2037 (14.46)       34.3980 (13.80)      0.8402 (2.18)
test_st_geometrytype[collections_complex-PostGIS]      304.6275 (128.77)     306.7333 (123.04)     5.8908 (15.27)
-----------------------------------------------------------------------------------------------------------------

-------------------------------- benchmark 'table=collections_simple': 3 tests --------------------------------
Name (time in ms)                                         Median                  Mean              StdDev
-----------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_simple-SedonaDB]        1.3585 (1.0)          1.7419 (1.0)        1.2142 (9.41)
test_st_geometrytype[collections_simple-DuckDB]          5.1103 (3.76)         5.1443 (2.95)       0.1291 (1.0)
test_st_geometrytype[collections_simple-PostGIS]        46.8870 (34.51)       46.9021 (26.93)      0.3712 (2.88)
-----------------------------------------------------------------------------------------------------------------
```
```sh
------------------------------------- benchmark 'table=collections_complex': 3 tests -------------------------------------
Name (time in us)                                             Median                      Mean                  StdDev
----------------------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_complex-SedonaDB]         419.2500 (1.0)            450.9272 (1.0)          124.1193 (1.0)
test_st_geometrytype[collections_complex-DuckDB]        32,422.7921 (77.34)       32,917.7395 (73.00)      2,088.4215 (16.83)
test_st_geometrytype[collections_complex-PostGIS]      295,752.0001 (705.43)     294,866.8750 (653.91)     3,872.8562 (31.20)
----------------------------------------------------------------------------------------------------------------------------

-------------------------------------- benchmark 'table=collections_simple': 3 tests -------------------------------------
Name (time in us)                                             Median                      Mean                  StdDev
----------------------------------------------------------------------------------------------------------------------------
test_st_geometrytype[collections_simple-SedonaDB]          613.2090 (1.0)          1,144.3652 (1.0)        1,073.4389 (3.42)
test_st_geometrytype[collections_simple-DuckDB]          5,502.5411 (8.97)         5,556.3829 (4.86)         314.2311 (1.0)
test_st_geometrytype[collections_simple-PostGIS]        36,191.1250 (59.02)       36,322.7638 (31.74)        730.0613 (2.32)
----------------------------------------------------------------------------------------------------------------------------
```