paleolimbot opened a new issue, #201:
URL: https://github.com/apache/sedona-db/issues/201
The `WkbHeader` will be a great optimization for a number of things we'd
like to do...another option worth exploring might be
Another technique potentially worth exploring to optimize these very cheap
functions is decomposing the parsing into a series of branchless loops. The
compiler has more options to optimize a loop that contains absolutely zero `if`
`match` or `||/&&`s (e.g. SIMD autovectorization). I've prototyped this in C++
before and found order-of-magnitude speed improvements for Binary arrays
(probably less helpful for BinaryView arrays, where there is always an if
statement to access the data).
```rust
let mut n_valid_size = 0;
for item in not_null_array {
n_valid_size += item.len() >= 5;
}
if n_valid_size != not_null_array.len() {
return do_the_slow_version();
}
let mut n_little_endian = 0;
for item in not_null_array {
n_little_endian += item[0] == 0x01;
}
if n_little_endian != not_null_array.len() {
return do_the_slow_version();
}
let mut geometry_types: u32 = 0;
let mut geometry_type_bytes: [u8; 4];
for item in not_null_array {
geometry_type_bytes.copy_from_slice(&item[1..5]);
let geometry_type_id = u32::from_be_bytes(&geometry_type_bytes);
geometry_types |= 1_u32 << geometry_type_id & 0x07;
}
// Potentially do something faster if all we have are points
```
There comes a point where the multiple branchless loops become slower than a
single loop with branching...I haven't experimented with this enough to know
where that point is.
For functions that only operate on points, there is also a cool optimization
you can do for Binary (not BinaryView) arrays: because the data for arrays with
zero nulls are all lined up consecutively in memory (i.e., one WKB item after
another all in the same buffer), you can loop through X values like this (once
you've validated the input using the above):
```rust
let mut offset = first_x_offset;
let mut x_bytes: [u8; 8];
for i in 0..array.len() {
x_bytes.copy_from_slice(data_buffer[offset..(offset + 8)]);
let x = f64::from_le_bytes(x_bytes);
offset += 21;
}
```
My initial explorations of this are here:
https://github.com/geoarrow/geoarrow-c/blob/28eca0fea6f47c70113dc1719e7597e53bede461/dev/benchmarks/c/wkb_bounding_benchmark.cc#L408-L450
. Specifically for the bounding operation, the initial numbers suggested that
this was able to trigger simd for at least the bounding operation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]