paleolimbot commented on code in PR #446:
URL: https://github.com/apache/sedona-db/pull/446#discussion_r2615770610
##########
rust/sedona-schema/src/crs.rs:
##########
@@ -15,41 +15,61 @@
// specific language governing permissions and limitations
// under the License.
use datafusion_common::{DataFusionError, Result};
+use lru::LruCache;
use std::fmt::{Debug, Display};
+use std::num::NonZeroUsize;
use std::str::FromStr;
use std::sync::Arc;
+use std::sync::Mutex;
+use std::sync::OnceLock;
use serde_json::Value;
+/// LRU cache for CRS deserialization
+static CRS_CACHE: OnceLock<Mutex<LruCache<String, Crs>>> = OnceLock::new();
+
+fn get_crs_cache() -> &'static Mutex<LruCache<String, Crs>> {
+ CRS_CACHE.get_or_init(|| {
+ // Cache up to 256 CRS strings
+ Mutex::new(LruCache::new(NonZeroUsize::new(256).unwrap()))
+ })
+}
Review Comment:
Can/should this be thread local to avoid contention when CRS computations
are happening on many partitions at once?
##########
rust/sedona-schema/src/datatypes.rs:
##########
@@ -312,7 +312,13 @@ fn deserialize_edges_and_crs(value: &Option<String>) ->
Result<(Edges, Crs)> {
};
let crs = match json_value.get("crs") {
- Some(crs_value) => deserialize_crs(crs_value)?,
+ Some(crs_value) => {
+ if let Some(s) = crs_value.as_str() {
+ deserialize_crs(s)?
+ } else {
+ deserialize_crs(&crs_value.to_string())?
Review Comment:
The original reason for keeping the `Value` here was to avoid the
re-serialization. We don't benchmark query planning but this code path gets
called every time we take a `FieldRef` from DataFusion and convert it to a
`SedonaType` (at least once per batch when calling our scalar functions).
Maybe:
```rust
match &crs_value {
Value::Object(obj) => // convert using the old path,
Value::String(s) => deserialize_crs(s)?,
Value::Null => None
}
```
...would improve the current state of this PR (which is FieldRef metadata ->
serde_json Value -> string -> serde_json Value for PROJJSON crses).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]