Kontinuation opened a new pull request, #523: URL: https://github.com/apache/sedona-db/pull/523
This PR addresses performance bottlenecks (stragglers) observed during the candidate refinement phase of SpatialBench Q10 and Q11, particularly at higher scale factors (SF=100 and SF=1000).

When a query with a large window runs against a dense dataset, a single R-Tree index query can retrieve millions of candidates. The probe partition then becomes a straggler because it must evaluate the spatial predicate sequentially for each of those millions of geometries. Since the bottleneck occurs within a single partition, DataFusion’s partition-level parallelism cannot distribute the load.

This patch introduces an async batch query interface for SpatialIndex. It allows the engine to split a massive refinement workload into smaller tasks, which an async runtime then executes in parallel. Querying in batches amortizes the scheduling cost of async function calls, and running the refinement tasks in parallel eliminates the single-partition bottleneck.
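To illustrate the idea of splitting one partition's refinement work into smaller parallel tasks, here is a minimal sketch. It is not the code from this PR and does not use the SpatialIndex API: `refine`, `refine_in_chunks`, and the point-in-window predicate are hypothetical names chosen for illustration, and `std::thread::scope` stands in for the async runtime so the example has no external dependencies.

```rust
use std::thread;

/// Hypothetical stand-in for an expensive spatial predicate (not sedona-db's API):
/// does the point fall inside the axis-aligned query window?
fn refine(candidate: (f64, f64), window: (f64, f64, f64, f64)) -> bool {
    let (x, y) = candidate;
    let (min_x, min_y, max_x, max_y) = window;
    x >= min_x && x <= max_x && y >= min_y && y <= max_y
}

/// Split the candidate list into fixed-size chunks and refine each chunk as
/// its own task. Returns the indices of matching candidates in ascending order.
/// Assumption: scoped OS threads are used here in place of an async runtime.
fn refine_in_chunks(
    candidates: &[(f64, f64)],
    window: (f64, f64, f64, f64),
    chunk_size: usize,
) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    thread::scope(|s| {
        // One task per chunk; each task evaluates the predicate sequentially
        // over its own slice of the candidates.
        let handles: Vec<_> = candidates
            .chunks(chunk_size)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                s.spawn(move || {
                    let base = chunk_idx * chunk_size;
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, c)| refine(**c, window))
                        .map(|(i, _)| base + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        // Join in spawn order so the result keeps the original candidate order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("refinement task panicked"))
            .collect()
    })
}
```

The chunk size controls the trade-off mentioned above: larger chunks spread the per-task scheduling cost over more predicate evaluations, while smaller chunks give the scheduler more tasks to balance. This sketch spawns one thread per chunk, which is only reasonable for a modest number of chunks; a runtime with a bounded worker pool would be the more realistic setting.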
