Re: [PR] [GH-2331] Geopandas: Document differences of sindex compared to gpd + sindex fixes [sedona]

via GitHub Fri, 05 Sep 2025 11:54:19 -0700


zhangfengcdt commented on PR #2332:
URL: https://github.com/apache/sedona/pull/2332#issuecomment-3259423557


   > Sorry, I learned about this issue a little more recently and had it on my 
backlog of things to fix. I didn't realize how deep this went until I looked 
into it now. I think the question we need to decide here is: what should 
`GeoSeries.sindex` build by default? We have two options:
   > 
   > * build from a np.array: non-distributed but returns indices like the 
original behavior
   > * build from a GeoSeries: distributed / fast, returns geometries (breaking 
difference)
   > 
   > Currently, it does the latter, which is my preference. WDYT @zhangfengcdt?
   
   Given that we are building on Spark/Sedona, I think scalability and 
performance should be the top priority. We should stick with Option 2 (build 
from a GeoSeries). The performance gain of the distributed approach far 
outweighs the cost of a documented breaking change, especially when the new 
workflow is arguably simpler for the user.
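
   To make the "simpler workflow" point concrete, here is a minimal, 
library-free sketch of the two return conventions. It is only an illustration: 
`query_indices` and `query_geometries` are hypothetical names, not the Sedona 
or geopandas API, bounding-box tuples stand in for geometries, and a linear 
scan stands in for a real spatial index.

```python
from typing import List, Tuple

# Hypothetical stand-in for a geometry: (minx, miny, maxx, maxy).
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query_indices(boxes: List[BBox], window: BBox) -> List[int]:
    """Option 1 style: return integer positions of the matches."""
    return [i for i, box in enumerate(boxes) if intersects(box, window)]


def query_geometries(boxes: List[BBox], window: BBox) -> List[BBox]:
    """Option 2 style: return the matching geometries themselves."""
    return [box for box in boxes if intersects(box, window)]


boxes = [(0, 0, 1, 1), (2, 2, 3, 3), (0.5, 0.5, 2.5, 2.5)]
window = (0.9, 0.9, 1.1, 1.1)

# Index-returning style needs a second lookup step to get geometries.
hits = query_indices(boxes, window)
via_indices = [boxes[i] for i in hits]

# Geometry-returning style yields the same result in one step.
direct = query_geometries(boxes, window)
```

   In this toy model both paths produce the same geometries; the difference is 
only whether the caller has to do the extra positional lookup.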
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

