Re: [I] feat: expand the Python interface [sedona-db]

via GitHub Thu, 02 Oct 2025 12:10:09 -0700


paleolimbot commented on issue #159:
URL: https://github.com/apache/sedona-db/issues/159#issuecomment-3347771843


   I would absolutely love a DataFrame-style API. For somebody sitting down to 
do exploratory analysis, SQL is fine, but for pipeline development and/or using 
SedonaDB as a dependency to do other stuff, we definitely need a way to handle 
Pythonic inputs that doesn't involve interpolating them into `f""` SQL strings.
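   
   To make the contrast concrete, here is a minimal sketch of the idea. It is not SedonaDB's API: `Frame`, `sql_literal`, and the `airports` table are hypothetical names made up for illustration, and identifiers are deliberately left unquoted to keep it short. It only shows why passing Python values through a composable object tends to be safer than pasting them into an f-string.

```python
from dataclasses import dataclass


def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted and escaped)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


@dataclass(frozen=True)
class Frame:
    """Hypothetical lazy frame: each method returns a new Frame, nothing runs."""

    table: str
    columns: tuple = ("*",)
    predicates: tuple = ()

    def select(self, *columns):
        return Frame(self.table, tuple(columns), self.predicates)

    def filter(self, column, op, value):
        if op not in {"=", "<>", "<", "<=", ">", ">="}:
            raise ValueError(f"unsupported operator: {op}")
        clause = f"{column} {op} {sql_literal(value)}"
        return Frame(self.table, self.columns, self.predicates + (clause,))

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql


name = "O'Hare"
# f-string SQL: the apostrophe in the value yields a malformed statement.
naive = f"SELECT name FROM airports WHERE name = '{name}'"
# Builder: the value stays a Python object and is escaped in one place.
built = Frame("airports").select("name").filter("name", "=", name).to_sql()
print(naive)
print(built)
```

   A real implementation would more likely build a logical plan than a SQL string, but the calling pattern (chained methods taking Python values) is the part that matters for pipeline code.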
   
   We have some high-level options:
   
   - For Spark compatibility (i.e., people running Sedona wanting to migrate to 
SedonaDB), we can either duck type pyspark or expose a Spark Connect server. I 
would prefer a Spark Connect server written in Python. I think Sail also went 
the Spark Connect route but may implement their server in Rust.
   - For general data frame usage (i.e., people coming from geopandas or R), our 
current approach has been to duck type Ibis (https://ibis-project.org). As data 
frame APIs go, it is definitely the friendliest/most recently invented and shares 
mostly compatible syntax with DuckDB's Python API.
   - We can also generate Spark Connect protobuf from DataFusion logical plans 
in Rust (there is an official library!).
   
   The only thing I actually don't want to do is duck type pyspark (I 
personally have no interest in bringing more pyspark code into 
existence/writing it) or depend on pyspark for our Python interface (we don't 
control it and they are unlikely to help us if we want to make it handle 
spatial data better).
   
   Just one person's take though. I'd love to hear what people's high-level 
goals are!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
