paleolimbot commented on issue #159: URL: https://github.com/apache/sedona-db/issues/159#issuecomment-3347771843
I would absolutely love a DataFrame-style API...for somebody sitting down to do exploratory analysis SQL is fine, but for pipeline development and/or using SedonaDB as a dependency to do other stuff we definitely need a way to handle Pythonic inputs that aren't `f""` stringed SQL.

We have some high-level options:

- For Spark compatibility (i.e., people running Sedona wanting to migrate to SedonaDB), we can either duck type pyspark or expose a Spark Connect server. I would prefer a Spark Connect server written in Python...I think Sail also went the Spark Connect route but maybe implements their server in Rust.
- For general data frame usage (i.e., people coming from geopandas or R), our current approach has been to duck type https://ibis-project.org (as data frame APIs go it is definitely the friendliest/most recently invented and shares mostly compatible syntax with DuckDB's Python API).
- We can also generate Spark Connect protobuf from DataFusion logical plans in Rust (there is an official library!).

The only thing I actually don't want to do is duck type pyspark (I personally have no interest in bringing more pyspark code into existence/writing it) or depend on pyspark for our Python interface (we don't control it and they are unlikely to help us if we want to make it handle spatial data better).

Just one person's take though...I'd love to hear what people's high level goals are!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
