Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Hi Reynold and team, I’m glad to see that the Spark community is recognizing the importance of geospatial support. The Sedona community has long been a strong advocate for Spark, and we’ve proudly supported large-scale geospatial workloads on Spark for nearly a decade. We’re absolutely open to col

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Ángel Álvarez Pascua
*1. Domain types evolve quickly.* It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies quickly. *2. Geospatial in Java and Python is a dependency hell.* How has Par

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Reynold Xin
While I don’t think Spark should become a super specialized geospatial processing engine, I don’t think it makes sense to focus *only* on reading and writing from storage. Geospatial is a pretty common and fundamental capability of analytics systems and virtually every mature and popular analytics

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Hi Wenchen, Menelaos and Szehon, Thanks for the clarification — I’m glad to hear the primary motivation of this SPIP is focused on reading and writing geospatial data with Parquet and Iceberg. That’s an important goal, and I want to highlight that this problem is being solved by the Apache Sedo

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
To continue along the line of thought of Szehon: I am really excited that the Parquet and Iceberg communities have adopted geospatial logical types and of course I am grateful for the work put in that direction. As both Wenchen and Szehon pointed out in their own way, the goal is to have minim

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Szehon Ho
Thank you Menelaos, will do! To give a little background, Jia and the Sedona community, the GeoParquet community, and others put a great deal of effort into defining the Parquet and Iceberg geo types, which couldn't have been done without their experience and help! But I do agree with Wenchen, now tha

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
Hello Jia, Wenchen summarized the intent very clearly. The scope of the proposal is primarily the type system and storage, not processing. Let’s work together on the technical details and make sure the work we propose to do in Spark works best with Apache Sedona. Best, Menelaos

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Wenchen Fan
Hi Jia, This is a good question. As the shepherd of this SPIP, I'd like to clarify the motivation here: the focus of this project is more about the storage part, not the processing. Apache Sedona is a great library for geo processing, but without native geo type support in Spark, users can't do th