Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-04-05 Thread Ángel Álvarez Pascua
Hi Jia, I really appreciate your very instructive answer. I truly believe that discussing topics with people who know far more than I do is a great way to learn new and interesting things. Your explanations are quite logical and make perfect sense to me. Sh**, I'm not that sure about this proposal

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-31 Thread Wenchen Fan
Hi Jia, Thanks for your detailed explanation! The existing implementation of geospatial serialization, predicate pushdown, and other features in Apache Sedona is indeed valuable for this project. What we’re proposing isn’t something entirely new to the industry but rather a re-architecture: we bel

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Szehon Ho
Hi Jia, yeah, I think the SPIP was trying to be concise and focused, but I was definitely chatting yesterday with Menelaos about how to mention Sedona here while still keeping it concise and focused :) It makes sense to collect feedback before the vote. Thanks for the comments and collaboration here, and

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Jia Yu
Hey Angel, I am glad that you asked these questions. Please see my answers below. *1. Domain types evolve quickly.* - It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies quic

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Hi Reynold and team, I’m glad to see that the Spark community is recognizing the importance of geospatial support. The Sedona community has long been a strong advocate for Spark, and we’ve proudly supported large-scale geospatial workloads on Spark for nearly a decade. We’re absolutely open to col

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Ángel Álvarez Pascua
*1. Domain types evolve quickly.* It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies quickly. *2. Geospatial in Java and Python is a dependency hell.* How has Par

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Reynold Xin
While I don’t think Spark should become a super specialized geospatial processing engine, I don’t think it makes sense to focus *only* on reading and writing from storage. Geospatial is a pretty common and fundamental capability of analytics systems and virtually every mature and popular analytics

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Hi Wenchen, Menelaos and Szehon, Thanks for the clarification — I’m glad to hear the primary motivation of this SPIP is focused on reading and writing geospatial data with Parquet and Iceberg. That’s an important goal, and I want to highlight that this problem is being solved by the Apache Sedo

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
To continue along the line of thought of Szehon: I am really excited that the Parquet and Iceberg communities have adopted geospatial logical types and of course I am grateful for the work put in that direction. As both Wenchen and Szehon pointed out in their own way, the goal is to have minim

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Szehon Ho
Thank you Menelaos, will do! To give a little background: Jia and the Sedona community, as well as the GeoParquet community and others, put a lot of effort into defining the Parquet and Iceberg geo types, which couldn't have been done without their experience and help! But I do agree with Wenchen, now tha

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
Hello Jia, Wenchen summarized the intent very clearly. The scope of the proposal is primarily the type system and storage, not processing. Let’s work together on the technical details and make sure the work we propose to do in Spark works best with Apache Sedona. Best, Menelaos > On Mar 29,

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Wenchen Fan
Hi Jia, This is a good question. As the shepherd of this SPIP, I'd like to clarify the motivation here: the focus of this project is more about the storage part, not the processing. Apache Sedona is a great library for geo processing, but without native geo type support in Spark, users can't do th

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Szehon Ho
Thanks Menelaos, this is exciting! Is there a Google doc we can comment on, or just the JIRA? Thanks Szehon On Fri, Mar 28, 2025 at 1:41 PM Ángel Álvarez Pascua <angel.alvarez.pas...@gmail.com> wrote: > Sorry, I only had a quick look at the proposal, looked for WKT and didn't find anything.

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
+1 (non-binding) On Fri, Mar 28, 2025, 18:48, Menelaos Karavelas wrote: > Dear Spark community, > > I would like to propose the addition of new geospatial data types (GEOMETRY and GEOGRAPHY) which represent geospatial values as recently added as new logical types in the Parquet specificati

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Jia Yu
Dear Menelaos, Thanks for bringing this up again. I’ve seen similar proposals come up on the mailing list before, and I’d like to offer some thoughts. For full transparency, I’m Jia Yu, PMC Chair of Apache Sedona (https://github.com/apache/sedona), a widely used open-source cluster computing f

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
Sorry, I only had a quick look at the proposal, looked for WKT and didn't find anything. It's been years since I worked on geospatial projects and I'm not an expert (at all). Maybe start with something simple but useful, like WKT<=>WKB conversion? On Fri, Mar 28, 2025, 21:27, Menelaos Karavelas
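
As a rough illustration of the WKT<=>WKB conversion suggested in this message, here is a minimal Python sketch for 2D points only. It is not Spark, Sedona, or SPIP code: the function names `point_wkt_to_wkb` and `point_wkb_to_wkt` are hypothetical, and the only thing assumed from the standard is the WKB point layout (a 1-byte byte-order flag, a 4-byte geometry type equal to 1, then x and y as 8-byte doubles).

```python
import re
import struct

WKB_POINT = 1      # geometry type code for Point in WKB
WKB_POINT_LEN = 21  # 1 (byte order) + 4 (type) + 8 (x) + 8 (y)

_POINT_RE = re.compile(
    r"\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*", re.IGNORECASE
)


def point_wkt_to_wkb(wkt: str) -> bytes:
    """Encode a 2D 'POINT (x y)' WKT string as little-endian WKB."""
    match = _POINT_RE.fullmatch(wkt)
    if match is None:
        raise ValueError(f"not a 2D POINT WKT: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # "<" = little endian without padding; B = byte-order flag (1 means
    # little endian), I = uint32 geometry type, dd = x and y coordinates.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_wkb_to_wkt(wkb: bytes) -> str:
    """Decode a 2D point WKB value (either byte order) back to WKT."""
    if len(wkb) != WKB_POINT_LEN:
        raise ValueError("unexpected length for a 2D point WKB")
    if wkb[0] not in (0, 1):
        raise ValueError("invalid WKB byte-order flag")
    prefix = "<" if wkb[0] == 1 else ">"  # 0 means big endian
    geom_type, x, y = struct.unpack(prefix + "Idd", wkb[1:])
    if geom_type != WKB_POINT:
        raise ValueError(f"unsupported WKB geometry type: {geom_type}")
    return f"POINT ({x!r} {y!r})"
```

A real implementation would also need the other geometry types, Z/M dimensions, and SRID handling, which is where existing geospatial libraries come in; the sketch only shows that the point case is a fixed 21-byte layout.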

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Menelaos Karavelas
In the SPIP Jira the proposal is to add the expressions ST_AsBinary, ST_GeomFromWKB, and ST_GeogFromWKB. Is there anything else that you think should be added? Regarding WKT, what do you think should be added? - Menelaos > On Mar 28, 2025, at 1:02 PM, Ángel Álvarez Pascua wrote: > > What

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
What about adding support for WKT/WKB? On Fri, Mar 28, 2025 at 20:50, Ángel Álvarez Pascua (<angel.alvarez.pas...@gmail.com>)