All my concerns are addressed, and I'm ready to vote.

On Mon, Sep 30, 2024 at 1:21 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
> Hi all,
>
> There have been several rounds of discussion on the PR:
> https://github.com/apache/iceberg/pull/10981 and I think most of the main
> points have been addressed.
>
> If anyone is interested, please take a look. If there are no other major
> points, we plan to start a VOTE thread soon.
>
> I know Jia and team are also volunteering to work on the prototype
> immediately afterwards.
>
> Thank you,
> Szehon
>
> On Tue, Aug 20, 2024 at 1:57 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi all
>>
>> Please take a look at the proposed spec change to support Geo type for V3
>> in https://github.com/apache/iceberg/pull/10981, and comment or
>> otherwise let me know your thoughts.
>>
>> Just as an FYI, it incorporates the feedback from our last meeting (with
>> Snowflake and Wherobots engineers).
>>
>> Thanks,
>> Szehon
>>
>> On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> It was great to meet in person with Snowflake engineers and we had a
>>> good discussion on the paths forward.
>>>
>>> Meeting notes for Snowflake-Iceberg sync.
>>>
>>> - Iceberg proposed Geometry type defaults to (edges=planar, crs=CRS84).
>>> - Snowflake has two types: Geography (spherical) and Geometry (planar,
>>>   with customizable CRS). The data layout/encoding is the same for both
>>>   types. Let's see how we can support each in Iceberg type, especially
>>>   wrt Iceberg partition/file pruning.
>>> - Geography type support
>>>   - Main concern is the need for a suitable partition transform for
>>>     partition-level filter; the candidate is Michael Entin's proposal
>>>     <https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit>.
>>>   - Secondary concern is file and RG-level filtering.
>>>     Gang's Parquet proposal
>>>     <https://github.com/apache/parquet-format/pull/240/files> allows
>>>     storage of S2 / H3 IDs in Parquet stats, so we can also leverage
>>>     that in Iceberg pruning code (Google and Uber libraries are
>>>     compatible).
>>> - Geometry type support
>>>   - Main concern is that the partition transform needs to understand CRS,
>>>     but this can be solved by having the XZ2 transform created with a
>>>     customizable min/max lat/long range (it's all it needs).
>>> - Should (CRS, edges) be stored properties on Geography type in Phase 1?
>>>   - Should be fine to store, with only allowing defaults in Phase 1.
>>>   - Concern 1: If edges is stored, there will be an ask to store other
>>>     properties like (orientation, epoch). Solution is to punt these
>>>     follow-on properties for later.
>>>   - Concern 2: If crs is stored, what format? PROJJSON vs SRID.
>>>     Solution is to leave it as a string.
>>>   - Concern 3: If crs is stored as a string, Iceberg cannot read it.
>>>     This should be ok, as we only need this for the XZ2 transform, where
>>>     the user already passes in the info from the CRS (up to the user to
>>>     make sure these align).
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>
>>>> Jia and I will sync with the Snowflake folks to see if we can have a
>>>> solution, or roadmap to solution, in the proposal.
>>>>
>>>> Thanks JB for the interest! By the way, I want to schedule a meeting
>>>> to go over the proposal. It seems there's good feedback from folks from
>>>> the geo side (and even the Parquet community), but not too many
>>>> eyes/feedback from other folks/PMC in the Iceberg community. This might
>>>> be due to lack of familiarity/time to read through it all. In fact, a
>>>> lot of the advanced discussions like this one are for Phase 2 items, and
>>>> Phase 1 items are relatively straightforward, so I wanted to explain that.
>>>> As I know it's summer vacation for some folks, we can do this in a week
>>>> or early July; hope that sounds good with everyone.
>>>>
>>>> Thanks,
>>>> Szehon
>>>>
>>>> On Tue, Jun 18, 2024 at 1:54 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>
>>>>> Hi Jia
>>>>>
>>>>> Thanks for the update. I'm gonna re-read the whole thread and document
>>>>> to have a better understanding.
>>>>>
>>>>> Thanks!
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Mon, Jun 17, 2024 at 7:44 PM Jia Yu <ji...@apache.org> wrote:
>>>>>
>>>>>> Hi Snowflake folks,
>>>>>>
>>>>>> Please let me know if you have other questions regarding the
>>>>>> proposal. If any, Szehon and I can set up a zoom call with you guys to
>>>>>> clarify some details. We are in the Pacific time zone. If you are in
>>>>>> Europe, maybe early morning Pacific Time works best for you?
>>>>>>
>>>>>> Thanks,
>>>>>> Jia
>>>>>>
>>>>>> On Wed, Jun 5, 2024 at 6:28 PM Gang Wu <ust...@gmail.com> wrote:
>>>>>>
>>>>>>> > The min/max stats are discussed in the doc (Phase 2), depending on
>>>>>>> > the non-trivial encoding.
>>>>>>>
>>>>>>> Just want to add that min/max stats filtering could be supported by
>>>>>>> the file format natively. Adding a geometry type to the Parquet spec
>>>>>>> is under discussion:
>>>>>>> https://github.com/apache/parquet-format/pull/240
>>>>>>>
>>>>>>> Best,
>>>>>>> Gang
>>>>>>>
>>>>>>> On Thu, Jun 6, 2024 at 5:53 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Peter
>>>>>>>>
>>>>>>>> Yes, the document only concerns the predicate pushdown of the
>>>>>>>> geometric column. Predicate pushdown takes two forms: 1) partition
>>>>>>>> filter and 2) min/max stats. The min/max stats are discussed in the
>>>>>>>> doc (Phase 2), depending on the non-trivial encoding.
>>>>>>>>
>>>>>>>> The evaluators are always AND'ed together, so I don't see any issue
>>>>>>>> of partitioning with another key not working on a table with a geo
>>>>>>>> column.
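For readers following the pruning argument, here is a minimal sketch of how AND'ed evaluators let a non-geo partition key and geo min/max stats work together. It is not Iceberg code: `DataFile`, `partition_date`, `geo_bounds`, and `may_contain` are hypothetical names made up for illustration, and the stats are assumed to be a plain x/y bounding box per file.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class DataFile:
    path: str
    partition_date: date        # hypothetical non-geo partition key
    geo_bounds: Optional[BBox]  # hypothetical x/y min/max stats; None if absent


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def may_contain(f: DataFile, day: date, query: BBox) -> bool:
    """AND the partition evaluator with the stats evaluator."""
    partition_ok = f.partition_date == day
    # Missing stats cannot rule a file out, so the file must be kept.
    stats_ok = f.geo_bounds is None or bbox_intersects(f.geo_bounds, query)
    return partition_ok and stats_ok


files = [
    DataFile("a.parquet", date(2024, 6, 1), (0.0, 0.0, 10.0, 10.0)),
    DataFile("b.parquet", date(2024, 6, 1), (50.0, 50.0, 60.0, 60.0)),
    DataFile("c.parquet", date(2024, 6, 1), None),
    DataFile("d.parquet", date(2024, 6, 2), (0.0, 0.0, 10.0, 10.0)),
]
kept = [f.path for f in files
        if may_contain(f, date(2024, 6, 1), (5.0, 5.0, 20.0, 20.0))]
```

In this toy setup the date key prunes `d.parquet`, the bounding box prunes `b.parquet`, and `c.parquet` survives because it has no stats to test against. Neither evaluator needs to know about the other.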
>>>>>>>>
>>>>>>>> On another note, Jia and I thought that we may have a discussion
>>>>>>>> about Snowflake geo types in a call to drill down on some details?
>>>>>>>> What time zone are you folks in / what time works better? I think
>>>>>>>> Jia and I are both in the Pacific time zone.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Szehon
>>>>>>>>
>>>>>>>> On Wed, Jun 5, 2024 at 1:02 AM Peter Popov <peter.po...@snowflake.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Szehon, hi Jia,
>>>>>>>>>
>>>>>>>>> Thank you for your replies. We now better understand the
>>>>>>>>> connection between the metadata and partitioning in this proposal.
>>>>>>>>> Supporting Mapping 1 is a great starting point, and we would like
>>>>>>>>> to work closer with you on bringing the support for spherical edges
>>>>>>>>> and other coordinate systems into Iceberg geometry.
>>>>>>>>>
>>>>>>>>> We have some follow-up questions regarding the partitioning (let
>>>>>>>>> us know if it’s better to comment directly in the document): Does
>>>>>>>>> this proposal imply that XZ2 partitioning is always required? In
>>>>>>>>> the current proposal, do you see a possibility of predicate
>>>>>>>>> pushdown relying on x/y min/max column metadata instead of a
>>>>>>>>> partition key? We see use cases where a table with a geo column can
>>>>>>>>> be partitioned by a different key (e.g. date) or combination of
>>>>>>>>> keys. It would be great to support such use cases from the very
>>>>>>>>> beginning.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Thu, May 30, 2024 at 8:07 AM Jia Yu <ji...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dmytro,
>>>>>>>>>>
>>>>>>>>>> Thanks for your email. To add to Szehon's answer,
>>>>>>>>>>
>>>>>>>>>> 1.
>>>>>>>>>> How to represent Snowflake Geometry and Geography type in
>>>>>>>>>> Iceberg, given the Geo Iceberg Phase 1 design:
>>>>>>>>>>
>>>>>>>>>> Answer:
>>>>>>>>>> Mapping 1 (possible): Snowflake Geometry + SRID:4326 -> Iceberg
>>>>>>>>>> Geometry + CRS84 + edges: Planar
>>>>>>>>>> Mapping 2 (impossible): Snowflake Geography -> Iceberg Geometry +
>>>>>>>>>> CRS84 + edges: Spherical
>>>>>>>>>> Mapping 3 (impossible): Snowflake Geometry + SRID:ABCDE -> Iceberg
>>>>>>>>>> Geometry + SRID:ABCDE + edges: Planar
>>>>>>>>>>
>>>>>>>>>> As Szehon mentioned, only Mapping 1 is possible because we need
>>>>>>>>>> to support spatial query push down in Iceberg. This function relies
>>>>>>>>>> on the Iceberg partition transform, which requires a 1:1 mapping
>>>>>>>>>> between a value (point/polygon/linestring) and a partition key.
>>>>>>>>>> That is: given any precision level, a polygon must produce a single
>>>>>>>>>> ID, and the covering indicated by this single ID must fully cover
>>>>>>>>>> the extent of the polygon. Currently, only XZ2 can satisfy this
>>>>>>>>>> requirement. If the theory from Michael Entin can be proven to be
>>>>>>>>>> correct, then we can support Mapping 2 in Phase 2 of Geo Iceberg.
>>>>>>>>>>
>>>>>>>>>> Regarding Mapping 3, this requires Iceberg to be able to
>>>>>>>>>> understand SRID / PROJJSON such that we will know the min/max X/Y
>>>>>>>>>> of the CRS (@Szehon, maybe Iceberg can ask the engine to provide
>>>>>>>>>> this information?). See my answer 2.
>>>>>>>>>>
>>>>>>>>>> 2. Why choose PROJJSON instead of SRID?
>>>>>>>>>>
>>>>>>>>>> The PROJJSON idea was borrowed from GeoParquet because we'd like
>>>>>>>>>> to enable possible conversion between Geo Iceberg and GeoParquet.
>>>>>>>>>> However, I do understand that this is not a good idea for Iceberg
>>>>>>>>>> since not many libs can parse PROJJSON.
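The 1:1 mapping requirement in answer 1 above (one key per geometry, whose cell fully covers the geometry) can be illustrated with a simplified sketch of the enlarged-cell idea that XZ-style orderings rely on. This is not the XZ2 transform from the proposal: it stops at a `(level, cell_x, cell_y)` key instead of computing a single sequence code, and `enlarged_cell_key`, `enlarged_cell_bounds`, `WORLD`, and `max_level` are hypothetical names. It also assumes the geometry's bounding box lies inside a known coordinate range, which is exactly the min/max X/Y information discussed for Mapping 3.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Key = Tuple[int, int, int]                # (level, cell_x, cell_y)


def enlarged_cell_key(geom: BBox, bounds: BBox, max_level: int = 12) -> Key:
    """Return the deepest quadtree cell whose doubled extent covers geom.

    Assumes geom lies within bounds (the coordinate range of the CRS).
    """
    bx0, by0, bx1, by1 = bounds
    # Normalize the geometry's bounding box into the unit square.
    x0 = (geom[0] - bx0) / (bx1 - bx0)
    y0 = (geom[1] - by0) / (by1 - by0)
    x1 = (geom[2] - bx0) / (bx1 - bx0)
    y1 = (geom[3] - by0) / (by1 - by0)
    for level in range(max_level, -1, -1):
        n = 2 ** level  # cells per axis at this level
        # Cell containing the lower-left corner, clamped to the grid.
        cx = min(int(x0 * n), n - 1)
        cy = min(int(y0 * n), n - 1)
        # The enlarged cell spans two cell widths from its lower-left corner.
        if x1 <= (cx + 2) / n and y1 <= (cy + 2) / n:
            return (level, cx, cy)
    raise ValueError("geometry is outside the coordinate range")


def enlarged_cell_bounds(key: Key, bounds: BBox) -> BBox:
    """Extent, in CRS coordinates, covered by the enlarged cell of a key."""
    level, cx, cy = key
    n = 2 ** level
    bx0, by0, bx1, by1 = bounds
    w, h = (bx1 - bx0) / n, (by1 - by0) / n
    return (bx0 + cx * w, by0 + cy * h, bx0 + (cx + 2) * w, by0 + (cy + 2) * h)


WORLD: BBox = (-180.0, -90.0, 180.0, 90.0)
point_key = enlarged_cell_key((0.0, 0.0, 0.0, 0.0), WORLD)
boundary_key = enlarged_cell_key((-1.0, -1.0, 1.0, 1.0), WORLD)
boundary_cover = enlarged_cell_bounds(boundary_key, WORLD)
```

The small box around (0, 0) straddles the center split of the range. A plain quadtree would have to push it up to the root cell, while the enlarged cell lets it keep a deep level and still be fully covered by one key, which is the "boundary object" behavior mentioned elsewhere in this thread.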
>>>>>>>>>>
>>>>>>>>>> @Szehon Is there a way that we can support both SRID and PROJJSON
>>>>>>>>>> in Geo Iceberg?
>>>>>>>>>>
>>>>>>>>>> It is also worth noting that, although there are many libs that
>>>>>>>>>> can parse SRID and perform look-up in the EPSG database, the
>>>>>>>>>> license of the EPSG database is NOT compatible with the Apache
>>>>>>>>>> Software Foundation. That means: Iceberg still cannot parse /
>>>>>>>>>> understand SRID.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jia
>>>>>>>>>>
>>>>>>>>>> On Wed, May 29, 2024 at 11:08 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Dmytro
>>>>>>>>>>>
>>>>>>>>>>> Thank you for looking through the proposal and excited to hear
>>>>>>>>>>> from you guys! I am not a 'geo expert' and I will definitely need
>>>>>>>>>>> to pull in Jia Yu for some of these points.
>>>>>>>>>>>
>>>>>>>>>>> Although most calculations are done on the query engine, Iceberg
>>>>>>>>>>> reference implementations (i.e., Java, Python) do have to support
>>>>>>>>>>> a few calculations to handle filter push down:
>>>>>>>>>>>
>>>>>>>>>>> 1. push down of the proposed Geospatial transforms
>>>>>>>>>>>    ST_COVERS, ST_COVERED_BY, and ST_INTERSECTS
>>>>>>>>>>> 2. evaluation of the proposed Geospatial partition transform
>>>>>>>>>>>    XZ2. As you may have seen, this was chosen as it's the only
>>>>>>>>>>>    standard one today that solves the 'boundary object' problem,
>>>>>>>>>>>    still preserving 1-to-1 mapping of row => partition value.
>>>>>>>>>>>
>>>>>>>>>>> This is the primary rationale for choosing the values, as these
>>>>>>>>>>> were implemented in the GeoLake and Havasu projects (Iceberg forks
>>>>>>>>>>> that sparked the proposal) based on Geometry type (edge=planar,
>>>>>>>>>>> crs=OGC:CRS84/SRID=4326).
>>>>>>>>>>>
>>>>>>>>>>>> 2.
>>>>>>>>>>>> As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. From
>>>>>>>>>>>> our experience, most of the use cases do not require the full
>>>>>>>>>>>> definition of the SRS; in fact, that definition is only needed
>>>>>>>>>>>> when converting between coordinate systems. On the other hand,
>>>>>>>>>>>> it’s often needed to check whether two geometry columns have the
>>>>>>>>>>>> same coordinate system, for example when joining two columns
>>>>>>>>>>>> from different data providers.
>>>>>>>>>>>>
>>>>>>>>>>>> To address this we would like to propose including the option
>>>>>>>>>>>> to specify the SRS with only a SRID in Phase 1. The query engine
>>>>>>>>>>>> may choose to treat it as an opaque identifier or make a look-up
>>>>>>>>>>>> in the EPSG database, if supported.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The way to specify the CRS definition is actually taken from
>>>>>>>>>>> GeoParquet [1]; I think we are not bound to follow it if there are
>>>>>>>>>>> better options. I feel we might need to at least list out
>>>>>>>>>>> supported configurations in the spec, though. There is some
>>>>>>>>>>> conversation on the doc here about this [2]. Basically:
>>>>>>>>>>>
>>>>>>>>>>> 1. XZ2 assumes planar edges. This is a feature of the
>>>>>>>>>>>    algorithm, based on the original paper. A possible solution to
>>>>>>>>>>>    spherical edges is proposed by Michael Entin here: [3], please
>>>>>>>>>>>    feel free to evaluate.
>>>>>>>>>>> 2. XZ2 needs to know the coordinate range. According to
>>>>>>>>>>>    Jia's comments, this needs parsing of the CRS. Can it be done
>>>>>>>>>>>    with SRID alone?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> 1.
>>>>>>>>>>>> The first version of the specification (Phase 1) is described
>>>>>>>>>>>> as focused on the planar geometry model, with the CRS fixed on
>>>>>>>>>>>> 4326. In this model, Snowflake would not be able to map our
>>>>>>>>>>>> Geography type since it is based on the spherical Geography
>>>>>>>>>>>> model. Given that Snowflake supports both edge types, we would
>>>>>>>>>>>> like to better understand how to map them to the proposed
>>>>>>>>>>>> Geometry type and its metadata.
>>>>>>>>>>>>
>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>      query engine? Is it necessary for the system to adhere to
>>>>>>>>>>>>      the edge model for geospatial functions, or can it use the
>>>>>>>>>>>>      model that it supports or let the customer choose it? Will
>>>>>>>>>>>>      it affect the bounding box or other row group metadata?
>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>      postponed to further iterations? Would it be more extensible
>>>>>>>>>>>>      to support a mutable edge type from Phase 1, but allow
>>>>>>>>>>>>      systems to ignore it if they do not support the spherical
>>>>>>>>>>>>      computation model?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> It may be answered by the previous paragraph in regards to XZ2.
>>>>>>>>>>>
>>>>>>>>>>> 1. If we get XZ2 to work with a more variable CRS without
>>>>>>>>>>>    requiring the full PROJJSON specification, it seems it is a
>>>>>>>>>>>    path to support the Snowflake Geometry type?
>>>>>>>>>>> 2. If we get another one-to-one partition function on
>>>>>>>>>>>    spherical edges, like the one proposed by Michael, it seems a
>>>>>>>>>>>    path to support the Snowflake Geography type?
>>>>>>>>>>>
>>>>>>>>>>> Does that sound correct?
>>>>>>>>>>> As for why certain things are marked as Phase 1, they are just
>>>>>>>>>>> chosen so we can all agree on an initial design and iterate
>>>>>>>>>>> faster, and are not set in stone; maybe path 1 is possible to do
>>>>>>>>>>> quickly, for example.
>>>>>>>>>>>
>>>>>>>>>>> Also, I am not sure about handling evaluation of ST_COVERS,
>>>>>>>>>>> ST_COVERED_BY, and ST_INTERSECTS (how easy it is to handle
>>>>>>>>>>> different CRS + spherical edges). I will leave it to Jia.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Szehon
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata
>>>>>>>>>>> [2]:
>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit?disco=AAABL-z6xXk
>>>>>>>>>>> [3]:
>>>>>>>>>>> https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 29, 2024 at 8:30 AM Dmytro Koval <dmytro.ko...@snowflake.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Szehon and Iceberg Community,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is Dmytro, Peter, Aihua, and Tyler from Snowflake. As part
>>>>>>>>>>>> of our desire to be more active in the Iceberg community, we’ve
>>>>>>>>>>>> been looking over this geospatial proposal. We’re excited
>>>>>>>>>>>> geospatial is getting traction, as we see a lot of geo usage
>>>>>>>>>>>> within Snowflake, and expect that usage to carry over to our
>>>>>>>>>>>> Iceberg offerings soon. After reviewing the proposal, we have
>>>>>>>>>>>> some questions we’d like to pose given our experience with
>>>>>>>>>>>> geospatial support in Snowflake.
>>>>>>>>>>>>
>>>>>>>>>>>> We would like to clarify two aspects of the proposal: handling
>>>>>>>>>>>> of the spherical model and definition of the spatial reference
>>>>>>>>>>>> system. Both have a big impact on the interoperability with
>>>>>>>>>>>> Snowflake and other query engines and Geo processing systems.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Let us first share some context about geospatial types at
>>>>>>>>>>>> Snowflake; geo experts will certainly be familiar with this
>>>>>>>>>>>> context already, but for the sake of others we want to err on
>>>>>>>>>>>> the side of being explicit and clear. Snowflake supports two
>>>>>>>>>>>> Geospatial types [1]:
>>>>>>>>>>>> - Geography – uses a spherical approximation of the earth for
>>>>>>>>>>>>   all the computations. It does not perfectly represent the
>>>>>>>>>>>>   earth, but allows getting accurate results on WGS84
>>>>>>>>>>>>   coordinates, used by GPS, without any need to perform
>>>>>>>>>>>>   coordinate system reprojections. It is also quite fast for
>>>>>>>>>>>>   end-to-end computations. In general, it has fewer distortions
>>>>>>>>>>>>   compared to the 2D planar model.
>>>>>>>>>>>> - Geometry – uses a planar Euclidean geometry model. Geometric
>>>>>>>>>>>>   computations are simpler, but require transforming the data
>>>>>>>>>>>>   between coordinate systems to minimize the distortion. The
>>>>>>>>>>>>   Geometry data type allows setting a spatial reference system
>>>>>>>>>>>>   for each row using the SRID. The binary geospatial functions
>>>>>>>>>>>>   are only allowed on geometries with the same SRID. The only
>>>>>>>>>>>>   function that interprets the SRID is ST_TRANSFORM, which allows
>>>>>>>>>>>>   conversion between different SRSs.
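As a small illustration of the "same SRID" rule for binary functions described above, and of treating the SRID as an opaque identifier, here is a hedged sketch. `GeometryColumn` and `common_srid` are hypothetical names, not Snowflake or Iceberg APIs. The check only compares the integers and never parses or looks up the SRID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeometryColumn:
    name: str
    srid: int  # treated as an opaque identifier; never parsed or looked up


def common_srid(left: GeometryColumn, right: GeometryColumn) -> int:
    """Return the shared SRID, or raise if the two columns disagree."""
    if left.srid != right.srid:
        raise ValueError(
            f"SRID mismatch: {left.name} has {left.srid}, "
            f"{right.name} has {right.srid}"
        )
    return left.srid


parcels = GeometryColumn("parcels.geom", 4326)
roads = GeometryColumn("roads.geom", 4326)
zones = GeometryColumn("zones.geom", 3857)

shared = common_srid(parcels, roads)
```

A join between `parcels` and `zones` would be rejected by this guard until one side is reprojected, which is the only step that needs the full definition of the coordinate system.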
>>>>>>>>>>>>
>>>>>>>>>>>> Geography
>>>>>>>>>>>>
>>>>>>>>>>>> Geometry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Given the choice of two types and a set of operations on top of
>>>>>>>>>>>> them, the majority of Snowflake users select the Geography type
>>>>>>>>>>>> to represent their geospatial data.
>>>>>>>>>>>>
>>>>>>>>>>>> From our perspective, Iceberg users would benefit most from
>>>>>>>>>>>> being given the flexibility to store and process data using the
>>>>>>>>>>>> model that better fits their needs and specific use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> Therefore, we would like to ask some design clarifying
>>>>>>>>>>>> questions, important for interoperability:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 1. The first version of the specification (Phase 1) is described
>>>>>>>>>>>> as focused on the planar geometry model, with the CRS fixed on
>>>>>>>>>>>> 4326. In this model, Snowflake would not be able to map our
>>>>>>>>>>>> Geography type since it is based on the spherical Geography
>>>>>>>>>>>> model. Given that Snowflake supports both edge types, we would
>>>>>>>>>>>> like to better understand how to map them to the proposed
>>>>>>>>>>>> Geometry type and its metadata.
>>>>>>>>>>>>
>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>      query engine? Is it necessary for the system to adhere to
>>>>>>>>>>>>      the edge model for geospatial functions, or can it use the
>>>>>>>>>>>>      model that it supports or let the customer choose it? Will
>>>>>>>>>>>>      it affect the bounding box or other row group metadata?
>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>      postponed to further iterations?
>>>>>>>>>>>>      Would it be more extensible to support a mutable edge type
>>>>>>>>>>>>      from Phase 1, but allow systems to ignore it if they do not
>>>>>>>>>>>>      support the spherical computation model?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. From
>>>>>>>>>>>> our experience, most of the use cases do not require the full
>>>>>>>>>>>> definition of the SRS; in fact, that definition is only needed
>>>>>>>>>>>> when converting between coordinate systems. On the other hand,
>>>>>>>>>>>> it’s often needed to check whether two geometry columns have the
>>>>>>>>>>>> same coordinate system, for example when joining two columns
>>>>>>>>>>>> from different data providers.
>>>>>>>>>>>>
>>>>>>>>>>>> To address this we would like to propose including the option
>>>>>>>>>>>> to specify the SRS with only a SRID in Phase 1. The query engine
>>>>>>>>>>>> may choose to treat it as an opaque identifier or make a look-up
>>>>>>>>>>>> in the EPSG database, if supported.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you again for driving this effort forward. We look
>>>>>>>>>>>> forward to hearing your thoughts.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry
>>>>>>>>>>>>
>>>>>>>>>>>> [2]
>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/05/02 00:41:52 Szehon Ho wrote:
>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>> >
>>>>>>>>>>>> > We have created a formal proposal for adding Geospatial
>>>>>>>>>>>> > support to Iceberg.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Please read the following for details.
>>>>>>>>>>>> >
>>>>>>>>>>>> > - Github Proposal:
>>>>>>>>>>>> >   https://github.com/apache/iceberg/issues/10260
>>>>>>>>>>>> > - Proposal Doc:
>>>>>>>>>>>> >   https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > Note that this proposal is built on existing extensive
>>>>>>>>>>>> > research and POC implementations (Geolake, Havasu). Special
>>>>>>>>>>>> > thanks to Jia Yu and Kristin Cowalcijk from Wherobots/Geolake
>>>>>>>>>>>> > for extensive consultation and help in writing this proposal,
>>>>>>>>>>>> > as well as support from Yuanyuan Zhang from Geolake.
>>>>>>>>>>>> >
>>>>>>>>>>>> > We would love to get more feedback for this proposal from the
>>>>>>>>>>>> > wider community and eventually discuss this in a community
>>>>>>>>>>>> > sync.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks
>>>>>>>>>>>> > Szehon
>>>>>>>>>>>> >