Thanks Szehon! My comments were addressed. I'm ready to vote.

Yufei

On Mon, Sep 30, 2024 at 11:47 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> All my concerns are addressed, I'm ready to vote.
>
> On Mon, Sep 30, 2024 at 1:21 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi all,
>>
>> There have been several rounds of discussion on the PR: https://github.com/apache/iceberg/pull/10981 and I think most of the main points have been addressed.
>>
>> If anyone is interested, please take a look. If there are no other major points, we plan to start a VOTE thread soon.
>>
>> I know Jia and team are also volunteering to work on the prototype immediately afterwards.
>>
>> Thank you,
>> Szehon
>>
>> On Tue, Aug 20, 2024 at 1:57 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> Please take a look at the proposed spec change to support Geo type for V3 in: https://github.com/apache/iceberg/pull/10981, and comment or otherwise let me know your thoughts.
>>>
>>> Just as an FYI it incorporated the feedback from our last meeting (with Snowflake and Wherobots engineers).
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> It was great to meet in person with Snowflake engineers and we had a good discussion on the paths forward.
>>>>
>>>> Meeting notes for Snowflake-Iceberg sync.
>>>>
>>>> - Iceberg proposed Geometry type defaults to (edges=planar, crs=CRS84).
>>>> - Snowflake has two types Geography (spherical) and Geometry (planar, with customizable CRS). The data layout/encoding is the same for both types.
>>>>   Let's see how we can support each in Iceberg type, especially wrt Iceberg partition/file pruning.
>>>> - Geography type support
>>>>    - Main concern is the need for a suitable partition transform for partition-level filter; the candidate is Michael Entin's proposal <https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit>.
>>>>    - Secondary concern is file and RG-level filtering. Gang's Parquet proposal <https://github.com/apache/parquet-format/pull/240/files> allows storage of S2 / H3 IDs in Parquet stats, and so we can also leverage that in Iceberg pruning code (Google and Uber libraries are compatible).
>>>> - Geometry type support
>>>>    - Main concern is partition transform needs to understand CRS, but this can be solved by having XZ2 transform created with customizable min/max lat/long range (it's all it needs).
>>>> - Should (CRS, edges) be stored properties on Geography type in Phase 1?
>>>>    - Should be fine to store, with only allowing defaults in Phase 1.
>>>>    - Concern 1: If edges is stored, there will be ask to store other properties like (orientation, epoch). Solution is to punt these follow-on properties for later.
>>>>    - Concern 2: If crs is stored, what format? PROJJSON vs SRID. Solution is to leave it as a string.
>>>>    - Concern 3: If crs is stored as a string, Iceberg cannot read it. This should be ok, as we only need this for XZ2 transform, where the user already passes in the info from CRS (up to user to make sure these align).
>>>>
>>>> Thanks
>>>> Szehon
>>>>
>>>> On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>
>>>>> Jia and I will sync with the Snowflake folks to see if we can have a solution, or roadmap to solution, in the proposal.
>>>>>
>>>>> Thanks JB for the interest!
>>>>> By the way, I want to schedule a meeting to go over the proposal. It seems there's good feedback from folks from the geo side (and even the Parquet community), but not too many eyes/feedback from other folks/PMC in the Iceberg community. This might be due to lack of familiarity/time to read through it all. In fact, a lot of the advanced discussions like this one are for Phase 2 items, and Phase 1 items are relatively straightforward, so wanted to explain that. As I know it's summer vacation for some folks, we can do this in a week or early July, hope that sounds good with everyone.
>>>>>
>>>>> Thanks,
>>>>> Szehon
>>>>>
>>>>> On Tue, Jun 18, 2024 at 1:54 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>
>>>>>> Hi Jia
>>>>>>
>>>>>> Thanks for the update. I'm gonna re-read the whole thread and document to have a better understanding.
>>>>>>
>>>>>> Thanks!
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Mon, Jun 17, 2024 at 7:44 PM Jia Yu <ji...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Snowflake folks,
>>>>>>>
>>>>>>> Please let me know if you have other questions regarding the proposal. If any, Szehon and I can set up a zoom call with you guys to clarify some details. We are in the Pacific time zone. If you are in Europe, maybe early morning Pacific Time works best for you?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jia
>>>>>>>
>>>>>>> On Wed, Jun 5, 2024 at 6:28 PM Gang Wu <ust...@gmail.com> wrote:
>>>>>>>
>>>>>>>> > The min/max stats are discussed in the doc (Phase 2), depending on the non-trivial encoding.
>>>>>>>>
>>>>>>>> Just want to add that min/max stats filtering could be supported by file format natively.
>>>>>>>> Adding geometry type to parquet spec is under discussion: https://github.com/apache/parquet-format/pull/240
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Gang
>>>>>>>>
>>>>>>>> On Thu, Jun 6, 2024 at 5:53 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Peter
>>>>>>>>>
>>>>>>>>> Yes, the document only concerns the predicate pushdown of geometric column. Predicate pushdown takes two forms: 1) partition filter and 2) min/max stats. The min/max stats are discussed in the doc (Phase 2), depending on the non-trivial encoding.
>>>>>>>>>
>>>>>>>>> The evaluators are always AND'ed together, so I don't see any issue of partitioning with another key not working on a table with a geo column.
>>>>>>>>>
>>>>>>>>> On another note, Jia and I thought that we may have a discussion about Snowflake geo types in a call to drill down on some details? What time zone are you folks in / what time works better? I think Jia and I are both in Pacific time zone.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Szehon
>>>>>>>>>
>>>>>>>>> On Wed, Jun 5, 2024 at 1:02 AM Peter Popov <peter.po...@snowflake.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Szehon, hi Jia,
>>>>>>>>>>
>>>>>>>>>> Thank you for your replies. We now better understand the connection between the metadata and partitioning in this proposal. Supporting the Mapping 1 is a great starting point, and we would like to work closer with you on bringing the support for spherical edges and other coordinate systems into Iceberg geometry.
>>>>>>>>>>
>>>>>>>>>> We have some follow-up questions regarding the partitioning (let us know if it’s better to comment directly in the document): Does this proposal imply that XZ2 partitioning is always required?
>>>>>>>>>> In the current proposal, do you see a possibility of predicate pushdown to rely on x/y min/max column metadata instead of a partition key? We see use-cases where a table with a geo column can be partitioned by a different key (e.g. date) or combination of keys. It would be great to support such use cases from the very beginning.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On Thu, May 30, 2024 at 8:07 AM Jia Yu <ji...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Dmytro,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your email. To add to Szehon's answer,
>>>>>>>>>>>
>>>>>>>>>>> 1. How to represent Snowflake Geometry and Geography type in Iceberg, given the Geo Iceberg Phase 1 design:
>>>>>>>>>>>
>>>>>>>>>>> Answer:
>>>>>>>>>>> Mapping 1 (possible): Snowflake Geometry + SRID: 4326 -> Iceberg Geometry + CRS84 + edges: Planar
>>>>>>>>>>> Mapping 2 (impossible): Snowflake Geography -> Iceberg Geometry + CRS84 + edges: Spherical
>>>>>>>>>>> Mapping 3 (impossible): Snowflake Geometry + SRID:ABCDE -> Iceberg Geometry + SRID:ABCDE + edges: Planar
>>>>>>>>>>>
>>>>>>>>>>> As Szehon mentioned, only Mapping 1 is possible because we need to support spatial query push down in Iceberg. This function relies on the Iceberg partition transform, which requires a 1:1 mapping between a value (point/polygon/linestring) and a partition key. That is: given any precision level, a polygon must produce a single ID; and the covering indicated by this single ID must fully cover the extent of the polygon. Currently, only XZ2 can satisfy this requirement. If the theory from Michael Entin can be proven to be correct, then we can support Mapping 2 in Phase 2 of Geo Iceberg.
>>>>>>>>>>>
>>>>>>>>>>> Regarding Mapping 3, this requires Iceberg to be able to understand SRID / PROJJSON such that we will know min max X Y of the CRS (@Szehon, maybe Iceberg can ask the engine to provide this information?). See my answer 2.
>>>>>>>>>>>
>>>>>>>>>>> 2. Why choose PROJJSON instead of SRID?
>>>>>>>>>>>
>>>>>>>>>>> The PROJJSON idea was borrowed from GeoParquet because we'd like to enable possible conversion between Geo Iceberg and GeoParquet. However, I do understand that this is not a good idea for Iceberg since not many libs can parse PROJJSON.
>>>>>>>>>>>
>>>>>>>>>>> @Szehon Is there a way that we can support both SRID and PROJJSON in Geo Iceberg?
>>>>>>>>>>>
>>>>>>>>>>> It is also worth noting that, although there are many libs that can parse SRID and perform look-up in the EPSG database, the license of the EPSG database is NOT compatible with the Apache Software Foundation. That means: Iceberg still cannot parse / understand SRID.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jia
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 29, 2024 at 11:08 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dmytro
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for looking through the proposal and excited to hear from you guys! I am not a 'geo expert' and I will definitely need to pull in Jia Yu for some of these points.
>>>>>>>>>>>>
>>>>>>>>>>>> Although most calculations are done on the query engine, Iceberg reference implementations (i.e., Java, Python) do have to support a few calculations to handle filter push down:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. push down of the proposed Geospatial transforms ST_COVERS, ST_COVERED_BY, and ST_INTERSECTS
>>>>>>>>>>>>    2.
>>>>>>>>>>>>       evaluation of proposed Geospatial partition transform XZ2. As you may have seen, this was chosen as it's the only standard one today that solves the 'boundary object' problem, still preserving 1-to-1 mapping of row => partition value.
>>>>>>>>>>>>
>>>>>>>>>>>> This is the primary rationale for choosing the values, as these were implemented in the GeoLake and Havasu projects (Iceberg forks that sparked the proposal) based on Geometry type (edge=planar, crs=OGC:CRS84 / SRID=4326).
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. As you mentioned [2] in the proposal there are difficulties with supporting the full PROJJSON specification of the SRS. From our experience most of the use-cases do not require the full definition of the SRS, in fact that definition is only needed when converting between coordinate systems. On the other hand, it’s often needed to check whether two geometry columns have the same coordinate system, for example when joining two columns from different data providers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To address this we would like to propose including the option to specify the SRS with only a SRID in phase 1. The query engine may choose to treat it as an opaque identifier or make a look-up in the EPSG database if supported.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The way to specify CRS definition is actually taken from GeoParquet [1], I think we are not bound to follow it if there are better options. I feel we might need to at least list out supported configurations in the spec, though. There is some conversation on the doc here about this [2].
>>>>>>>>>>>> Basically:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. XZ2 assumes planar edges. This is a feature of the algorithm, based on the original paper. A possible solution to spherical edge is proposed by Michael Entin here: [3], please feel free to evaluate.
>>>>>>>>>>>>    2. XZ2 needs to know the coordinate range. According to Jia's comments, this needs parsing of the CRS. Can it be done with SRID alone?
>>>>>>>>>>>>
>>>>>>>>>>>>> 1. In the first version of the specification Phase 1 it is mentioned as the version focused on the planar geometry model with a CRS system fixed on 4326. In this model, Snowflake would not be able to map our Geography type since it is based on the spherical Geography model. Given that Snowflake supports both edge types, we would like to better understand how to map them to the proposed Geometry type and its metadata.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the query engine? Is it necessary for the system to adhere to the edge model for geospatial functions, or can it use the model that it supports or let the customer choose it? Will it affect the bounding box or other row group metadata?
>>>>>>>>>>>>>    - Is there any reason why the flexible model has to be postponed to further iterations? Would it be more extensible to support mutable edge type from the Phase 1, but allow systems to ignore it if they do not support the spherical computation model?
>>>>>>>>>>>>>
>>>>>>>>>>>> It may be answered by the previous paragraph in regards to XZ2.
>>>>>>>>>>>>
>>>>>>>>>>>>    1. If we get XZ2 to work with a more variable CRS without requiring full PROJJSON specification, it seems it is a path to support Snowflake Geometry type?
>>>>>>>>>>>>    2. If we get another one-to-one partition function on spherical edges, like the one proposed by Michael, it seems a path to support Snowflake Geography type?
>>>>>>>>>>>>
>>>>>>>>>>>> Does that sound correct? As for why certain things are marked as Phase 1, they are just chosen so we can all agree on an initial design and iterate faster and not set in stone, maybe the path 1 is possible to do quickly, for example.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, I am not sure about handling evaluation of ST_COVERS, ST_COVERED_BY, and ST_INTERSECTS (how easy to handle different CRS + spherical edges). I will leave it to Jia.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Szehon
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata
>>>>>>>>>>>> [2]: https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit?disco=AAABL-z6xXk
>>>>>>>>>>>> [3]: https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 29, 2024 at 8:30 AM Dmytro Koval <dmytro.ko...@snowflake.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Szehon and Iceberg Community,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is Dmytro, Peter, Aihua, and Tyler from Snowflake.
>>>>>>>>>>>>> As part of our desire to be more active in the Iceberg community, we’ve been looking over this geospatial proposal. We’re excited geospatial is getting traction, as we see a lot of geo usage within Snowflake, and expect that usage to carry over to our Iceberg offerings soon. After reviewing the proposal, we have some questions we’d like to pose given our experience with geospatial support in Snowflake.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We would like to clarify two aspects of the proposal: handling of the spherical model and definition of the spatial reference system. Both have a big impact on the interoperability with Snowflake and other query engines and Geo processing systems.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us first share some context about geospatial types at Snowflake; geo experts will certainly be familiar with this context already, but for the sake of others we want to err on the side of being explicit and clear. Snowflake supports two Geospatial types [1]:
>>>>>>>>>>>>> - Geography – uses a spherical approximation of the earth for all the computations. It does not perfectly represent the earth, but allows getting accurate results on WGS84 coordinates, used by GPS, without any need to perform coordinate system reprojections. It is also quite fast for end-to-end computations. In general, it has less distortion compared to the 2D planar model.
>>>>>>>>>>>>> - Geometry – uses planar Euclidean geometry model.
>>>>>>>>>>>>>   Geometric computations are simpler, but require transforming the data between coordinate systems to minimize the distortion. The Geometry data type allows setting a spatial reference system for each row using the SRID. The binary geospatial functions are only allowed on the geometries with the same SRID. The only function that interprets SRID is ST_TRANSFORM, which allows conversion between different SRSs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Geography
>>>>>>>>>>>>>
>>>>>>>>>>>>> Geometry
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given the choice of two types and a set of operations on top of them, the majority of Snowflake users select the Geography type to represent their geospatial data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> From our perspective, Iceberg users would benefit most from being given the flexibility to store and process data using the model that better fits their needs and specific use cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, we would like to ask some design clarifying questions, important for interoperability:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. In the first version of the specification Phase 1 it is mentioned as the version focused on the planar geometry model with a CRS system fixed on 4326. In this model, Snowflake would not be able to map our Geography type since it is based on the spherical Geography model. Given that Snowflake supports both edge types, we would like to better understand how to map them to the proposed Geometry type and its metadata.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the query engine? Is it necessary for the system to adhere to the edge model for geospatial functions, or can it use the model that it supports or let the customer choose it? Will it affect the bounding box or other row group metadata?
>>>>>>>>>>>>>    - Is there any reason why the flexible model has to be postponed to further iterations? Would it be more extensible to support mutable edge type from the Phase 1, but allow systems to ignore it if they do not support the spherical computation model?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. As you mentioned [2] in the proposal there are difficulties with supporting the full PROJJSON specification of the SRS. From our experience most of the use-cases do not require the full definition of the SRS, in fact that definition is only needed when converting between coordinate systems. On the other hand, it’s often needed to check whether two geometry columns have the same coordinate system, for example when joining two columns from different data providers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To address this we would like to propose including the option to specify the SRS with only a SRID in phase 1. The query engine may choose to treat it as an opaque identifier or make a look-up in the EPSG database if supported.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you again for driving this effort forward. We look forward to hearing your thoughts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2] https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2024/05/02 00:41:52 Szehon Ho wrote:
>>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > We have created a formal proposal for adding Geospatial support to Iceberg.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Please read the following for details.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > - Github Proposal: https://github.com/apache/iceberg/issues/10260
>>>>>>>>>>>>> > - Proposal Doc: https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Note that this proposal is built on existing extensive research and POC implementations (Geolake, Havasu). Special thanks to Jia Yu and Kristin Cowalcijk from Wherobots/Geolake for extensive consultation and help in writing this proposal, as well as support from Yuanyuan Zhang from Geolake.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > We would love to get more feedback for this proposal from the wider community and eventually discuss this in a community sync.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks
>>>>>>>>>>>>> > Szehon
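
For readers new to the thread, the requirement that keeps coming up (every geometry maps to exactly one partition key, and the cell named by that key fully covers the geometry, even when it straddles a grid boundary) can be sketched roughly as below. This is only a simplified illustration of the enlarged-cell idea behind XZ-ordering; it is not the XZ2 sequence code from the proposal, GeoLake, or Havasu. The key is a plain (level, ix, iy) tuple, and the names cell_key and enlarged_extent, the resolution g=12, and the default CRS84 ranges are all assumptions made for this example. The x_range / y_range parameters stand in for the "customizable min/max range" mentioned in the meeting notes.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Key = Tuple[int, int, int]                # (level, ix, iy); hypothetical key layout
Range = Tuple[float, float]


def cell_key(bbox: BBox, g: int = 12,
             x_range: Range = (-180.0, 180.0),
             y_range: Range = (-90.0, 90.0)) -> Key:
    """Return the single finest-level key whose enlarged cell covers bbox."""
    # Normalize the bounding box into the unit square using the given ranges.
    nx0 = (bbox[0] - x_range[0]) / (x_range[1] - x_range[0])
    nx1 = (bbox[2] - x_range[0]) / (x_range[1] - x_range[0])
    ny0 = (bbox[1] - y_range[0]) / (y_range[1] - y_range[0])
    ny1 = (bbox[3] - y_range[0]) / (y_range[1] - y_range[0])
    if not (0.0 <= nx0 <= nx1 <= 1.0 and 0.0 <= ny0 <= ny1 <= 1.0):
        raise ValueError("bbox lies outside the configured coordinate range")
    # Walk from the finest level to the coarsest and stop at the first fit.
    for level in range(g, -1, -1):
        n = 1 << level          # cells per axis at this level
        w = 1.0 / n             # cell width in normalized units
        # The cell is chosen by the lower-left corner of the bbox.
        ix = min(int(nx0 * n), n - 1)
        iy = min(int(ny0 * n), n - 1)
        # The "enlarged" cell spans two cell widths in each direction.
        if nx1 <= (ix + 2) * w and ny1 <= (iy + 2) * w:
            return (level, ix, iy)
    raise AssertionError("unreachable: level 0 covers the whole range")


def enlarged_extent(key: Key,
                    x_range: Range = (-180.0, 180.0),
                    y_range: Range = (-90.0, 90.0)) -> BBox:
    """Return the area (in original coordinates) that a key is allowed to cover."""
    level, ix, iy = key
    w = 1.0 / (1 << level)
    sx = x_range[1] - x_range[0]
    sy = y_range[1] - y_range[0]
    return (x_range[0] + ix * w * sx, y_range[0] + iy * w * sy,
            x_range[0] + (ix + 2) * w * sx, y_range[0] + (iy + 2) * w * sy)


if __name__ == "__main__":
    # A small box straddling (0, 0), which sits on a cell boundary at every level.
    box = (-0.001, -0.001, 0.001, 0.001)
    key = cell_key(box)
    print(key, enlarged_extent(key))
```

Because the cell is picked from the lower-left corner and then doubled in size, a box that crosses a grid line still gets one key at a fine level instead of falling back to the root cell, which is the 'boundary object' property discussed above. A reader-side filter would then compare the query window against enlarged_extent(key) for each partition value.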