All my concerns are addressed; I'm ready to vote.

On Mon, Sep 30, 2024 at 1:21 PM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Hi all,
>
> There have been several rounds of discussion on the PR:
> https://github.com/apache/iceberg/pull/10981 and I think most of the main
> points have been addressed.
>
> If anyone is interested, please take a look.  If there are no other major
> points, we plan to start a VOTE thread soon.
>
> I know Jia and team are also volunteering to work on the prototype
> immediately afterwards.
>
> Thank you,
> Szehon
>
> On Tue, Aug 20, 2024 at 1:57 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi all
>>
>> Please take a look at the proposed spec change to support the Geo type for V3
>> in https://github.com/apache/iceberg/pull/10981, and comment or
>> otherwise let me know your thoughts.
>>
>> Just as an FYI, it incorporates the feedback from our last meeting (with
>> Snowflake and Wherobots engineers).
>>
>> Thanks,
>> Szehon
>>
>> On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> It was great to meet in person with Snowflake engineers and we had a
>>> good discussion on the paths forward.
>>>
>>> Meeting notes for Snowflake-Iceberg sync.
>>>
>>>    - Iceberg proposed Geometry type defaults to (edges=planar,
>>>    crs=CRS84).
>>>    - Snowflake has two types: Geography (spherical) and Geometry
>>>    (planar, with customizable CRS).  The data layout/encoding is the
>>>    same for both types.  Let's see how we can support each in an
>>>    Iceberg type, especially wrt Iceberg partition/file pruning.
>>>    - Geography type support
>>>       - Main concern is the need for a suitable partition transform
>>>       for partition-level filtering; the candidate is Michael Entin's
>>>       proposal
>>>       <https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit>.
>>>       - Secondary concern is file and RG-level filtering.  Gang's
>>>       Parquet proposal
>>>       <https://github.com/apache/parquet-format/pull/240/files>
>>>       allows storage of S2 / H3 IDs in Parquet stats, so we can also
>>>       leverage that in Iceberg pruning code (Google and Uber libraries
>>>       are compatible).
>>>    - Geometry type support
>>>       - Main concern is that the partition transform needs to
>>>       understand the CRS, but this can be solved by creating the XZ2
>>>       transform with a customizable min/max lat/long range (it's all
>>>       it needs).
>>>    - Should (CRS, edges) be stored properties on Geography type in
>>>    Phase 1?
>>>       - Should be fine to store, with only allowing defaults in Phase 1.
>>>       - Concern 1: If edges is stored, there will be an ask to store
>>>       other properties like (orientation, epoch).  Solution is to punt
>>>       these follow-on properties for later.
>>>       - Concern 2: If crs is stored, what format?  PROJJSON vs SRID.
>>>       Solution is to leave it as a string.
>>>       - Concern 3: If crs is stored as a string, Iceberg cannot read
>>>       it.  This should be ok, as we only need this for the XZ2
>>>       transform, where the user already passes in the info from the CRS
>>>       (up to the user to make sure these align).
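
The "customizable min/max range" point in the notes above can be read as: the transform only needs coordinate bounds to rescale values into a unit square before indexing. A minimal Python sketch under that reading; the `normalize` helper and the bounds tuple are hypothetical names for illustration and do not come from the proposal or the Iceberg code base:

```python
def normalize(x, y, min_x, min_y, max_x, max_y):
    """Rescale (x, y) into the unit square using user-supplied bounds.

    Hypothetical helper: the bounds are passed in by the user, so no CRS
    parsing is needed. Keeping them aligned with the CRS is up to the user.
    """
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        raise ValueError("coordinate lies outside the configured range")
    return ((x - min_x) / (max_x - min_x), (y - min_y) / (max_y - min_y))


# Default CRS84 range: longitude in [-180, 180], latitude in [-90, 90].
CRS84_BOUNDS = (-180.0, -90.0, 180.0, 90.0)
center = normalize(0.0, 0.0, *CRS84_BOUNDS)
```

For a projected CRS the same call would take bounds in that CRS's units instead; nothing else in the sketch changes.
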
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho <szehon.apa...@gmail.com>
>>> wrote:
>>>
>>>> Jia and I will sync with the Snowflake folks to see if we can have a
>>>> solution, or roadmap to solution, in the proposal.
>>>>
>>>> Thanks JB for the interest!  By the way, I want to schedule a meeting
>>>> to go over the proposal. It seems there's good feedback from folks on the
>>>> geo side (and even the Parquet community), but not too many eyes/feedback
>>>> from other folks/PMC in the Iceberg community.  This might be due to lack
>>>> of familiarity/time to read through it all.  In fact, a lot of the advanced
>>>> discussions like this one are for Phase 2 items, and Phase 1 items are
>>>> relatively straightforward, so I wanted to explain that.  As I know it's
>>>> summer vacation for some folks, we can do this in a week or in early July;
>>>> hope that sounds good to everyone.
>>>>
>>>> Thanks,
>>>> Szehon
>>>>
>>>> On Tue, Jun 18, 2024 at 1:54 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Jia
>>>>>
>>>>> Thanks for the update. I'm gonna re-read the whole thread and document
>>>>> to have a better understanding.
>>>>>
>>>>> Thanks !
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Mon, Jun 17, 2024 at 7:44 PM Jia Yu <ji...@apache.org> wrote:
>>>>>
>>>>>> Hi Snowflake folks,
>>>>>>
>>>>>> Please let me know if you have other questions regarding the
>>>>>> proposal. If any, Szehon and I can set up a zoom call with you guys to
>>>>>> clarify some details. We are in the Pacific time zone. If you are in
>>>>>> Europe, maybe early morning Pacific Time works best for you?
>>>>>>
>>>>>> Thanks,
>>>>>> Jia
>>>>>>
>>>>>> On Wed, Jun 5, 2024 at 6:28 PM Gang Wu <ust...@gmail.com> wrote:
>>>>>>
>>>>>>> > The min/max stats are discussed in the doc (Phase 2), depending on
>>>>>>> the non-trivial encoding.
>>>>>>>
>>>>>>> Just want to add that min/max stats filtering could be supported by
>>>>>>> file format natively. Adding geometry type to parquet spec
>>>>>>> is under discussion:
>>>>>>> https://github.com/apache/parquet-format/pull/240
>>>>>>>
>>>>>>> Best,
>>>>>>> Gang
>>>>>>>
>>>>>>> On Thu, Jun 6, 2024 at 5:53 AM Szehon Ho <szehon.apa...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Peter
>>>>>>>>
>>>>>>>> Yes, the document only concerns the predicate pushdown of geometric
>>>>>>>> columns.  Predicate pushdown takes two forms: 1) partition filter and 2)
>>>>>>>> min/max stats.  The min/max stats are discussed in the doc (Phase 2),
>>>>>>>> depending on the non-trivial encoding.
>>>>>>>>
>>>>>>>> The evaluators are always AND'ed together, so I don't see any reason
>>>>>>>> why partitioning by another key would not work on a table with a geo
>>>>>>>> column.
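
To illustrate the AND'ed evaluators described above, here is a minimal Python sketch of file pruning on a table partitioned by date that also keeps bounding-box (x/y min/max) stats for a geo column. The `DataFile` class and function names are hypothetical and simplified; they are not Iceberg's actual evaluator API:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataFile:
    # Hypothetical per-file metadata: a date partition value plus the
    # bounding box (min/max x and y) of the geometries stored in the file.
    day: date
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def partition_matches(f, day):
    """Partition filter on the non-geo key."""
    return f.day == day


def bbox_may_intersect(f, query):
    """Min/max stats filter: can the file's bounding box overlap the query box?"""
    qmin_x, qmin_y, qmax_x, qmax_y = query
    return not (f.max_x < qmin_x or f.min_x > qmax_x
                or f.max_y < qmin_y or f.min_y > qmax_y)


def files_to_scan(files, day, query):
    # A file is read only if every evaluator says it may contain matches.
    return [f for f in files
            if partition_matches(f, day) and bbox_may_intersect(f, query)]
```

Because the two checks are independent and combined with `and`, either one can prune a file on its own, which is why a date partition and geo stats can coexist in this sketch.
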
>>>>>>>>
>>>>>>>> On another note, Jia and I thought that we could have a call to
>>>>>>>> discuss Snowflake geo types and drill down on some details.  What
>>>>>>>> time zone are you folks in / what time works better?  I think Jia
>>>>>>>> and I are both in the Pacific time zone.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Szehon
>>>>>>>>
>>>>>>>> On Wed, Jun 5, 2024 at 1:02 AM Peter Popov <
>>>>>>>> peter.po...@snowflake.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Szehon, hi Jia,
>>>>>>>>>
>>>>>>>>> Thank you for your replies. We now better understand the
>>>>>>>>> connection between the metadata and partitioning in this proposal.
>>>>>>>>> Supporting Mapping 1 is a great starting point, and we would like
>>>>>>>>> to work closer with you on bringing support for spherical edges
>>>>>>>>> and other coordinate systems into Iceberg geometry.
>>>>>>>>>
>>>>>>>>> We have some follow-up questions regarding the partitioning (let
>>>>>>>>> us know if it’s better to comment directly in the document): Does this
>>>>>>>>> proposal imply that XZ2 partitioning is always required? In the
>>>>>>>>> current proposal, do you see a possibility of predicate pushdown
>>>>>>>>> relying on x/y min/max column metadata instead of a partition key?
>>>>>>>>> We see use cases where a table with a geo column can be partitioned
>>>>>>>>> by a different key (e.g. date) or combination of keys. It would be
>>>>>>>>> great to support such use cases from the very beginning.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On Thu, May 30, 2024 at 8:07 AM Jia Yu <ji...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dmytro,
>>>>>>>>>>
>>>>>>>>>> Thanks for your email. To add to Szehon's answer,
>>>>>>>>>>
>>>>>>>>>> 1. How to represent Snowflake Geometry and Geography type in
>>>>>>>>>> Iceberg, given the Geo Iceberg Phase 1 design:
>>>>>>>>>>
>>>>>>>>>> Answer:
>>>>>>>>>> Mapping 1 (possible): Snowflake Geometry + SRID: 4326 -> Iceberg
>>>>>>>>>> Geometry + CRS84 + edges: Planar
>>>>>>>>>> Mapping 2 (impossible): Snowflake Geography -> Iceberg Geometry +
>>>>>>>>>> CRS84 + edges: Spherical
>>>>>>>>>> Mapping 3 (impossible): Snowflake Geometry + SRID: ABCDE -> Iceberg
>>>>>>>>>> Geometry + SRID: ABCDE + edges: Planar
>>>>>>>>>>
>>>>>>>>>> As Szehon mentioned, only Mapping 1 is possible because we need
>>>>>>>>>> to support spatial query push down in Iceberg. This function relies
>>>>>>>>>> on the Iceberg partition transform, which requires a 1:1 mapping
>>>>>>>>>> between a value (point/polygon/linestring) and a partition key.
>>>>>>>>>> That is: given any precision level, a polygon must produce a single
>>>>>>>>>> ID, and the covering indicated by this single ID must fully cover
>>>>>>>>>> the extent of the polygon. Currently, only XZ2 can satisfy this
>>>>>>>>>> requirement. If the theory from Michael Entin can be proven to be
>>>>>>>>>> correct, then we can support Mapping 2 in Phase 2 of Geo Iceberg.
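
The "single ID whose covering fully contains the polygon" requirement above can be illustrated with a simplified Python sketch of the enlarged-cell idea behind XZ-ordering. This is not the actual XZ2 sequence-code computation from the paper or the proposal: the function name `covering_cell` is hypothetical, the key is returned as a plain `(level, ix, iy)` tuple, and bounding boxes are assumed to be already normalized to the unit square:

```python
def covering_cell(xmin, ymin, xmax, ymax, max_level=12):
    """Return one (level, ix, iy) key whose enlarged cell covers the box.

    At each level the unit square is split into 2**level cells per axis.
    The cell containing the box's lower-left corner is doubled in width and
    height (the "enlarged" cell); the deepest level at which the whole box
    fits inside that enlarged cell is chosen.
    """
    for level in range(max_level, -1, -1):
        n = 1 << level                    # cells per axis at this level
        w = 1.0 / n                       # cell width
        ix = min(int(xmin * n), n - 1)    # clamp so xmin == 1.0 stays in range
        iy = min(int(ymin * n), n - 1)
        if xmax <= (ix + 2) * w and ymax <= (iy + 2) * w:
            return (level, ix, iy)
    # Level 0 always fits for input inside the unit square.
    raise ValueError("bounding box is not normalized to the unit square")


# A small box straddling the center of the unit square still gets a deep
# cell, because the enlarged cell absorbs the boundary crossing.
key = covering_cell(0.49, 0.49, 0.51, 0.51, max_level=3)
```

In a plain quadtree the same box would fall back to the root cell, since it crosses the center split in both axes; the enlargement is what keeps one row mapped to one reasonably small partition value in this sketch.
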
>>>>>>>>>>
>>>>>>>>>> Regarding Mapping 3, this requires Iceberg to be able to
>>>>>>>>>> understand SRID / PROJJSON such that we will know the min/max X/Y of
>>>>>>>>>> the CRS (@Szehon, maybe Iceberg can ask the engine to provide this
>>>>>>>>>> information?). See my answer 2.
>>>>>>>>>>
>>>>>>>>>> 2. Why choose PROJJSON instead of SRID?
>>>>>>>>>>
>>>>>>>>>> The PROJJSON idea was borrowed from GeoParquet because we'd like
>>>>>>>>>> to enable possible conversion between Geo Iceberg and GeoParquet.
>>>>>>>>>> However, I do understand that this is not a good idea for Iceberg
>>>>>>>>>> since not many libs can parse PROJJSON.
>>>>>>>>>>
>>>>>>>>>> @Szehon Is there a way that we can support both SRID and PROJJSON
>>>>>>>>>> in Geo Iceberg?
>>>>>>>>>>
>>>>>>>>>> It is also worth noting that, although there are many libs that
>>>>>>>>>> can parse SRID and perform look-up in the EPSG database, the license
>>>>>>>>>> of the EPSG database is NOT compatible with the Apache Software
>>>>>>>>>> Foundation. That means: Iceberg still cannot parse / understand SRID.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Jia
>>>>>>>>>>
>>>>>>>>>> On Wed, May 29, 2024 at 11:08 AM Szehon Ho <
>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Dmytro
>>>>>>>>>>>
>>>>>>>>>>> Thank you for looking through the proposal, and I'm excited to hear
>>>>>>>>>>> from you guys!  I am not a 'geo expert' and I will definitely need
>>>>>>>>>>> to pull in Jia Yu for some of these points.
>>>>>>>>>>>
>>>>>>>>>>> Although most calculations are done in the query engine, Iceberg
>>>>>>>>>>> reference implementations (i.e., Java, Python) do have to support
>>>>>>>>>>> a few calculations to handle filter push down:
>>>>>>>>>>>
>>>>>>>>>>>    1. push down of the proposed Geospatial transforms
>>>>>>>>>>>    ST_COVERS, ST_COVERED_BY, and ST_INTERSECTS
>>>>>>>>>>>    2. evaluation of the proposed Geospatial partition transform
>>>>>>>>>>>    XZ2.  As you may have seen, this was chosen as it's the only
>>>>>>>>>>>    standard one today that solves the 'boundary object' problem
>>>>>>>>>>>    while still preserving a 1-to-1 mapping of row => partition
>>>>>>>>>>>    value.
>>>>>>>>>>>
>>>>>>>>>>> This is the primary rationale for choosing the values, as these
>>>>>>>>>>> were implemented in the GeoLake and Havasu projects (Iceberg forks
>>>>>>>>>>> that sparked the proposal) based on the Geometry type (edge=planar,
>>>>>>>>>>> crs=OGC:CRS84/SRID=4326).
>>>>>>>>>>>
>>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. From
>>>>>>>>>>>> our experience, most of the use cases do not require the full
>>>>>>>>>>>> definition of the SRS; in fact, that definition is only needed
>>>>>>>>>>>> when converting between coordinate systems. On the other hand,
>>>>>>>>>>>> it’s often necessary to check whether two geometry columns have
>>>>>>>>>>>> the same coordinate system, for example when joining two columns
>>>>>>>>>>>> from different data providers.
>>>>>>>>>>>>
>>>>>>>>>>>> To address this, we would like to propose including the option
>>>>>>>>>>>> to specify the SRS with only an SRID in Phase 1. The query engine
>>>>>>>>>>>> may choose to treat it as an opaque identifier or make a look-up
>>>>>>>>>>>> in the EPSG database if supported.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The way to specify the CRS definition is actually taken from
>>>>>>>>>>> GeoParquet [1]; I think we are not bound to follow it if there are
>>>>>>>>>>> better options.  I feel we might need to at least list out the
>>>>>>>>>>> supported configurations in the spec, though.  There is some
>>>>>>>>>>> conversation on the doc about this here [2].  Basically:
>>>>>>>>>>>
>>>>>>>>>>>    1. XZ2 assumes planar edges.  This is a feature of the
>>>>>>>>>>>    algorithm, based on the original paper.  A possible solution
>>>>>>>>>>>    for spherical edges is proposed by Michael Entin here: [3];
>>>>>>>>>>>    please feel free to evaluate.
>>>>>>>>>>>    2. XZ2 needs to know the coordinate range.  According to
>>>>>>>>>>>    Jia's comments, this needs parsing of the CRS.  Can it be done
>>>>>>>>>>>    with SRID alone?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> 1. In the first version of the specification, Phase 1 is
>>>>>>>>>>>> described as focused on the planar geometry model with the CRS
>>>>>>>>>>>> fixed to 4326. In this model, Snowflake would not be able to map
>>>>>>>>>>>> our Geography type, since it is based on the spherical Geography
>>>>>>>>>>>> model. Given that Snowflake supports both edge types, we would
>>>>>>>>>>>> like to better understand how to map them to the proposed
>>>>>>>>>>>> Geometry type and its metadata.
>>>>>>>>>>>>
>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>    query engine? Is it necessary for the system to adhere to the
>>>>>>>>>>>>    edge model for geospatial functions, or can it use the model
>>>>>>>>>>>>    that it supports or let the customer choose it? Will it affect
>>>>>>>>>>>>    the bounding box or other row group metadata?
>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>    postponed to further iterations? Would it be more extensible
>>>>>>>>>>>>    to support a mutable edge type from Phase 1, but allow
>>>>>>>>>>>>    systems to ignore it if they do not support the spherical
>>>>>>>>>>>>    computation model?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> It may be answered by the previous paragraph in regards to XZ2.
>>>>>>>>>>>
>>>>>>>>>>>    1. If we get XZ2 to work with a more variable CRS without
>>>>>>>>>>>    requiring the full PROJJSON specification, it seems to be a
>>>>>>>>>>>    path to supporting the Snowflake Geometry type?
>>>>>>>>>>>    2. If we get another one-to-one partition function on
>>>>>>>>>>>    spherical edges, like the one proposed by Michael, it seems to
>>>>>>>>>>>    be a path to supporting the Snowflake Geography type?
>>>>>>>>>>>
>>>>>>>>>>> Does that sound correct?  As for why certain things are marked
>>>>>>>>>>> as Phase 1, they are just chosen so we can all agree on an initial
>>>>>>>>>>> design and iterate faster, and are not set in stone; maybe path 1
>>>>>>>>>>> is possible to do quickly, for example.
>>>>>>>>>>>
>>>>>>>>>>> Also, I am not sure about handling evaluation of ST_COVERS,
>>>>>>>>>>> ST_COVERED_BY, and ST_INTERSECTS (how easy it is to handle
>>>>>>>>>>> different CRS + spherical edges).  I will leave it to Jia.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>> Szehon
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata
>>>>>>>>>>> [2]:
>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit?disco=AAABL-z6xXk
>>>>>>>>>>> [3]:
>>>>>>>>>>> https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 29, 2024 at 8:30 AM Dmytro Koval
>>>>>>>>>>> <dmytro.ko...@snowflake.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dear Szehon and Iceberg Community,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This is Dmytro, Peter, Aihua, and Tyler from Snowflake. As part
>>>>>>>>>>>> of our desire to be more active in the Iceberg community, we’ve
>>>>>>>>>>>> been looking over this geospatial proposal. We’re excited
>>>>>>>>>>>> geospatial is getting traction, as we see a lot of geo usage
>>>>>>>>>>>> within Snowflake, and expect that usage to carry over to our
>>>>>>>>>>>> Iceberg offerings soon. After reviewing the proposal, we have
>>>>>>>>>>>> some questions we’d like to pose given our experience with
>>>>>>>>>>>> geospatial support in Snowflake.
>>>>>>>>>>>>
>>>>>>>>>>>> We would like to clarify two aspects of the proposal: handling
>>>>>>>>>>>> of the spherical model and definition of the spatial reference
>>>>>>>>>>>> system. Both have a big impact on the interoperability with
>>>>>>>>>>>> Snowflake and other query engines and Geo processing systems.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Let us first share some context about geospatial types at
>>>>>>>>>>>> Snowflake; geo experts will certainly be familiar with this context
>>>>>>>>>>>> already, but for the sake of others we want to err on the side of
>>>>>>>>>>>> being explicit and clear. Snowflake supports two Geospatial types
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> - Geography – uses a spherical approximation of the earth for
>>>>>>>>>>>> all the computations. It does not perfectly represent the earth,
>>>>>>>>>>>> but allows getting accurate results on WGS84 coordinates, used by
>>>>>>>>>>>> GPS, without any need to perform coordinate system reprojections.
>>>>>>>>>>>> It is also quite fast for end-to-end computations. In general, it
>>>>>>>>>>>> has less distortion compared to the 2D planar model.
>>>>>>>>>>>> - Geometry – uses a planar Euclidean geometry model. Geometric
>>>>>>>>>>>> computations are simpler, but require transforming the data between
>>>>>>>>>>>> coordinate systems to minimize the distortion. The Geometry data
>>>>>>>>>>>> type allows setting a spatial reference system for each row using
>>>>>>>>>>>> the SRID. The binary geospatial functions are only allowed on
>>>>>>>>>>>> geometries with the same SRID. The only function that interprets
>>>>>>>>>>>> the SRID is ST_TRANSFORM, which allows conversion between
>>>>>>>>>>>> different SRSs.
>>>>>>>>>>>>
>>>>>>>>>>>> [Images: Geography, Geometry]
>>>>>>>>>>>>
>>>>>>>>>>>> Given the choice of two types and a set of operations on top of
>>>>>>>>>>>> them, the majority of Snowflake users select the Geography type to
>>>>>>>>>>>> represent their geospatial data.
>>>>>>>>>>>>
>>>>>>>>>>>> From our perspective, Iceberg users would benefit most from
>>>>>>>>>>>> being given the flexibility to store and process data using the
>>>>>>>>>>>> model that better fits their needs and specific use cases.
>>>>>>>>>>>>
>>>>>>>>>>>> Therefore, we would like to ask some design-clarifying
>>>>>>>>>>>> questions, important for interoperability:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 1. In the first version of the specification, Phase 1 is
>>>>>>>>>>>> described as focused on the planar geometry model with the CRS
>>>>>>>>>>>> fixed to 4326. In this model, Snowflake would not be able to map
>>>>>>>>>>>> our Geography type, since it is based on the spherical Geography
>>>>>>>>>>>> model. Given that Snowflake supports both edge types, we would
>>>>>>>>>>>> like to better understand how to map them to the proposed
>>>>>>>>>>>> Geometry type and its metadata.
>>>>>>>>>>>>
>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>    query engine? Is it necessary for the system to adhere to the
>>>>>>>>>>>>    edge model for geospatial functions, or can it use the model
>>>>>>>>>>>>    that it supports or let the customer choose it? Will it affect
>>>>>>>>>>>>    the bounding box or other row group metadata?
>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>    postponed to further iterations? Would it be more extensible
>>>>>>>>>>>>    to support a mutable edge type from Phase 1, but allow
>>>>>>>>>>>>    systems to ignore it if they do not support the spherical
>>>>>>>>>>>>    computation model?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. From
>>>>>>>>>>>> our experience, most of the use cases do not require the full
>>>>>>>>>>>> definition of the SRS; in fact, that definition is only needed
>>>>>>>>>>>> when converting between coordinate systems. On the other hand,
>>>>>>>>>>>> it’s often necessary to check whether two geometry columns have
>>>>>>>>>>>> the same coordinate system, for example when joining two columns
>>>>>>>>>>>> from different data providers.
>>>>>>>>>>>>
>>>>>>>>>>>> To address this, we would like to propose including the option
>>>>>>>>>>>> to specify the SRS with only an SRID in Phase 1. The query engine
>>>>>>>>>>>> may choose to treat it as an opaque identifier or make a look-up
>>>>>>>>>>>> in the EPSG database if supported.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you again for driving this effort forward. We look
>>>>>>>>>>>> forward to hearing your thoughts.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry
>>>>>>>>>>>>
>>>>>>>>>>>> [2]
>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/05/02 00:41:52 Szehon Ho wrote:
>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>> >
>>>>>>>>>>>> > We have created a formal proposal for adding Geospatial
>>>>>>>>>>>> > support to Iceberg.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Please read the following for details.
>>>>>>>>>>>> >
>>>>>>>>>>>> >    - GitHub Proposal:
>>>>>>>>>>>> >    https://github.com/apache/iceberg/issues/10260
>>>>>>>>>>>> >    - Proposal Doc:
>>>>>>>>>>>> >    https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>>>>>>>>>>>> >
>>>>>>>>>>>> > Note that this proposal is built on existing extensive
>>>>>>>>>>>> > research and POC implementations (Geolake, Havasu).  Special
>>>>>>>>>>>> > thanks to Jia Yu and Kristin Cowalcijk from Wherobots/Geolake
>>>>>>>>>>>> > for extensive consultation and help in writing this proposal,
>>>>>>>>>>>> > as well as support from Yuanyuan Zhang from Geolake.
>>>>>>>>>>>> >
>>>>>>>>>>>> > We would love to get more feedback for this proposal from the
>>>>>>>>>>>> > wider community and eventually discuss this in a community sync.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks
>>>>>>>>>>>> > Szehon
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>
