Thanks Szehon! My comments were addressed. I'm ready to vote.

Yufei


On Mon, Sep 30, 2024 at 11:47 AM Russell Spitzer <russell.spit...@gmail.com>
wrote:

> All my concerns are addressed, I'm ready to vote.
>
> On Mon, Sep 30, 2024 at 1:21 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi all,
>>
>> There have been several rounds of discussion on the PR:
>> https://github.com/apache/iceberg/pull/10981 and I think most of the
>> main points have been addressed.
>>
>> If anyone is interested, please take a look.  If there are no other major
>> points, we plan to start a VOTE thread soon.
>>
>> I know Jia and team are also volunteering to work on the prototype
>> immediately afterwards.
>>
>> Thank you,
>> Szehon
>>
>> On Tue, Aug 20, 2024 at 1:57 PM Szehon Ho <szehon.apa...@gmail.com>
>> wrote:
>>
>>> Hi all
>>>
>>> Please take a look at the proposed spec change to support the Geo type for
>>> V3 in https://github.com/apache/iceberg/pull/10981, and comment or
>>> otherwise let me know your thoughts.
>>>
>>> Just as an FYI, it incorporated the feedback from our last meeting (with
>>> Snowflake and Wherobots engineers).
>>>
>>> Thanks,
>>> Szehon
>>>
>>> On Wed, Jun 26, 2024 at 7:29 PM Szehon Ho <szehon.apa...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> It was great to meet in person with Snowflake engineers and we had a
>>>> good discussion on the paths forward.
>>>>
>>>> Meeting notes for the Snowflake-Iceberg sync:
>>>>
>>>>    - Iceberg's proposed Geometry type defaults to (edges=planar,
>>>>    crs=CRS84).
>>>>    - Snowflake has two types, Geography (spherical) and Geometry
>>>>    (planar, with customizable CRS).  The data layout/encoding is the
>>>>    same for both types.  Let's see how we can support each as an Iceberg
>>>>    type, especially with respect to Iceberg partition/file pruning.
>>>>    - Geography type support
>>>>       - Main concern is the need for a suitable partition transform for
>>>>       partition-level filtering; the candidate is Michael Entin's proposal
>>>>       <https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit>.
>>>>       - Secondary concern is file- and row-group-level filtering.  Gang's
>>>>       Parquet proposal
>>>>       <https://github.com/apache/parquet-format/pull/240/files> allows
>>>>       storage of S2 / H3 IDs in Parquet stats, so we can also leverage
>>>>       that in Iceberg pruning code (the Google and Uber libraries are
>>>>       compatible).
>>>>    - Geometry type support
>>>>       - Main concern is that the partition transform needs to understand
>>>>       the CRS, but this can be solved by creating the XZ2 transform with
>>>>       a customizable min/max lat/long range (that is all it needs).
>>>>    - Should (CRS, edges) be stored properties on Geography type in
>>>>    Phase 1?
>>>>       - Should be fine to store, while only allowing the defaults in
>>>>       Phase 1.
>>>>       - Concern 1: If edges is stored, there will be asks to store
>>>>       other properties like (orientation, epoch).  Solution is to punt
>>>>       these follow-on properties to later.
>>>>       - Concern 2: If crs is stored, in what format?  PROJJSON vs. SRID.
>>>>       Solution is to leave it as a string.
>>>>       - Concern 3: If crs is stored as a string, Iceberg cannot read
>>>>       it.  This should be ok, as we only need it for the XZ2 transform,
>>>>       where the user already passes in the info from the CRS (it is up to
>>>>       the user to make sure these align).
>>>>
>>>> Thanks
>>>> Szehon
>>>>
>>>> On Tue, Jun 18, 2024 at 12:23 PM Szehon Ho <szehon.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Jia and I will sync with the Snowflake folks to see if we can have a
>>>>> solution, or a roadmap to a solution, in the proposal.
>>>>>
>>>>> Thanks JB for the interest!  By the way, I want to schedule a meeting
>>>>> to go over the proposal.  It seems there's good feedback from folks on
>>>>> the geo side (and even the Parquet community), but not too many
>>>>> eyes/feedback from other folks/PMC in the Iceberg community.  This might
>>>>> be due to lack of familiarity or time to read through it all.  In fact,
>>>>> a lot of the advanced discussions like this one are about Phase 2 items,
>>>>> and the Phase 1 items are relatively straightforward, so I wanted to
>>>>> explain that.  As I know it's summer vacation for some folks, we can do
>>>>> this in a week or in early July; I hope that sounds good to everyone.
>>>>>
>>>>> Thanks,
>>>>> Szehon
>>>>>
>>>>> On Tue, Jun 18, 2024 at 1:54 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>> wrote:
>>>>>
>>>>>> Hi Jia
>>>>>>
>>>>>> Thanks for the update. I'm gonna re-read the whole thread and
>>>>>> document to have a better understanding.
>>>>>>
>>>>>> Thanks !
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On Mon, Jun 17, 2024 at 7:44 PM Jia Yu <ji...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Snowflake folks,
>>>>>>>
>>>>>>> Please let me know if you have other questions regarding the
>>>>>>> proposal. If any, Szehon and I can set up a zoom call with you guys to
>>>>>>> clarify some details. We are in the Pacific time zone. If you are in
>>>>>>> Europe, maybe early morning Pacific Time works best for you?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jia
>>>>>>>
>>>>>>> On Wed, Jun 5, 2024 at 6:28 PM Gang Wu <ust...@gmail.com> wrote:
>>>>>>>
>>>>>>>> > The min/max stats are discussed in the doc (Phase 2), depending
>>>>>>>> > on the non-trivial encoding.
>>>>>>>>
>>>>>>>> Just want to add that min/max stats filtering could be supported by
>>>>>>>> the file format natively. Adding a geometry type to the Parquet spec
>>>>>>>> is under discussion:
>>>>>>>> https://github.com/apache/parquet-format/pull/240
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Gang
>>>>>>>>
>>>>>>>> On Thu, Jun 6, 2024 at 5:53 AM Szehon Ho <szehon.apa...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Peter
>>>>>>>>>
>>>>>>>>> Yes, the document only concerns predicate pushdown for the geometry
>>>>>>>>> column.  Predicate pushdown takes two forms: 1) partition filters and
>>>>>>>>> 2) min/max stats.  The min/max stats are discussed in the doc (Phase
>>>>>>>>> 2), depending on the non-trivial encoding.
>>>>>>>>>
>>>>>>>>> The evaluators are always AND'ed together, so I don't see any issue
>>>>>>>>> with partitioning by another key on a table with a geo column.
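>>>>>>>>>
>>>>>>>>> For illustration only, here is a minimal Java sketch of that point
>>>>>>>>> using the existing Expressions API.  The geo predicate is a
>>>>>>>>> hypothetical stand-in (the proposed ST_INTERSECTS pushdown is not in
>>>>>>>>> Iceberg yet), so it is stubbed out; the point is just that it gets
>>>>>>>>> AND'ed with an ordinary partition filter such as a date predicate:
>>>>>>>>>
>>>>>>>>>   import org.apache.iceberg.expressions.Expression;
>>>>>>>>>   import org.apache.iceberg.expressions.Expressions;
>>>>>>>>>
>>>>>>>>>   public class CombinedFilterSketch {
>>>>>>>>>     // Hypothetical stand-in for the proposed ST_INTERSECTS pushdown;
>>>>>>>>>     // not a real Iceberg API, so it returns alwaysTrue() to keep the
>>>>>>>>>     // sketch compilable.
>>>>>>>>>     static Expression stIntersects(String column, Object queryWindow) {
>>>>>>>>>       return Expressions.alwaysTrue();
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     public static void main(String[] args) {
>>>>>>>>>       // The geo filter and the ordinary partition filter are combined
>>>>>>>>>       // with AND, so pruning on event_date still applies even when
>>>>>>>>>       // the geo filter cannot prune anything.
>>>>>>>>>       Expression filter = Expressions.and(
>>>>>>>>>           Expressions.equal("event_date", "2024-06-01"),
>>>>>>>>>           stIntersects("geom", null /* query window placeholder */));
>>>>>>>>>       System.out.println(filter);
>>>>>>>>>     }
>>>>>>>>>   }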
>>>>>>>>>
>>>>>>>>> On another note, Jia and I thought we might have a discussion about
>>>>>>>>> the Snowflake geo types in a call, to drill down on some details.
>>>>>>>>> What time zone are you folks in, and what time works best?  I think
>>>>>>>>> Jia and I are both in the Pacific time zone.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Szehon
>>>>>>>>>
>>>>>>>>> On Wed, Jun 5, 2024 at 1:02 AM Peter Popov <
>>>>>>>>> peter.po...@snowflake.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Szehon, hi Jia,
>>>>>>>>>>
>>>>>>>>>> Thank you for your replies. We now better understand the
>>>>>>>>>> connection between the metadata and partitioning in this proposal.
>>>>>>>>>> Supporting Mapping 1 is a great starting point, and we would like
>>>>>>>>>> to work more closely with you on bringing support for spherical
>>>>>>>>>> edges and other coordinate systems into Iceberg geometry.
>>>>>>>>>>
>>>>>>>>>> We have some follow-up questions regarding the partitioning (let
>>>>>>>>>> us know if it’s better to comment directly in the document): Does 
>>>>>>>>>> this
>>>>>>>>>> proposal imply that XZ2 partitioning is always required? In the
>>>>>>>>>> current proposal, do you see a possibility of predicate pushdown
>>>>>>>>>> relying on x/y min/max column metadata instead of a partition key?
>>>>>>>>>> We see use cases where a table with a geo column can be partitioned
>>>>>>>>>> by a different key (e.g. date) or a combination of keys. It would be
>>>>>>>>>> great to support such use cases from the very beginning.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On Thu, May 30, 2024 at 8:07 AM Jia Yu <ji...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Dmytro,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your email. To add to Szehon's answer,
>>>>>>>>>>>
>>>>>>>>>>> 1. How to represent Snowflake Geometry and Geography type in
>>>>>>>>>>> Iceberg, given the Geo Iceberg Phase 1 design:
>>>>>>>>>>>
>>>>>>>>>>> Answer:
>>>>>>>>>>> Mapping 1 (possible): Snowflake Geometry + SRID: 4326 -> Iceberg
>>>>>>>>>>> Geometry + CRS84 + edges: Planar
>>>>>>>>>>> Mapping 2 (impossible): Snowflake Geography -> Iceberg
>>>>>>>>>>> Geometry + CRS84 + edges: Spherical
>>>>>>>>>>> Mapping 3 (impossible): Snowflake Geometry + SRID:ABCDE->
>>>>>>>>>>> Iceberg Geometry + SRID:ABCDE + edges: Planar
>>>>>>>>>>>
>>>>>>>>>>> As Szehon mentioned, only Mapping 1 is possible because we need
>>>>>>>>>>> to support spatial query pushdown in Iceberg. This relies on the
>>>>>>>>>>> Iceberg partition transform, which requires a 1:1 mapping between a
>>>>>>>>>>> value (point/polygon/linestring) and a partition key. That is: given
>>>>>>>>>>> any precision level, a polygon must produce a single ID, and the
>>>>>>>>>>> covering indicated by that single ID must fully cover the extent of
>>>>>>>>>>> the polygon. Currently, only XZ2 satisfies this requirement. If
>>>>>>>>>>> Michael Entin's theory can be proven correct, then we can support
>>>>>>>>>>> Mapping 2 in Phase 2 of Geo Iceberg.
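>>>>>>>>>>>
>>>>>>>>>>> To make the 1:1 requirement concrete, here is a small simplified
>>>>>>>>>>> Java sketch (an illustration only, not the actual XZ2 algorithm,
>>>>>>>>>>> which additionally uses the enlarged cells of XZ-ordering, and not
>>>>>>>>>>> Iceberg code). It maps a geometry's bounding box to exactly one
>>>>>>>>>>> quadtree cell whose extent fully covers the box, and it takes the
>>>>>>>>>>> world min/max range as parameters, which is also why the XZ2
>>>>>>>>>>> transform only needs the coordinate range of the CRS:
>>>>>>>>>>>
>>>>>>>>>>>   public class SingleCellCoveringSketch {
>>>>>>>>>>>
>>>>>>>>>>>     // Pick the deepest grid level whose cell still contains the whole
>>>>>>>>>>>     // bounding box, so every row maps to exactly one partition value
>>>>>>>>>>>     // and that cell is guaranteed to cover the geometry.
>>>>>>>>>>>     static long cellId(double minX, double minY, double maxX, double maxY,
>>>>>>>>>>>                        double worldMinX, double worldMinY,
>>>>>>>>>>>                        double worldMaxX, double worldMaxY, int maxLevel) {
>>>>>>>>>>>       for (int level = maxLevel; level >= 0; level--) {
>>>>>>>>>>>         long cellsPerAxis = 1L << level;
>>>>>>>>>>>         double cellW = (worldMaxX - worldMinX) / cellsPerAxis;
>>>>>>>>>>>         double cellH = (worldMaxY - worldMinY) / cellsPerAxis;
>>>>>>>>>>>         long ix = Math.min((long) ((minX - worldMinX) / cellW), cellsPerAxis - 1);
>>>>>>>>>>>         long iy = Math.min((long) ((minY - worldMinY) / cellH), cellsPerAxis - 1);
>>>>>>>>>>>         boolean covered = maxX <= worldMinX + (ix + 1) * cellW
>>>>>>>>>>>             && maxY <= worldMinY + (iy + 1) * cellH;
>>>>>>>>>>>         if (covered) {
>>>>>>>>>>>           // Encode (level, ix, iy) into one value; any stable encoding works.
>>>>>>>>>>>           return ((long) level << 52) | (ix << 26) | iy;
>>>>>>>>>>>         }
>>>>>>>>>>>       }
>>>>>>>>>>>       return 0L;  // level 0 is one world-sized cell; reached only if the
>>>>>>>>>>>                   // box falls outside the declared world range
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>>     public static void main(String[] args) {
>>>>>>>>>>>       // Example: a small lon/lat bbox, world range fixed to CRS84.
>>>>>>>>>>>       System.out.println(cellId(2.30, 48.80, 2.40, 48.90, -180, -90, 180, 90, 16));
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> Note that a large object sitting on a cell boundary falls back to a
>>>>>>>>>>> coarser level in this sketch; the enlarged cells of XZ-ordering
>>>>>>>>>>> exist to mitigate exactly that "boundary object" problem.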
>>>>>>>>>>>
>>>>>>>>>>> Regarding Mapping 3, this requires Iceberg to be able to
>>>>>>>>>>> understand SRID / PROJJSON so that we know the min/max X/Y of the
>>>>>>>>>>> CRS (@Szehon, maybe Iceberg can ask the engine to provide this
>>>>>>>>>>> information?). See my answer to question 2.
>>>>>>>>>>>
>>>>>>>>>>> 2. Why choose PROJJSON instead of SRID?
>>>>>>>>>>>
>>>>>>>>>>> The PROJJSON idea was borrowed from GeoParquet because we'd like
>>>>>>>>>>> to enable possible conversion between Geo Iceberg and GeoParquet.
>>>>>>>>>>> However, I do understand that this is not a good idea for Iceberg
>>>>>>>>>>> since not many libs can parse PROJJSON.
>>>>>>>>>>>
>>>>>>>>>>> @Szehon Is there a way that we can support both SRID and
>>>>>>>>>>> PROJJSON in Geo Iceberg?
>>>>>>>>>>>
>>>>>>>>>>> It is also worth noting that, although there are many libs that
>>>>>>>>>>> can parse an SRID and perform a look-up in the EPSG database, the
>>>>>>>>>>> license of the EPSG database is NOT compatible with the Apache
>>>>>>>>>>> Software Foundation's licensing policy. That means Iceberg still
>>>>>>>>>>> cannot parse / understand SRID.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jia
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 29, 2024 at 11:08 AM Szehon Ho <
>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Dmytro
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for looking through the proposal, and I am excited to
>>>>>>>>>>>> hear from you guys!  I am not a 'geo expert' and I will definitely
>>>>>>>>>>>> need to pull in Jia Yu for some of these points.
>>>>>>>>>>>>
>>>>>>>>>>>> Although most calculations are done in the query engine, the
>>>>>>>>>>>> Iceberg reference implementations (i.e., Java, Python) do have to
>>>>>>>>>>>> support a few calculations to handle filter pushdown:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. push down of the proposed Geospatial transforms
>>>>>>>>>>>>    ST_COVERS, ST_COVERED_BY, and ST_INTERSECTS
>>>>>>>>>>>>    2. evaluation of the proposed Geospatial partition transform
>>>>>>>>>>>>    XZ2.  As you may have seen, this was chosen as it's the only
>>>>>>>>>>>>    standard one today that solves the 'boundary object' problem
>>>>>>>>>>>>    while still preserving the 1-to-1 mapping of row => partition
>>>>>>>>>>>>    value.
>>>>>>>>>>>>
>>>>>>>>>>>> This is the primary rationale for choosing the default values, as
>>>>>>>>>>>> these were implemented in the GeoLake and Havasu projects (the
>>>>>>>>>>>> Iceberg forks that sparked the proposal) based on the Geometry
>>>>>>>>>>>> type (edges=planar, crs=OGC:CRS84 / SRID=4326).
>>>>>>>>>>>>
>>>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. In
>>>>>>>>>>>>> our experience, most use cases do not require the full definition
>>>>>>>>>>>>> of the SRS; in fact, that definition is only needed when
>>>>>>>>>>>>> converting between coordinate systems. On the other hand, it's
>>>>>>>>>>>>> often needed to check whether two geometry columns have the same
>>>>>>>>>>>>> coordinate system, for example when joining two columns from
>>>>>>>>>>>>> different data providers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To address this, we would like to propose including the option
>>>>>>>>>>>>> to specify the SRS with only an SRID in Phase 1. The query engine
>>>>>>>>>>>>> may choose to treat it as an opaque identifier or to look it up
>>>>>>>>>>>>> in the EPSG database if supported.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The way to specify the CRS definition is actually taken from
>>>>>>>>>>>> GeoParquet [1], but I think we are not bound to follow it if there
>>>>>>>>>>>> are better options.  I feel we might need to at least list out the
>>>>>>>>>>>> supported configurations in the spec, though.  There is some
>>>>>>>>>>>> conversation about this on the doc here [2].  Basically:
>>>>>>>>>>>>
>>>>>>>>>>>>    1. XZ2 assumes planar edges.  This is a feature of the
>>>>>>>>>>>>    algorithm, based on the original paper.  A possible solution
>>>>>>>>>>>>    for spherical edges is proposed by Michael Entin here [3];
>>>>>>>>>>>>    please feel free to evaluate it.
>>>>>>>>>>>>    2. XZ2 needs to know the coordinate range.  According to Jia's
>>>>>>>>>>>>    comments, this requires parsing the CRS.  Can it be done with
>>>>>>>>>>>>    the SRID alone?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> 1. The first version of the specification (Phase 1) is described
>>>>>>>>>>>>> as focused on the planar geometry model with the CRS fixed to
>>>>>>>>>>>>> 4326. In this model, Snowflake would not be able to map our
>>>>>>>>>>>>> Geography type, since it is based on the spherical geography
>>>>>>>>>>>>> model. Given that Snowflake supports both edge types, we would
>>>>>>>>>>>>> like to better understand how to map them to the proposed
>>>>>>>>>>>>> Geometry type and its metadata.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>>    query engine? Is it necessary for the system to adhere to the
>>>>>>>>>>>>>    edge model for geospatial functions, or can it use the model
>>>>>>>>>>>>>    that it supports, or let the customer choose it? Will it
>>>>>>>>>>>>>    affect the bounding box or other row-group metadata?
>>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>>    postponed to further iterations? Would it be more extensible
>>>>>>>>>>>>>    to support a mutable edge type from Phase 1, but allow systems
>>>>>>>>>>>>>    to ignore it if they do not support the spherical computation
>>>>>>>>>>>>>    model?
>>>>>>>>>>>>>
>>>>>>>>>>>> This may be answered by the previous paragraph regarding XZ2.
>>>>>>>>>>>>
>>>>>>>>>>>>    1. If we get XZ2 to work with a more variable CRS without
>>>>>>>>>>>>    requiring a full PROJJSON specification, that seems to be a
>>>>>>>>>>>>    path to supporting the Snowflake Geometry type?
>>>>>>>>>>>>    2. If we get another one-to-one partition function on spherical
>>>>>>>>>>>>    edges, like the one proposed by Michael, that seems to be a
>>>>>>>>>>>>    path to supporting the Snowflake Geography type?
>>>>>>>>>>>>
>>>>>>>>>>>> Does that sound correct?  As for why certain things are marked as
>>>>>>>>>>>> Phase 1, they were just chosen so we can all agree on an initial
>>>>>>>>>>>> design and iterate faster; nothing is set in stone, and maybe path
>>>>>>>>>>>> 1 is possible to do quickly, for example.
>>>>>>>>>>>>
>>>>>>>>>>>> Also, I am not sure about handling evaluation of ST_COVERS,
>>>>>>>>>>>> ST_COVERED_BY, and ST_INTERSECTS (how easy it is to handle a
>>>>>>>>>>>> different CRS + spherical edges).  I will leave that to Jia.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Szehon
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata
>>>>>>>>>>>> [2]:
>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit?disco=AAABL-z6xXk
>>>>>>>>>>>> [3]:
>>>>>>>>>>>> https://docs.google.com/document/d/1tG13UpdNH3i0bVkjFLsE2kXEXCuw1XRpAC2L2qCUox0/edit
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, May 29, 2024 at 8:30 AM Dmytro Koval
>>>>>>>>>>>> <dmytro.ko...@snowflake.com.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Dear Szehon and Iceberg Community,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is Dmytro, Peter, Aihua, and Tyler from Snowflake. As
>>>>>>>>>>>>> part of our desire to be more active in the Iceberg community, 
>>>>>>>>>>>>> we’ve been
>>>>>>>>>>>>> looking over this geospatial proposal. We’re excited geospatial 
>>>>>>>>>>>>> is getting
>>>>>>>>>>>>> traction, as we see a lot of geo usage within Snowflake, and 
>>>>>>>>>>>>> expect that
>>>>>>>>>>>>> usage to carry over to our Iceberg offerings soon. After 
>>>>>>>>>>>>> reviewing the
>>>>>>>>>>>>> proposal, we have some questions we’d like to pose given our 
>>>>>>>>>>>>> experience
>>>>>>>>>>>>> with geospatial support in Snowflake.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We would like to clarify two aspects of the proposal: handling
>>>>>>>>>>>>> of the spherical model and the definition of the spatial
>>>>>>>>>>>>> reference system. Both have a big impact on interoperability with
>>>>>>>>>>>>> Snowflake and with other query engines and geo processing
>>>>>>>>>>>>> systems.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let us first share some context about geospatial types at
>>>>>>>>>>>>> Snowflake; geo experts will certainly be familiar with this 
>>>>>>>>>>>>> context
>>>>>>>>>>>>> already, but for the sake of others we want to err on the side of 
>>>>>>>>>>>>> being
>>>>>>>>>>>>> explicit and clear. Snowflake supports two Geospatial types [1]:
>>>>>>>>>>>>> - Geography – uses a spherical approximation of the earth for
>>>>>>>>>>>>> all the computations. It does not perfectly represent the earth,
>>>>>>>>>>>>> but it allows getting accurate results on the WGS84 coordinates
>>>>>>>>>>>>> used by GPS without any need to perform coordinate system
>>>>>>>>>>>>> reprojections. It is also quite fast for end-to-end computations.
>>>>>>>>>>>>> In general, it has fewer distortions compared to the 2D planar
>>>>>>>>>>>>> model.
>>>>>>>>>>>>> - Geometry – uses the planar Euclidean geometry model. Geometric
>>>>>>>>>>>>> computations are simpler, but they require transforming the data
>>>>>>>>>>>>> between coordinate systems to minimize the distortion. The
>>>>>>>>>>>>> Geometry data type allows setting a spatial reference system for
>>>>>>>>>>>>> each row using the SRID. The binary geospatial functions are only
>>>>>>>>>>>>> allowed on geometries with the same SRID. The only function that
>>>>>>>>>>>>> interprets the SRID is ST_TRANSFORM, which allows conversion
>>>>>>>>>>>>> between different SRSs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Given the choice of two types and a set of operations on top
>>>>>>>>>>>>> of them, the majority of Snowflake users select the Geography 
>>>>>>>>>>>>> type to
>>>>>>>>>>>>> represent their geospatial data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> From our perspective, Iceberg users would benefit most from
>>>>>>>>>>>>> being given the flexibility to store and process data using the 
>>>>>>>>>>>>> model that
>>>>>>>>>>>>> better fits their needs and specific use cases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, we would like to ask some clarifying design questions
>>>>>>>>>>>>> that are important for interoperability:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. The first version of the specification (Phase 1) is described
>>>>>>>>>>>>> as focused on the planar geometry model with the CRS fixed to
>>>>>>>>>>>>> 4326. In this model, Snowflake would not be able to map our
>>>>>>>>>>>>> Geography type, since it is based on the spherical geography
>>>>>>>>>>>>> model. Given that Snowflake supports both edge types, we would
>>>>>>>>>>>>> like to better understand how to map them to the proposed
>>>>>>>>>>>>> Geometry type and its metadata.
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - How is the edge type supposed to be interpreted by the
>>>>>>>>>>>>>    query engine? Is it necessary for the system to adhere to the
>>>>>>>>>>>>>    edge model for geospatial functions, or can it use the model
>>>>>>>>>>>>>    that it supports, or let the customer choose it? Will it
>>>>>>>>>>>>>    affect the bounding box or other row-group metadata?
>>>>>>>>>>>>>    - Is there any reason why the flexible model has to be
>>>>>>>>>>>>>    postponed to further iterations? Would it be more extensible
>>>>>>>>>>>>>    to support a mutable edge type from Phase 1, but allow systems
>>>>>>>>>>>>>    to ignore it if they do not support the spherical computation
>>>>>>>>>>>>>    model?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. As you mentioned [2] in the proposal, there are difficulties
>>>>>>>>>>>>> with supporting the full PROJJSON specification of the SRS. In
>>>>>>>>>>>>> our experience, most use cases do not require the full definition
>>>>>>>>>>>>> of the SRS; in fact, that definition is only needed when
>>>>>>>>>>>>> converting between coordinate systems. On the other hand, it's
>>>>>>>>>>>>> often needed to check whether two geometry columns have the same
>>>>>>>>>>>>> coordinate system, for example when joining two columns from
>>>>>>>>>>>>> different data providers.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To address this, we would like to propose including the option
>>>>>>>>>>>>> to specify the SRS with only an SRID in Phase 1. The query engine
>>>>>>>>>>>>> may choose to treat it as an opaque identifier or to look it up
>>>>>>>>>>>>> in the EPSG database if supported.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you again for driving this effort forward. We look
>>>>>>>>>>>>> forward to hearing your thoughts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2024/05/02 00:41:52 Szehon Ho wrote:
>>>>>>>>>>>>> > Hi everyone,
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > We have created a formal proposal for adding Geospatial
>>>>>>>>>>>>> support to Iceberg.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Please read the following for details.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >    - Github Proposal :
>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/10260
>>>>>>>>>>>>> >    - Proposal Doc:
>>>>>>>>>>>>> >
>>>>>>>>>>>>> https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>>>>>>>>>>>>> >
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Note that this proposal is built on existing extensive
>>>>>>>>>>>>> research and POC
>>>>>>>>>>>>> > implementations (Geolake, Havasu).  Special thanks to Jia Yu
>>>>>>>>>>>>> and Kristin
>>>>>>>>>>>>> > Cowalcijk from Wherobots/Geolake for extensive consultation
>>>>>>>>>>>>> and help in
>>>>>>>>>>>>> > writing this proposal, as well as support from Yuanyuan
>>>>>>>>>>>>> Zhang from Geolake.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > We would love to get more feedback for this proposal from
>>>>>>>>>>>>> the wider
>>>>>>>>>>>>> > community and eventually discuss this in a community sync.
>>>>>>>>>>>>> >
>>>>>>>>>>>>> > Thanks
>>>>>>>>>>>>> > Szehon
>>>>>>>>>>>>> >
>>>>>>>>>>>>>
>>>>>>>>>>>>
