Dear Szehon and Iceberg Community,

This is Dmytro, Peter, Aihua, and Tyler from Snowflake. As part of our
desire to be more active in the Iceberg community, we’ve been looking over
this geospatial proposal. We’re excited geospatial is getting traction, as
we see a lot of geo usage within Snowflake, and expect that usage to carry
over to our Iceberg offerings soon. After reviewing the proposal, we have
some questions we’d like to pose given our experience with geospatial
support in Snowflake.

We would like to clarify two aspects of the proposal: the handling of the
spherical model and the definition of the spatial reference system. Both
have a big impact on interoperability with Snowflake, other query engines,
and geo processing systems.


Let us first share some context about geospatial types at Snowflake; geo
experts will certainly be familiar with this context already, but for the
sake of others we want to err on the side of being explicit and clear.
Snowflake supports two Geospatial types [1]:
- Geography – uses a spherical approximation of the earth for all
computations. It does not represent the earth perfectly, but it gives
accurate results on WGS84 coordinates (the system used by GPS) without any
need for coordinate system reprojection. It is also quite fast for
end-to-end computations. In general, it has less distortion than the 2D
planar model.
- Geometry – uses a planar Euclidean geometry model. Geometric computations
are simpler, but data must be transformed between coordinate systems to
minimize distortion. The Geometry data type allows setting a spatial
reference system for each row using the SRID. Binary geospatial functions
are only allowed on geometries with the same SRID. The only function that
interprets the SRID is ST_TRANSFORM, which converts between different SRSs.
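
To make the difference between the two models concrete for readers less
familiar with geo, here is a minimal Python sketch. It is not how Snowflake
or any other engine implements these types; the function names and the mean
earth radius are assumptions made only for illustration. It compares a
great-circle (haversine) distance with a naive planar distance computed
directly on longitude/latitude degrees.

```python
import math

# Mean earth radius in kilometres; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def spherical_distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_distance_deg(lon1, lat1, lon2, lat2):
    """Euclidean distance treating lon/lat degrees as planar x/y."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# One degree of longitude at the equator and at 60 degrees north.
equator = (0.0, 0.0, 1.0, 0.0)
north = (0.0, 60.0, 1.0, 60.0)

print(planar_distance_deg(*equator), planar_distance_deg(*north))
print(spherical_distance_km(*equator), spherical_distance_km(*north))
```

The planar model reports the same length for both segments, while on the
sphere the segment at 60 degrees north is roughly half as long as the one at
the equator. This is the kind of distortion that reprojection into a
suitable local coordinate system is meant to reduce.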

[Figure: Geography vs. Geometry]

Given the choice of two types and a set of operations on top of them, the
majority of Snowflake users select the Geography type to represent their
geospatial data.

From our perspective, Iceberg users would benefit most from being given the
flexibility to store and process data using the model that better fits
their needs and specific use cases.

Therefore, we would like to ask some clarifying design questions that are
important for interoperability:


1. The proposal describes Phase 1 of the specification as focused on the
planar geometry model with the CRS fixed to 4326. In this model, Snowflake
would not be able to map our Geography type, since that type is based on
the spherical model. Given that Snowflake supports both edge types, we
would like to better understand how to map them to the proposed Geometry
type and its metadata.

   - How is the edge type supposed to be interpreted by the query engine? Is
     it necessary for the system to adhere to the edge model for geospatial
     functions, or can it use the model that it supports or let the customer
     choose? Will it affect the bounding box or other row group metadata?

   - Is there any reason why the flexible model has to be postponed to
     later iterations? Would it be more extensible to support a mutable edge
     type from Phase 1, but allow systems to ignore it if they do not
     support the spherical computation model?
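
To illustrate why the edge interpretation can matter for bounding boxes,
here is a small Python sketch under our own assumptions. The helper names
are hypothetical, the edge is sampled numerically rather than solved in
closed form, and the endpoints are assumed not to be antipodal. It compares
the maximum latitude of a single edge under planar and spherical
interpretations.

```python
import math


def to_unit_vector(lon_deg, lat_deg):
    """Convert a lon/lat point in degrees to a 3D unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def max_latitude_planar_edge(p, q):
    """A straight planar edge never exceeds the latitude of its endpoints."""
    return max(p[1], q[1])


def max_latitude_spherical_edge(p, q, samples=1000):
    """Approximate the highest latitude reached along the great-circle edge.

    Points on the chord between the two unit vectors, once normalized,
    lie on the great-circle arc, so sampling the chord samples the edge.
    """
    a, b = to_unit_vector(*p), to_unit_vector(*q)
    best = -90.0
    for i in range(samples + 1):
        t = i / samples
        x, y, z = ((1 - t) * a[k] + t * b[k] for k in range(3))
        norm = math.sqrt(x * x + y * y + z * z)
        sin_lat = max(-1.0, min(1.0, z / norm))  # guard against rounding
        best = max(best, math.degrees(math.asin(sin_lat)))
    return best


# An edge between two points at 50 degrees north, 120 degrees of longitude apart.
p, q = (-60.0, 50.0), (60.0, 50.0)
print(max_latitude_planar_edge(p, q))
print(max_latitude_spherical_edge(p, q))
```

For this edge the planar interpretation keeps the bounding box at 50 degrees
north, while the spherical edge bulges well north of both endpoints. A
bounding box written under one interpretation could therefore prune row
groups incorrectly for a reader that assumes the other.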



2. As you mention in the proposal [2], there are difficulties with
supporting the full PROJJSON specification of the SRS. In our experience,
most use cases do not require the full definition of the SRS; that
definition is only needed when converting between coordinate systems. On
the other hand, it is often necessary to check whether two geometry columns
have the same coordinate system, for example when joining two columns from
different data providers.

To address this, we would like to propose including the option to specify
the SRS with only an SRID in Phase 1. The query engine may choose to treat
it as an opaque identifier or look it up in the EPSG database, if
supported.
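
As a rough sketch of what treating the SRID as an opaque identifier could
look like, the Python below checks whether two geometry columns can be
joined without ever resolving the SRS definition. The GeoColumn class, the
can_join helper, and the column names are hypothetical and are not part of
the proposal or of any engine's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeoColumn:
    """Hypothetical column descriptor carrying only an SRID."""
    name: str
    srid: Optional[int]  # None means the SRS is unknown


def can_join(left: GeoColumn, right: GeoColumn) -> bool:
    """Allow a spatial join only when both SRIDs are known and equal.

    The SRID is compared as an opaque value; no EPSG lookup is needed.
    """
    if left.srid is None or right.srid is None:
        return False
    return left.srid == right.srid


provider_a = GeoColumn("stores.location", 4326)
provider_b = GeoColumn("regions.boundary", 4326)
provider_c = GeoColumn("parcels.shape", 3857)

print(can_join(provider_a, provider_b))
print(can_join(provider_a, provider_c))
```

Only an engine that needs to reproject, for example to join the first and
third columns above, would have to resolve the SRID to a full definition.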

Thank you again for driving this effort forward. We look forward to hearing
your thoughts.

[1]
https://docs.snowflake.com/en/sql-reference/data-types-geospatial#understanding-the-differences-between-geography-and-geometry

[2]
https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI/edit#heading=h.oruaqt3nxcaf


On 2024/05/02 00:41:52 Szehon Ho wrote:
> Hi everyone,
>
> We have created a formal proposal for adding Geospatial support to
> Iceberg.
>
> Please read the following for details.
>
>    - Github Proposal : https://github.com/apache/iceberg/issues/10260
>    - Proposal Doc:
>      https://docs.google.com/document/d/1iVFbrRNEzZl8tDcZC81GFt01QJkLJsI9E2NBOt21IRI
>
>
> Note that this proposal is built on existing extensive research and POC
> implementations (Geolake, Havasu).  Special thanks to Jia Yu and Kristin
> Cowalcijk from Wherobots/Geolake for extensive consultation and help in
> writing this proposal, as well as support from Yuanyuan Zhang from
> Geolake.
>
> We would love to get more feedback for this proposal from the wider
> community and eventually discuss this in a community sync.
>
> Thanks
> Szehon
>
