Thanks Ryan! To expand a bit more:

For representation, I was thinking that geometry types could be expressed
as complex types (e.g., POINTs as Structs), so they are compatible with all
engines without having to introduce user-defined types in both Iceberg and
compute engines.

For the partitioning:
(1) Custom partition functions could directly operate on complex types
(e.g., structs representing POINTs). In this case the partitioning function
is like: geometry_hash(struct_col); or
(2) Partitioning spec could be extended to allow "generated columns" to be
sources of partition functions, so a "generated" WKB column can be the
intermediate representation between complex geometry types and partition
functions that accept primitive types. In this case, the partitioning
function is like hashBytes(wkb(struct_col)).
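
To make the two options concrete, here is a rough Python sketch. It is only
an illustration, not an existing Iceberg API: the Point struct layout, the
function names, and the use of CRC32 as the hash are all assumptions chosen
to keep the example self-contained.

```python
import struct
import zlib
from typing import NamedTuple


class Point(NamedTuple):
    """A POINT expressed as a plain struct of two doubles (hypothetical layout)."""
    x: float
    y: float


def hash_bytes(data: bytes, num_buckets: int) -> int:
    # Stand-in for a partition function over a primitive (binary) value.
    # CRC32 keeps the sketch stdlib-only; it is NOT Iceberg's bucket hash.
    return zlib.crc32(data) % num_buckets


def wkb(point: Point) -> bytes:
    # Little-endian WKB for a 2D point: byte-order flag (1),
    # geometry type (1 = Point), then x and y as float64.
    return struct.pack("<BIdd", 1, 1, point.x, point.y)


def geometry_hash(point: Point, num_buckets: int) -> int:
    # Option (1): the custom partition function reads the struct fields directly.
    return hash_bytes(struct.pack("<dd", point.x, point.y), num_buckets)


def generated_wkb_hash(point: Point, num_buckets: int) -> int:
    # Option (2): a "generated" WKB column sits between the struct and a
    # partition function that only accepts primitive types.
    return hash_bytes(wkb(point), num_buckets)


if __name__ == "__main__":
    p = Point(10.75, 59.91)
    print(geometry_hash(p, 16), generated_wkb_hash(p, 16))
```

In option (1) the function has to understand the struct layout; in option
(2) only the generated column does, and the partition function stays generic.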

Thanks,
Walaa.

On Thu, Oct 27, 2022 at 8:46 AM Ryan Blue <b...@tabular.io> wrote:

> Thomas, thanks for taking the time to put this together!
>
> I've always wanted geospatial support in the format, but thought that
> it would be best to have an expert design and build it with us so we
> don't get it wrong.
>
> I think Walaa is right about the approach. We want to use partition
> transforms to do the heavy lifting of finding the right files for a
> query. That means that we'd need some clear but generic definition of
> geospatial objects in the data, along with more specific attributes.
> At a high level, I think that's probably done by storing each object
> using a standard envelope definition (bbox?) that we can use in
> partition transforms, and then a WKB column for the actual object.
>
> What do you think?
>
> Ryan
>
> On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa
> <wa.moust...@gmail.com> wrote:
> >
> > Hi Thomas,
> >
> > It sounds like what you are trying to achieve is to provide a custom
> partition function? There is some discussion here
> > https://github.com/apache/iceberg/issues/1482. I guess supporting
> geometry through this framework makes more sense since it does not require
> extending the Iceberg type system, yet is general enough to support other
> applications.
> >
> > Thanks,
> > Walaa.
> >
> > On Thu, Oct 27, 2022 at 12:33 AM Thomas Fredriksen
> <thomas.fredriksen@oceandata.earth> wrote:
> >>
> >> Hello everyone,
> >>
> >> I am working with big geospatial data and trying to handle very large tables in
> object storage. Iceberg appears to be the ideal solution but
> unfortunately does not appear to support geometry columns.
> >>
> >> The way that iceberg is structured, it appears to be a good fit with
> the GeoParquet-standard (
> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md),
> so I created a pull request where I attempt to add this support:
> https://github.com/apache/iceberg/pull/6062
> >>
> >> The PR deviates from GeoParquet in the CRS-field of the column
> metadata. GeoParquet requires the CRS to be defined as a PROJJSON JSON
> object, while the PR simply asks the user to specify an EPSG ID, where
> EPSG:4326 (WGS84 - latitude/longitude) is considered default.
> >>
> >> I would love feedback on the PR and welcome the discussion on whether
> geospatial/geometry belongs in the iceberg standard.
> >>
> >> Thomas Li Fredriksen
> >> Lead Solution Architect
> >>
> >> p +47 452 21 055
> >>
> >>
> >> –––––
> >>
> >> www.hubocean.earth
>
>
>
>
> --
> Ryan Blue
> Tabular
>
