Walaa,

How are those types defined? Would we need to have definitions in the
Iceberg spec?

Ryan


On Thu, Oct 27, 2022 at 9:47 AM Walaa Eldin Moustafa <wa.moust...@gmail.com>
wrote:

> Thanks Ryan! To expand a bit more:
>
> For representation, I was thinking that geometry types could be expressed
> as complex types (e.g., POINTs as Structs), so they are compatible with all
> engines without having to introduce user-defined types in both Iceberg and
> compute engines.
>
> For the partitioning:
> (1) Custom partition functions could directly operate on complex types
> (e.g., structs representing POINTs). In this case the partitioning function
> is like: geometry_hash(struct_col); or
> (2) Partitioning spec could be extended to allow "generated columns" to be
> sources of partition functions, so a "generated" WKB column can be the
> intermediate representation between complex geometry types and partition
> functions that accept primitive types. In this case, the partitioning
> function is like hashBytes(wkb(struct_col)).
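
A minimal sketch of option (2) in Python, under stated assumptions: the
POINT is a struct with `x` and `y` fields, `wkb` and `hash_bytes` are
hypothetical helper names (not Iceberg APIs), and CRC32 is only a stand-in
for whatever hash a real partition transform would use.

```python
import struct
import zlib


def wkb(point):
    """Encode a POINT struct {"x": ..., "y": ...} as little-endian WKB.

    Hypothetical helper: plays the role of the "generated" WKB column.
    """
    # Layout: 1-byte byte-order flag (1 = little endian),
    # uint32 geometry type (1 = Point), then two float64 coordinates.
    return struct.pack("<BIdd", 1, 1, point["x"], point["y"])


def hash_bytes(data, num_buckets):
    """Map a byte string to a bucket number.

    Assumption: CRC32 is used for illustration only, not Iceberg's hash.
    """
    return zlib.crc32(data) % num_buckets


point = {"x": 10.75, "y": 59.91}    # complex (struct) representation
generated = wkb(point)              # intermediate primitive (binary) value
bucket = hash_bytes(generated, 16)  # partition value derived from the bytes
```

Note that hashing the exact bytes would only help equality lookups; it
gives no spatial locality, so nearby points may land in different buckets.
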
>
> Thanks,
> Walaa.
>
> On Thu, Oct 27, 2022 at 8:46 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Thomas, thanks for taking the time to put this together!
>>
>> I've always wanted geospatial support in the format, but thought that
>> it would be best to have an expert design and build it with us so we
>> don't get it wrong.
>>
>> I think Walaa is right about the approach. We want to use partition
>> transforms to do the heavy lifting of finding the right files for a
>> query. That means that we'd need some clear but generic definition of
>> geospatial objects in the data, along with more specific attributes.
>> At a high level, I think that's probably done by storing each object
>> using a standard envelope definition (bbox?) that we can use in
>> partition transforms, and then a WKB column for the actual object.
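
A rough sketch of how an envelope could help find the right files, in
Python. Everything here is an assumption for illustration: the
(min_x, min_y, max_x, max_y) tuple layout, the helper names `bbox`,
`intersects`, and `prune`, and the idea of keeping one envelope per file.
None of these are Iceberg APIs.

```python
def bbox(coords):
    """Return the envelope (min_x, min_y, max_x, max_y) of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two envelopes overlap or touch on both axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune(files, query_box):
    """Keep only the files whose envelope intersects the query envelope."""
    return [name for name, box in files.items() if intersects(box, query_box)]


# Hypothetical per-file envelopes, computed from the objects in each file.
files = {
    "a.parquet": bbox([(0, 0), (2, 3), (1, -1)]),
    "b.parquet": bbox([(10, 10), (12, 11)]),
}
hits = prune(files, (1, 1, 5, 5))
```

Files that survive the envelope check would still need an exact test
against the WKB geometry, since envelopes can overlap where the actual
objects do not.
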
>>
>> What do you think?
>>
>> Ryan
>>
>> On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa
>> <wa.moust...@gmail.com> wrote:
>> >
>> > Hi Thomas,
>> >
>> > It sounds like what you are trying to achieve is to provide a custom
>> partition function? There is some discussion here
>> > https://github.com/apache/iceberg/issues/1482. I guess supporting
>> geometry through this framework makes more sense since it does not require
>> extending the Iceberg type system, yet is general enough to support other
>> applications.
>> >
>> > Thanks,
>> > Walaa.
>> >
>> > On Thu, Oct 27, 2022 at 12:33 AM Thomas Fredriksen
>> <thomas.fredriksen@oceandata.earth> wrote:
>> >>
>> >> Hello everyone,
>> >>
>> >> I am working with big geospatial data and trying to manage very large
>> tables in object storage. Iceberg appears to be the ideal solution but
>> unfortunately does not appear to support geometry columns.
>> >>
>> >> The way that iceberg is structured, it appears to be a good fit with
>> the GeoParquet-standard (
>> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md),
>> so I created a pull request where I attempt to add this support:
>> https://github.com/apache/iceberg/pull/6062
>> >>
>> >> The PR deviates from GeoParquet in the CRS-field of the column
>> metadata. GeoParquet requires the CRS to be defined as a PROJJSON JSON
>> object, while the PR simply asks the user to specify an EPSG ID, where
>> EPSG:4326 (WGS84 - latitude/longitude) is considered default.
>> >>
>> >> I would love feedback on the PR and welcome the discussion on whether
>> geospatial/geometry belongs in the iceberg standard.
>> >>
>> >> Thomas Li Fredriksen
>> >> Lead Solution Architect
>> >>
>> >> p +47 452 21 055
>> >>
>> >>
>> >> –––––
>> >>
>> >> www.hubocean.earth
>>
>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
