Thanks, Ryan! To expand a bit more: for representation, I was thinking that geometry types could be expressed as complex types (e.g., POINTs as structs), so that they are compatible with all engines without having to introduce user-defined types in both Iceberg and the compute engines.
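The following is a minimal Python sketch of this representation and of the two partitioning options in the next paragraph. It is hypothetical and not taken from the PR or from Iceberg. It models a POINT as a plain two-field struct. `wkb` hand-encodes little-endian 2D WKB, and `geometry_hash` and `hash_bytes` are made-up partition functions. CRC32 (`zlib.crc32`) stands in for a real hash; Iceberg's own bucket transform uses 32-bit Murmur3, which the Python standard library does not include.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A POINT modeled as a plain struct<x: double, y: double>; no user-defined type needed."""
    x: float
    y: float


def wkb(p: Point) -> bytes:
    """Encode a 2D point as little-endian WKB: byte-order flag 1, geometry type 1 (Point), x, y."""
    return struct.pack("<BIdd", 1, 1, p.x, p.y)


def _to_bucket(h: int, num_buckets: int) -> int:
    # Clear the sign bit and take the modulus, in the style of Iceberg's bucket transform.
    return (h & 0x7FFFFFFF) % num_buckets


def geometry_hash(p: Point, num_buckets: int) -> int:
    """Option (1): hypothetical partition function that reads the struct fields directly."""
    h = zlib.crc32(struct.pack("<dd", p.x, p.y))  # CRC32 is a stand-in hash
    return _to_bucket(h, num_buckets)


def hash_bytes(data: bytes, num_buckets: int) -> int:
    """Option (2): generic binary partition function applied to a generated WKB column."""
    return _to_bucket(zlib.crc32(data), num_buckets)


if __name__ == "__main__":
    p = Point(10.75, 59.91)
    print(len(wkb(p)))             # 21 bytes: 1 + 4 + 8 + 8
    print(geometry_hash(p, 16))    # option (1): hash the struct directly
    print(hash_bytes(wkb(p), 16))  # option (2): hash the generated WKB column
```

Hashing exact coordinates spreads nearby points across buckets. That balances data across partitions, but on its own it does not group spatially close rows for pruning.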
For partitioning: (1) custom partition functions could operate directly on complex types (e.g., structs representing POINTs). In this case the partitioning function would look like geometry_hash(struct_col); or (2) the partition spec could be extended to allow "generated columns" to be sources of partition functions, so that a generated WKB column becomes the intermediate representation between complex geometry types and partition functions that accept primitive types. In this case the partitioning function would look like hashBytes(wkb(struct_col)).

Thanks,
Walaa.

On Thu, Oct 27, 2022 at 8:46 AM Ryan Blue <b...@tabular.io> wrote:
> Thomas, thanks for taking the time to put this together!
>
> I've always wanted geospatial support in the format, but thought that
> it would be best to have an expert design and build it with us so we
> don't get it wrong.
>
> I think Walaa is right about the approach. We want to use partition
> transforms to do the heavy lifting of finding the right files for a
> query. That means that we'd need some clear but generic definition of
> geospatial objects in the data, along with more specific attributes.
> At a high level, I think that's probably done by storing each object
> using a standard envelope definition (bbox?) that we can use in
> partition transforms, and then a WKB column for the actual object.
>
> What do you think?
>
> Ryan
>
> On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa
> <wa.moust...@gmail.com> wrote:
> >
> > Hi Thomas,
> >
> > It sounds like what you are trying to achieve is to provide a custom
> > partition function? There is some discussion here:
> > https://github.com/apache/iceberg/issues/1482. I guess supporting
> > geometry through this framework makes more sense, since it does not
> > require extending the Iceberg type system, yet is general enough to
> > support other applications.
> >
> > Thanks,
> > Walaa.
> >
> > On Thu, Oct 27, 2022 at 12:33 AM Thomas Fredriksen
> > <thomas.fredriksen@oceandata.earth> wrote:
> >>
> >> Hello everyone,
> >>
> >> I am working with big geospatial data and trying to manage very large
> >> tables in object storage. Iceberg appears to be the ideal solution, but
> >> unfortunately it does not appear to support geometry columns.
> >>
> >> The way that Iceberg is structured, it appears to be a good fit with
> >> the GeoParquet standard (
> >> https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md),
> >> so I created a pull request where I attempt to add this support:
> >> https://github.com/apache/iceberg/pull/6062
> >>
> >> The PR deviates from GeoParquet in the CRS field of the column
> >> metadata. GeoParquet requires the CRS to be defined as a PROJJSON
> >> object, while the PR simply asks the user to specify an EPSG ID, where
> >> EPSG:4326 (WGS84, latitude/longitude) is the default.
> >>
> >> I would love feedback on the PR and welcome the discussion on whether
> >> geospatial/geometry belongs in the Iceberg standard.
> >>
> >> Thomas Li Fredriksen
> >> Lead Solution Architect
> >>
> >> p +47 452 21 055
> >>
> >> –––––
> >>
> >> www.hubocean.earth
>
> --
> Ryan Blue
> Tabular
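Ryan's suggestion in the thread above has two parts: store a standard envelope (bounding box) per object for use by partition transforms, and keep a separate WKB column for the exact geometry. A minimal Python sketch of the envelope side follows, under stated assumptions. `BBox`, `envelope`, and `grid_cell` are hypothetical names, not part of the PR or of Iceberg. Bucketing by the grid cell that contains the envelope's center is only one assumed choice of transform.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned envelope (xmin, ymin, xmax, ymax) of a geometry, a file, or a query window."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two envelopes overlap unless one lies entirely to one side of the other.
        return not (self.xmax < other.xmin or other.xmax < self.xmin
                    or self.ymax < other.ymin or other.ymax < self.ymin)


def envelope(coords: Iterable[Tuple[float, float]]) -> BBox:
    """Envelope of any geometry given its (x, y) vertices; assumes at least one vertex."""
    xs, ys = zip(*coords)
    return BBox(min(xs), min(ys), max(xs), max(ys))


def grid_cell(b: BBox, cell_size: float) -> Tuple[int, int]:
    """Hypothetical partition transform: the grid cell that contains the envelope's center."""
    cx = (b.xmin + b.xmax) / 2
    cy = (b.ymin + b.ymax) / 2
    # Floor division keeps cells consistent for negative coordinates.
    return (int(cx // cell_size), int(cy // cell_size))


if __name__ == "__main__":
    # Envelope of a small polygon ring around Oslo (lon/lat, EPSG:4326).
    poly = envelope([(10.6, 59.8), (10.9, 59.8), (10.9, 60.0), (10.6, 60.0)])
    print(grid_cell(poly, 1.0))  # (10, 59)
    # Prune data files whose stored envelope does not overlap the query window.
    files = {"a.parquet": poly, "b.parquet": BBox(-75.0, 40.0, -73.0, 41.0)}
    query = BBox(10.0, 59.0, 11.0, 60.5)
    print([f for f, b in files.items() if b.intersects(query)])  # ['a.parquet']
```

The envelope is cheap to compute and compare, so it can drive partitioning and file skipping. The WKB column remains the source of truth for exact predicates on the rows that survive pruning.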