Walaa,

How are those types defined? Would we need to have definitions in the Iceberg spec?

Ryan

On Thu, Oct 27, 2022 at 9:47 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> Thanks Ryan! To expand a bit more:
>
> For representation, I was thinking that geometry types could be expressed
> as complex types (e.g., POINTs as Structs), so they are compatible with all
> engines without having to introduce user-defined types in both Iceberg and
> compute engines.
>
> For the partitioning:
> (1) Custom partition functions could directly operate on complex types
> (e.g., structs representing POINTs). In this case the partitioning function
> is like: geometry_hash(struct_col); or
> (2) Partitioning spec could be extended to allow "generated columns" to be
> sources of partition functions, so a "generated" WKB column can be the
> intermediate representation between complex geometry types and partition
> functions that accept primitive types. In this case, the partitioning
> function is like hashBytes(wkb(struct_col)).
>
> Thanks,
> Walaa.
>
> On Thu, Oct 27, 2022 at 8:46 AM Ryan Blue <b...@tabular.io> wrote:
>
>> Thomas, thanks for taking the time to put this together!
>>
>> I've always wanted geospatial support in the format, but thought that
>> it would be best to have an expert design and build it with us so we
>> don't get it wrong.
>>
>> I think Walaa is right about the approach. We want to use partition
>> transforms to do the heavy lifting of finding the right files for a
>> query. That means that we'd need some clear but generic definition of
>> geospatial objects in the data, along with more specific attributes.
>> At a high level, I think that's probably done by storing each object
>> using a standard envelope definition (bbox?) that we can use in
>> partition transforms, and then a WKB column for the actual object.
>>
>> What do you think?
>>
>> Ryan
>>
>> On Thu, Oct 27, 2022 at 4:03 AM Walaa Eldin Moustafa
>> <wa.moust...@gmail.com> wrote:
>> >
>> > Hi Thomas,
>> >
>> > It sounds like what you are trying to achieve is to provide a custom
>> > partition function? There is some discussion here
>> > https://github.com/apache/iceberg/issues/1482. I guess supporting
>> > geometry through this framework makes more sense since it does not
>> > require extending the Iceberg type system, yet is general enough to
>> > support other applications.
>> >
>> > Thanks,
>> > Walaa.
>> >
>> > On Thu, Oct 27, 2022 at 12:33 AM Thomas Fredriksen
>> > <thomas.fredriksen@oceandata.earth> wrote:
>> >>
>> >> Hello everyone,
>> >>
>> >> I am working with big geospatial data and trying to handle very large
>> >> tables in object storage. Iceberg appears to be the ideal solution but
>> >> unfortunately does not appear to support geometry columns.
>> >>
>> >> The way that Iceberg is structured, it appears to be a good fit with
>> >> the GeoParquet standard
>> >> (https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md),
>> >> so I created a pull request where I attempt to add this support:
>> >> https://github.com/apache/iceberg/pull/6062
>> >>
>> >> The PR deviates from GeoParquet in the CRS field of the column
>> >> metadata. GeoParquet requires the CRS to be defined as a PROJJSON JSON
>> >> object, while the PR simply asks the user to specify an EPSG ID, where
>> >> EPSG:4326 (WGS84 - latitude/longitude) is considered the default.
>> >>
>> >> I would love feedback on the PR and welcome the discussion on whether
>> >> geospatial/geometry belongs in the Iceberg standard.
>> >>
>> >> Thomas Li Fredriksen
>> >> Lead Solution Architect
>> >>
>> >> p +47 452 21 055
>> >>
>> >> –––––
>> >>
>> >> www.hubocean.earth
>> >>
>>
>> --
>> Ryan Blue
>> Tabular
>

--
Ryan Blue
Tabular
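
For illustration only, the two partitioning options Walaa describes in the thread might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not anything from the Iceberg spec or the PR: the `Point` struct layout, the `wkb`, `hash_bytes`, and `geometry_hash` helpers, and the use of CRC32 as the hash are all hypothetical stand-ins (Iceberg's own bucket transform uses a Murmur3-based hash instead).

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A POINT modeled as a plain struct of two doubles (hypothetical layout)."""
    x: float
    y: float


def wkb(point: Point) -> bytes:
    """Encode a point as little-endian WKB: byte order 1, geometry type 1, x, y."""
    return struct.pack("<BIdd", 1, 1, point.x, point.y)


def hash_bytes(data: bytes, num_buckets: int) -> int:
    """Stand-in bucket function over bytes; CRC32 is used for illustration only."""
    return zlib.crc32(data) % num_buckets


def geometry_hash(point: Point, num_buckets: int) -> int:
    """Option (1): a partition function that accepts the struct directly."""
    return hash_bytes(wkb(point), num_buckets)


p = Point(x=10.75, y=59.91)

# Option (2): a "generated" WKB column feeds a partition function on bytes.
generated_wkb = wkb(p)
bucket_via_wkb = hash_bytes(generated_wkb, 16)

# Option (1): the partition function operates on the struct itself.
bucket_direct = geometry_hash(p, 16)
```

In this sketch both routes land in the same bucket because option (1) is implemented on top of the same WKB encoding; a real `geometry_hash` would be free to hash the struct fields differently.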