Geospatial/geometry support

2022-10-27 Thread Thomas Fredriksen
Hello everyone, I am working with big geospatial data and trying to handle very large tables in object storage. Iceberg appears to be the ideal solution, but unfortunately it does not appear to support geometry columns. Given the way Iceberg is structured, it appears to be a good fit with the GeoParquet-standa

Re: Geospatial/geometry support

2022-10-27 Thread Walaa Eldin Moustafa
Hi Thomas, It sounds like what you are trying to achieve is to provide a custom partition function? There is some discussion here: https://github.com/apache/iceberg/issues/1482. I guess supporting geometry through this framework makes more sense since it does not require extending the Iceberg type syste

Re: Geospatial/geometry support

2022-10-27 Thread Ryan Blue
Thomas, thanks for taking the time to put this together! I've always wanted geospatial support in the format, but thought that it would be best to have an expert design and build it with us so we don't get it wrong. I think Walaa is right about the approach. We want to use partition transforms to

Re: Geospatial/geometry support

2022-10-27 Thread Walaa Eldin Moustafa
Thanks Ryan! To expand a bit more: For representation, I was thinking that geometry types could be expressed as complex types (e.g., POINTs as Structs), so they are compatible with all engines without having to introduce user-defined types in both Iceberg and compute engines. For the partitioning
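
Editorial sketch of the idea above, not code from the thread: a POINT can be modelled as an ordinary struct of two doubles, and a partition transform can then be a plain function over that struct. The names `Point` and `grid_cell`, the field layout, and the one-degree cell size are all assumptions made for illustration; none of them come from the Iceberg spec or any engine.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Point:
    """Hypothetical POINT layout: a plain struct of two doubles."""
    x: float  # longitude in degrees (assumed convention)
    y: float  # latitude in degrees (assumed convention)


def grid_cell(point: Point, cell_size_deg: float = 1.0) -> Tuple[int, int]:
    """Hypothetical partition transform: map a point to a fixed-size grid cell.

    floor() is used instead of int() so that negative coordinates
    land in the correct cell rather than being rounded toward zero.
    """
    return (
        math.floor(point.x / cell_size_deg),
        math.floor(point.y / cell_size_deg),
    )


oslo = Point(x=10.75, y=59.91)
print(grid_cell(oslo))  # (10, 59)
```

Because the struct carries no geometry semantics of its own, any engine that understands structs of doubles could read it; the spatial meaning would live entirely in the transform.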

Re: Geospatial/geometry support

2022-10-27 Thread Ryan Blue
Walaa, How are those types defined? Would we need to have definitions in the Iceberg spec? Ryan

Re: Geospatial/geometry support

2022-10-27 Thread Walaa Eldin Moustafa
Types, as in "POINT", etc.? No, the point was to just express them as complex types to avoid adding them to the Iceberg spec and the engines (because even if they were added to the Iceberg spec, engines will likely not have them as first-class citizens anyway), i.e., their POINT/geometry semantics are invi

Re: Geospatial/geometry support

2022-10-27 Thread Thomas Fredriksen
Thanks for the detailed response 🙂 I think Ryan's point in the referenced issue is important - having a set of transforms would be important in order to have consistent support across engines. Partition transforms would indeed have to do most of the heavy lifting in order to simplify the query
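
Editorial sketch, not code from the thread: one way a spatial partition transform could simplify queries is by turning a bounding-box filter into a small set of partition values to scan. The function `cells_for_bbox` and the fixed one-degree grid are hypothetical and serve only to illustrate the pruning idea; a real transform would have to be specified identically across engines, which is the consistency concern raised in the message.

```python
import math
from typing import List, Tuple


def cells_for_bbox(
    min_x: float,
    min_y: float,
    max_x: float,
    max_y: float,
    cell_size_deg: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return every grid cell that a query bounding box overlaps.

    Hypothetical helper: a planner could scan only the partitions
    whose cell id appears in this list and skip all others.
    """
    x0 = math.floor(min_x / cell_size_deg)
    x1 = math.floor(max_x / cell_size_deg)
    y0 = math.floor(min_y / cell_size_deg)
    y1 = math.floor(max_y / cell_size_deg)
    return [(cx, cy) for cx in range(x0, x1 + 1) for cy in range(y0, y1 + 1)]


# A query box around Oslo touches four one-degree cells.
print(cells_for_bbox(10.2, 59.5, 11.4, 60.1))
```

Rows inside the selected cells still need an exact geometry check, since a cell only approximates the query box.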