Hello everyone,
I am working with big geospatial data and trying to handle very large tables in
object storage. Iceberg appears to be the ideal solution but unfortunately does
not appear to support geometry columns.
The way that Iceberg is structured, it appears to be a good fit with the
GeoParquet-standa
Hi Thomas,
It sounds like what you are trying to achieve is to provide a custom partition
function? There is some discussion here
https://github.com/apache/iceberg/issues/1482. I guess supporting geometry
through this framework makes more sense since it does not require extending
the Iceberg type syste
Thomas, thanks for taking the time to put this together!
I've always wanted geospatial support in the format, but thought that
it would be best to have an expert design and build it with us so we
don't get it wrong.
I think Walaa is right about the approach. We want to use partition
transforms to
Thanks Ryan! To expand a bit more:
For representation, I was thinking that geometry types could be expressed
as complex types (e.g., POINTs as Structs), so they are compatible with all
engines without having to introduce user-defined types in both Iceberg and
compute engines.
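
To make the struct idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the `Point` class, the field names `x` and `y`, and the `location` column are assumptions for this example and are not defined by Iceberg or by any engine.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Point:
    """Hypothetical POINT encoded as a plain struct of two doubles."""
    x: float  # assumed: longitude or easting
    y: float  # assumed: latitude or northing


# A row as an engine without geometry support would see it:
# the "location" column is just a nested struct of ordinary values.
row = {"id": 1, "location": asdict(Point(x=10.75, y=59.91))}
print(row)
```

The engine only ever sees two doubles inside a struct, so no user-defined type is needed on either side; the POINT meaning lives purely in how the fields are interpreted.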
For the partitioning
Walaa,
How are those types defined? Would we need to have definitions in the
Iceberg spec?
Ryan
On Thu, Oct 27, 2022 at 9:47 AM Walaa Eldin Moustafa
wrote:
> Thanks Ryan! To expand a bit more:
>
> For representation, I was thinking that geometry types could be expressed
> as complex types (e.
Types, as in "POINT", etc? No, the point was to just express them as
complex types to avoid adding them to the Iceberg spec and the engines (because
even if they were added to the Iceberg spec, engines would likely not have them
as first-class citizens anyway), i.e., their POINT/geometry semantics are
invi
Thanks for the detailed response 🙂
I think Ryan's point in the referenced issue is important - having a set of
transforms would be important in order to have consistent support across
engines.
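
As a rough illustration of why a shared transform matters, here is a sketch of a deterministic spatial transform in Python. The `grid_cell` function and its `cell_size` parameter are hypothetical; Iceberg does not define such a transform, and a real design might use a different scheme entirely. The point is only that every engine applying the same function to the same coordinates lands in the same partition.

```python
import math
from typing import Dict, List, Tuple


def grid_cell(x: float, y: float, cell_size: float = 1.0) -> Tuple[int, int]:
    """Hypothetical transform: map a point to the grid cell that contains it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


# Group a few sample points by their cell, as a writer would when partitioning.
points = [(10.75, 59.91), (10.20, 59.40), (-0.13, 51.51)]
partitions: Dict[Tuple[int, int], List[Tuple[float, float]]] = {}
for x, y in points:
    partitions.setdefault(grid_cell(x, y), []).append((x, y))

print(partitions)
```

With a cell size of 1.0, the first two points share cell (10, 59), so a query restricted to that cell could skip the data for the third point.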
Partition transforms would indeed have to do most of the heavy lifting in order
to simplify the query