Hi Gustavo,

I'm not too familiar with the Airflow user base or use cases, but we had to 
consider similar things when deciding what to do with `CREATE EXTERNAL TABLE 
ice_table PARTITIONED BY ...` Hive queries.
See: https://github.com/apache/iceberg/pull/1917 

The decision there was that, even though the user issued a command to create a 
partitioned Hive table, we created an unpartitioned Hive table whose backing 
Iceberg table used identity partitions on the originally requested columns.
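
To make that mapping a bit more concrete, here is a rough Python sketch. It is 
not the code from the PR, just a toy model of the decision: `HiveDdl` and 
`plan_iceberg_table` are made-up names, and appending the partition columns to 
the end of the schema is my assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class HiveDdl:
    """Toy model of CREATE EXTERNAL TABLE ... PARTITIONED BY ... (hypothetical)."""
    name: str
    columns: list            # regular columns, e.g. [("id", "bigint")]
    partition_columns: list  # PARTITIONED BY columns, e.g. [("dt", "string")]


def plan_iceberg_table(ddl):
    """Return what gets created for a 'partitioned' Hive DDL (illustrative only)."""
    return {
        # The Hive table itself is registered without partition keys.
        "hive_partition_keys": [],
        # Assumption: the requested partition columns become ordinary columns.
        "iceberg_schema": list(ddl.columns) + list(ddl.partition_columns),
        # The Iceberg table is partitioned by identity on those same columns.
        "iceberg_partition_spec": [
            ("identity", col_name) for col_name, _ in ddl.partition_columns
        ],
    }


plan = plan_iceberg_table(
    HiveDdl("ice_table", [("id", "bigint")], [("dt", "string")])
)
print(plan["hive_partition_keys"])     # what the metastore sees as partitions
print(plan["iceberg_partition_spec"])  # where the partitioning actually lives
```

The practical consequence for a sensor is that the metastore reports no 
partition keys for such a table, so partition information has to come from the 
Iceberg side.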

Hope this helps a bit.

Thanks,
Peter

> On Mar 2, 2021, at 03:38, Gustavo Torres Torres 
> <gustavo.tor...@airbnb.com.INVALID> wrote:
> 
> Hey folks,
> 
> Lately I've been thinking about integration between Airflow & Iceberg for a 
> smooth transition from Hive-based tables to Iceberg ones and would like to 
> hear about your experience. Specifically about Iceberg partition sensors in 
> Airflow.
> 
> From the way I see it, there are two ways to go about this (at least for 
> Hive-based catalogs): 
> 
> 1. Modify our Hive Metastore API so that partitions-APIs are handled directly 
>    by the Iceberg API. This has the advantage of being mostly transparent to 
>    users but has the downside of being confusing since Iceberg creates tables 
>    with the Hive catalog as external non-partitioned tables.
> 2. Create a separate sensor that makes it clear that we are sensing over an 
>    Iceberg table. This is probably the most straightforward approach, but if 
>    we do this we would probably need to do the same for any tool that used 
>    the metastore to get partition information.
> 
> Would love to hear what your experiences have been.
> Thanks
