Spark cannot read iceberg tables which were originally written by Impala

OpenInx Mon, 25 Dec 2023 22:41:04 -0800

Hi dev

Sensordata [1] encountered an interesting Apache Impala & Iceberg bug
in a real customer production environment.
Their customers used Apache Impala to create a large number of Apache Hive
tables in HMS and ingested PB-level datasets
into those Hive tables (which were originally written by Apache Impala). In
recent days, their customers migrated those Hive
tables to Apache Iceberg tables, but failed to query their huge datasets in
Iceberg table format using Apache Spark.


Jiajie Feng (from Sensordata) and I have written a simple demo to reproduce
this issue. For more details, please see:
https://docs.google.com/document/d/1uXgj7GGp59K_hnV3gKWOsI2ljFTKcKBP1hb_Ux_HXuY/edit?usp=sharing

We'd like to hear feedback and suggestions from both the Impala and
Iceberg communities. Jiajie and I would both like
to fix this issue once we have agreed on a solution.

Best Regards.

1. https://www.sensorsdata.com/en/
