Hi dev,

Sensordata [1] has encountered an interesting Apache Impala & Iceberg bug in a real customer production environment. Their customers use Apache Impala to create a large number of Apache Hive tables in HMS, and have ingested PB-level datasets into those Hive tables (which were originally written by Apache Impala). Recently, their customers migrated those Hive tables to Apache Iceberg tables, but then failed to query the huge datasets in Iceberg table format using Apache Spark.

Jiajie Feng (from Sensordata) and I wrote a simple demo to demonstrate this issue. For more details, please see:
https://docs.google.com/document/d/1uXgj7GGp59K_hnV3gKWOsI2ljFTKcKBP1hb_Ux_HXuY/edit?usp=sharing

We'd like to hear feedback and suggestions from both the Impala and Iceberg communities. Both Jiajie and I would like to fix this issue once we have an aligned solution.

Best Regards.

1. https://www.sensorsdata.com/en/