Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread OpenInx
Hi Zoltan, Thanks for the issue. I think it's fair to wait for a new major release for this breaking change. Best Regards. On Wed, Jan 3, 2024 at 11:16 PM Zoltán Borók-Nagy wrote: > Hi, > > I created a IMPALA-12675 > about annotating > STRING

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread Zoltán Borók-Nagy
Hi, I created IMPALA-12675 about annotating STRINGs with UTF8 by default. The code change should be trivial, but I'm afraid we will need to wait for a new major release for this (because users might store binary data in STRING columns, so it

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-03 Thread OpenInx
Thanks Zoltan and Ryan for your feedback. I think we all agree on adding an option to promote BINARY to STRING (Approach A) on the Flink/Spark/Hive reader side, so that the historic datasets written by Impala into Hive tables can be read correctly. Besides that, applying approach B to future Apache Impala rel
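
For readers of this thread, a rough sketch of what Approach A amounts to on the reader side. This is not Iceberg, Spark, Flink, or Hive code: the function name `read_column` and the flag `promote_binary_to_string` are hypothetical, and plain Python `bytes` stand in for un-annotated Parquet BINARY values. It also shows the concern raised about STRING columns that hold real binary data: such values would fail strict UTF-8 decoding.

```python
from typing import Iterable, List, Optional, Union


def read_column(
    raw_values: Iterable[Optional[bytes]],
    promote_binary_to_string: bool = False,  # hypothetical opt-in reader option
) -> List[Union[bytes, str, None]]:
    """Return column values, optionally promoting raw bytes to strings.

    With the option off, values are returned as bytes, which is how a
    reader would treat a BINARY column that carries no UTF8 annotation.
    With the option on, each non-null value is decoded as strict UTF-8.
    """
    if not promote_binary_to_string:
        return list(raw_values)

    promoted: List[Union[bytes, str, None]] = []
    for value in raw_values:
        # Nulls pass through unchanged; invalid UTF-8 raises UnicodeDecodeError.
        promoted.append(None if value is None else value.decode("utf-8"))
    return promoted


if __name__ == "__main__":
    legacy = [b"impala", None, "caf\u00e9".encode("utf-8")]
    print(read_column(legacy))
    print(read_column(legacy, promote_binary_to_string=True))
```

Keeping the promotion opt-in in this sketch mirrors the idea that only tables known to hold text in un-annotated columns should be read this way; a column that really stores binary data would raise on decode rather than be silently mangled.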

Re: Spark cannot read iceberg tables which were originally written by Impala

2024-01-01 Thread Ryan Blue
Thanks for bringing this up and for finding the cause. I think we should add an option to promote binary to string (Approach A). That sounds pretty reasonable overall. I think it would be great if Impala also produced correct Parquet files, but that's beyond our control and there's, no doubt, a to

Re: Spark cannot read iceberg tables which were originally written by Impala

2023-12-26 Thread Zoltán Borók-Nagy
Hey Everyone, Thank you for raising this issue and reaching out to the Impala community. Let me clarify that the problem only happens when there is a legacy Hive table written by Impala, which is then converted to Iceberg. When Impala writes into an Iceberg table there is no problem with interope

Spark cannot read iceberg tables which were originally written by Impala

2023-12-25 Thread OpenInx
Hi dev, Sensordata [1] encountered an interesting Apache Impala & Iceberg bug in a real customer production environment. Their customers use Apache Impala to create a large number of Apache Hive tables in HMS, and ingested PB-level datasets into their Hive tables (which were originally written b