Re: Data incorrectness when bucket joining Iceberg table

2023-04-20 Thread Anton Okolnychyi
Iceberg and Spark hash functions are not compatible, just like Hive and Spark hash functions are not compatible. That’s why the new SPJ framework depends on the function catalog.

- Anton

> On Apr 18, 2023, at 7:09 PM, Manu Zhang wrote:
>
> Hi All,
>
> Since there had been no bucket join in S
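To illustrate the incompatibility Anton describes, here is a minimal Python sketch that assigns the same integer keys to 16 buckets in two ways. It is an illustration under stated assumptions, not code from Iceberg or Spark. The Iceberg side follows the bucket transform in the Iceberg table spec (Murmur3 x86 32-bit, seed 0, an int hashed as an 8-byte little-endian long, then `(hash & Integer.MAX_VALUE) % N`). The Spark side assumes the shuffle hash partitioning for an int column is Murmur3 with seed 42 over the 4-byte value followed by a non-negative modulo. The function names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int) -> int:
    """Murmur3 x86 32-bit hash, returned as a signed 32-bit integer.

    Only inputs whose length is a multiple of 4 are supported, which is
    enough for the 4-byte and 8-byte keys used in this sketch.
    """
    if len(data) % 4 != 0:
        raise ValueError("sketch supports only lengths that are multiples of 4")
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    for i in range(0, len(data), 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Finalization: mix in the length, then avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def iceberg_bucket(value: int, num_buckets: int) -> int:
    """Iceberg bucket transform for an int key, per the table spec."""
    h = murmur3_x86_32(struct.pack("<q", value), seed=0)
    return (h & 0x7FFFFFFF) % num_buckets


def spark_partition(value: int, num_partitions: int) -> int:
    """ASSUMPTION: Spark hash partitioning for an int key (seed 42, pmod)."""
    h = murmur3_x86_32(struct.pack("<i", value), seed=42)
    # Python's % is already non-negative for a positive modulus, like pmod.
    return h % num_partitions


if __name__ == "__main__":
    for key in range(8):
        print(key, iceberg_bucket(key, 16), spark_partition(key, 16))
```

Because the two sides differ in seed, input width, and sign handling, equal keys generally land in different bucket numbers, so matching bucket N on one side with partition N on the other would pair up the wrong rows.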

Data incorrectness when bucket joining Iceberg table

2023-04-18 Thread Manu Zhang
Hi All,

Since there had been no bucket join in Spark DSv2 until storage-partitioned join was added in 3.4.0, we've implemented our own in Iceberg. We find an issue when joining an Iceberg table with a Spark parquet table as follows.

1. The left side is an Iceberg table with hash distribution