Hi Iceberg Devs,

I am evaluating the performance of joins across two bucketed datasets to find an optimal bucketing strategy. I was able to ingest into a bucketed table [1], and using the TableScan API I can confirm that only a subset of files (roughly total files / number of buckets) is scanned [2]. I have also benchmarked joins across the two datasets with different bucket counts. However, I recently came across this comment <https://github.com/apache/iceberg/issues/430#issuecomment-533360026> on issue #430 <https://github.com/apache/iceberg/issues/430> indicating that some work is still pending before Spark can take advantage of Iceberg's bucket values. Is that comment still accurate? Is there anything I can contribute to help?
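As background for the bucket-count comparison, here is a rough, self-contained sketch of how a bucket transform spreads ids across N buckets. It uses Scala's built-in MurmurHash3 as a stand-in for Iceberg's Murmur3-based bucket transform, so the actual bucket numbers will differ from a real Iceberg table's; it only illustrates the hash-mod-N distribution idea.

```scala
import scala.util.hashing.MurmurHash3

// Sketch of bucket assignment: Iceberg's bucket transform hashes the value
// with a 32-bit Murmur3 hash and takes (hash & Int.MaxValue) % numBuckets.
// Scala's MurmurHash3 is only a stand-in here, not Iceberg's exact
// byte-level hash, so this will not reproduce real bucket numbers.
def bucketFor(id: Long, numBuckets: Int): Int =
  (MurmurHash3.bytesHash(BigInt(id).toByteArray) & Int.MaxValue) % numBuckets

val numBuckets = 16

// Distribute 1000 synthetic ids and count how many land in each bucket.
val counts: Map[Int, Int] = (0L until 1000L)
  .groupBy(bucketFor(_, numBuckets))
  .map { case (b, ids) => b -> ids.size }

// With a reasonable hash, ids should spread fairly evenly across buckets.
counts.toSeq.sortBy(_._1).foreach { case (b, n) => println(s"bucket $b: $n ids") }
```

A sketch like this is handy for sanity-checking that a candidate bucket count gives reasonably even file sizes before re-ingesting the real data.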
*[1] - Partition Spec*

val partitionSpec = PartitionSpec
  .builderFor(mergedSchema)
  .identity("namespace")
  .bucket("id", numberOfBuckets)
  .build()

*[2] - TableScan API*

import scala.collection.JavaConverters._  // for .asScala on planFiles

val iBucketIdExp = Expressions.equal("id", "1")
val iBucketIdScan = table.newScan().filter(iBucketIdExp)
val filesScanned = iBucketIdScan.planFiles.asScala.size.toLong

--
Thanks,
Romin