empcl commented on code in PR #12614:
URL: https://github.com/apache/hudi/pull/12614#discussion_r2016774065
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -534,6 +535,15 @@ object HoodieFileIndex extends Logging {
     properties.setProperty(RECORDKEY_FIELD.key, tableConfig.getRecordKeyFields.orElse(Array.empty).mkString(","))
     properties.setProperty(PRECOMBINE_FIELD.key, Option(tableConfig.getPreCombineField).getOrElse(""))
     properties.setProperty(PARTITIONPATH_FIELD.key, HoodieTableConfig.getPartitionFieldPropForKeyGenerator(tableConfig).orElse(""))
+
+    // for simple bucket index, we need to set the INDEX_TYPE, BUCKET_INDEX_HASH_FIELD, BUCKET_INDEX_NUM_BUCKETS
+    val dataBase = Some(tableConfig.getDatabaseName)

Review Comment:
   @danny0405 @xicm Hello, consider a scenario where the number of buckets is not specified when the table is created, and the bucket number is instead supplied through a SQL hint when writing with Flink. In that case, do we also have to specify the bucket number when reading with Spark? If it is not specified, the Spark bucket-index read optimization will derive an incorrect bucket number, which leads to wrong query results.
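   For context, a minimal Spark-side sketch of the scenario being described. It assumes the standard Hudi option keys `hoodie.index.type` and `hoodie.bucket.index.num.buckets`; the table path, the bucket count of 8, and the `id` column are hypothetical and only illustrate that the reader must be told the same bucket number that the Flink writer used via its SQL hint:

   ```scala
   import org.apache.spark.sql.SparkSession

   object BucketIndexReadSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("bucket-index-read-sketch")
         .master("local[*]")
         .getOrCreate()

       // Scenario: the table was created without an explicit bucket number, and the
       // Flink writer fixed it through a SQL hint, e.g.
       //   INSERT INTO hudi_tbl /*+ OPTIONS('hoodie.bucket.index.num.buckets' = '8') */ ...
       // If the Spark reader is not given the same value, the bucket-index read
       // optimization may derive a different bucket count and prune the wrong files.
       val df = spark.read
         .format("hudi")
         .option("hoodie.index.type", "BUCKET")           // simple bucket index
         .option("hoodie.bucket.index.num.buckets", "8")  // must match the writer-side hint
         .load("/tmp/hudi/hudi_tbl")                      // illustrative path

       // Point lookup on the hash field; with a mismatched bucket count, bucket
       // pruning would scan the wrong bucket and could return no (or wrong) rows.
       df.where("id = 42").show()

       spark.stop()
     }
   }
   ```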