empcl commented on code in PR #12614:
URL: https://github.com/apache/hudi/pull/12614#discussion_r2016774065


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -534,6 +535,15 @@ object HoodieFileIndex extends Logging {
       properties.setProperty(RECORDKEY_FIELD.key, tableConfig.getRecordKeyFields.orElse(Array.empty).mkString(","))
       properties.setProperty(PRECOMBINE_FIELD.key, Option(tableConfig.getPreCombineField).getOrElse(""))
       properties.setProperty(PARTITIONPATH_FIELD.key, HoodieTableConfig.getPartitionFieldPropForKeyGenerator(tableConfig).orElse(""))
+
+      // for simple bucket index, we need to set the INDEX_TYPE, BUCKET_INDEX_HASH_FIELD, BUCKET_INDEX_NUM_BUCKETS
+      val dataBase = Some(tableConfig.getDatabaseName)

Review Comment:
   @danny0405 @xicm Hello, consider this scenario: the number of buckets is not specified when the table is created, and it is only supplied via a SQL hint when writing with Flink. Do we then have to specify the bucket number explicitly when reading with Spark? If it is not specified, the Spark bucket index read optimization picks up an incorrect bucket number, which leads to wrong query results.
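   As an illustration, a minimal sketch of a Spark read that passes the bucket settings explicitly, assuming an active SparkSession `spark` and the standard Hudi bucket index config keys; the table path, hash field, and bucket count are placeholders, and whether these read options are actually picked up by the bucket index read optimization is exactly the question above:

       // hypothetical workaround: supply the bucket index settings on the read path
       // when the table config does not carry them (they were only given via a Flink SQL hint)
       val df = spark.read.format("hudi")
         .option("hoodie.index.type", "BUCKET")
         .option("hoodie.bucket.index.hash.field", "id")   // placeholder hash field
         .option("hoodie.bucket.index.num.buckets", "8")   // must match the value used on the write path
         .load("/path/to/hudi_table")                      // placeholder table path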


