yihua commented on code in PR #13350:
URL: https://github.com/apache/hudi/pull/13350#discussion_r2105977129


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java:
##########
@@ -54,15 +54,30 @@ private static List<String> 
getHashKeysUsingIndexFields(String recordKey, List<S
   }
 
   public static String partitionBucketIdStr(String partition, int bucketId) {
-    return String.format("%s_%s", partition, bucketIdStr(bucketId));
+    // format: {partition}_{bucket_id}, bucket id should be 8 digits long, 
padded with leading zeros
+    StringBuilder sb = new StringBuilder()
+        .append(partition)
+        .append('_');
+    return appendWithPadZero(bucketId, 8, sb).toString();
   }
 
   public static int bucketIdFromFileId(String fileId) {
     return Integer.parseInt(fileId.substring(0, 8));
   }
 
   public static String bucketIdStr(int n) {
-    return String.format("%08d", n);
+    // bucket str should be 8 digits long, padded with leading zeros, format 
like: "00000001" for bucket 1

Review Comment:
   Should `partitionBucketIdStr` and `bucketIdStr` be called only during the 
initialization of the partitioner or relevant class so that int to String 
mapping is precomputed?  Since this mapping does not rely on record, and can be 
predetermined based on the number of buckets from the config, it does not make 
sense to invoke these methods per record.  It would be good to revisit all 
usages of these two methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to