yihua commented on code in PR #13350:
URL: https://github.com/apache/hudi/pull/13350#discussion_r2106064110
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java:
##########
@@ -54,15 +54,30 @@ private static List<String>
getHashKeysUsingIndexFields(String recordKey, List<S
}
public static String partitionBucketIdStr(String partition, int bucketId) {
- return String.format("%s_%s", partition, bucketIdStr(bucketId));
+ // format: {partition}_{bucket_id}, bucket id should be 8 digits long,
padded with leading zeros
+ StringBuilder sb = new StringBuilder()
+ .append(partition)
+ .append('_');
+ return appendWithPadZero(bucketId, 8, sb).toString();
}
public static int bucketIdFromFileId(String fileId) {
return Integer.parseInt(fileId.substring(0, 8));
}
public static String bucketIdStr(int n) {
- return String.format("%08d", n);
+ // bucket str should be 8 digits long, padded with leading zeros, format
like: "00000001" for bucket 1
Review Comment:
Got it. It makes sense to drop this PR or repurpose this PR to refactor the
callers. I can see the problem is that there is no real limitation on where
the util methods are used, so they can be called on per-partition or per-record
processibg. It would be good to move these methods to the caller class, make
them private, and refactor the calls.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]