TheR1sing3un commented on code in PR #13350:
URL: https://github.com/apache/hudi/pull/13350#discussion_r2106060118
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java:
##########
@@ -54,15 +54,30 @@ private static List<String>
getHashKeysUsingIndexFields(String recordKey, List<S
}
public static String partitionBucketIdStr(String partition, int bucketId) {
- return String.format("%s_%s", partition, bucketIdStr(bucketId));
+ // format: {partition}_{bucket_id}, bucket id should be 8 digits long,
padded with leading zeros
+ StringBuilder sb = new StringBuilder()
+ .append(partition)
+ .append('_');
+ return appendWithPadZero(bucketId, 8, sb).toString();
}
public static int bucketIdFromFileId(String fileId) {
return Integer.parseInt(fileId.substring(0, 8));
}
public static String bucketIdStr(int n) {
- return String.format("%08d", n);
+ // bucket str should be 8 digits long, padded with leading zeros, format
like: "00000001" for bucket 1
Review Comment:
> Should `partitionBucketIdStr` and `bucketIdStr` be called only during the
initialization of the partitioner or relevant class so that int to String
mapping is precomputed? Since this mapping does not rely on record, and can be
predetermined based on the number of buckets from the config, it does not make
sense to invoke these methods per record. It would be good to revisit all
usages of these two methods.
I checked the place where the call was made. What you said does make sense.
In many places where this method was called, not every record would pass
through. The change here could indeed be avoided. The basis for my initial
modification here was that I attempted to eliminate `String.format` to avoid
performance loss from the lower-level methods instead of relying on the
caller's guarantee. However, it is reasonable for the modifications here to be
reverted . What do you think? Should we withdraw these changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]