YuweiXiao commented on code in PR #4958:
URL: https://github.com/apache/hudi/pull/4958#discussion_r917394823
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIdentifier.java:
##########
@@ -88,17 +96,106 @@ protected ConsistentHashingNode getBucket(int hashValue) {
return tailMap.isEmpty() ? ring.firstEntry().getValue() :
tailMap.get(tailMap.firstKey());
}
+ /**
+ * Get the former node of the given node (inferred from file id).
+ */
+ public ConsistentHashingNode getFormerBucket(String fileId) {
+ return getFormerBucket(getBucketByFileId(fileId).getValue());
+ }
+
+ /**
+ * Get the former node of the given node (inferred from hash value).
+ */
+ public ConsistentHashingNode getFormerBucket(int hashValue) {
+    SortedMap<Integer, ConsistentHashingNode> headMap = ring.headMap(hashValue);
+    return headMap.isEmpty() ? ring.lastEntry().getValue() : headMap.get(headMap.lastKey());
+ }
+
+ public List<ConsistentHashingNode> mergeBucket(List<String> fileIds) {
+ // Get nodes using fileIds
+    List<ConsistentHashingNode> nodes = fileIds.stream().map(this::getBucketByFileId).collect(Collectors.toList());
Review Comment:
Yeah, sure. I will add the lower-bound check (i.e., at least two). The `min
bucket` check will be left to the caller (which has already been addressed),
because it is more of a global constraint.
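For illustration, a minimal sketch of the lower-bound check described above (illustrative names only, not the actual Hudi code): a merge of fewer than two buckets is meaningless, so it is rejected up front.

```java
import java.util.List;

public class MergeBucketCheck {
    // Hedged sketch: validate the merge input has at least two buckets
    // before any ring manipulation happens.
    public static void validateMergeInput(List<String> fileIds) {
        if (fileIds == null || fileIds.size() < 2) {
            throw new IllegalStateException("Merge requires at least two buckets");
        }
    }

    public static void main(String[] args) {
        validateMergeInput(List.of("f1", "f2")); // passes silently
        try {
            validateMergeInput(List.of("f1"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The global `min bucket` constraint mentioned in the comment stays with the caller, since only the caller sees the whole table's bucket count.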
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIdentifier.java:
##########
@@ -88,17 +96,106 @@ protected ConsistentHashingNode getBucket(int hashValue) {
return tailMap.isEmpty() ? ring.firstEntry().getValue() :
tailMap.get(tailMap.firstKey());
}
+ /**
+ * Get the former node of the given node (inferred from file id).
+ */
+ public ConsistentHashingNode getFormerBucket(String fileId) {
+ return getFormerBucket(getBucketByFileId(fileId).getValue());
+ }
+
+ /**
+ * Get the former node of the given node (inferred from hash value).
+ */
+ public ConsistentHashingNode getFormerBucket(int hashValue) {
+    SortedMap<Integer, ConsistentHashingNode> headMap = ring.headMap(hashValue);
+    return headMap.isEmpty() ? ring.lastEntry().getValue() : headMap.get(headMap.lastKey());
+ }
+
+ public List<ConsistentHashingNode> mergeBucket(List<String> fileIds) {
+ // Get nodes using fileIds
+    List<ConsistentHashingNode> nodes = fileIds.stream().map(this::getBucketByFileId).collect(Collectors.toList());
+
+ // Validate the input
+    for (int i = 0; i < nodes.size() - 1; ++i) {
+      ValidationUtils.checkState(getFormerBucket(nodes.get(i + 1).getValue()).getValue() == nodes.get(i).getValue(), "Cannot merge discontinuous hash range");
+    }
+
+    // Create child nodes with proper tag (keep the last one and delete other nodes)
+ List<ConsistentHashingNode> childNodes = new ArrayList<>(nodes.size());
+ for (int i = 0; i < nodes.size() - 1; ++i) {
+      childNodes.add(new ConsistentHashingNode(nodes.get(i).getValue(), null, ConsistentHashingNode.NodeTag.DELETE));
+ }
+    childNodes.add(new ConsistentHashingNode(nodes.get(nodes.size() - 1).getValue(), FSUtils.createNewFileIdPfx(), ConsistentHashingNode.NodeTag.REPLACE));
+ return childNodes;
+ }
+
+ public Option<List<ConsistentHashingNode>> splitBucket(String fileId) {
+ ConsistentHashingNode bucket = getBucketByFileId(fileId);
+    ValidationUtils.checkState(bucket != null, "FileId has no corresponding bucket");
+ return splitBucket(bucket);
+ }
+
+ /**
+ * Split bucket in the range middle, also generate the corresponding file ids
+ *
+   * TODO support different split criteria, e.g., distribute records evenly using statistics
Review Comment:
Sure!
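
For context, a minimal sketch of the mid-range split the quoted javadoc describes (illustrative names, not Hudi's API): a bucket owning the hash range (formerValue, value] is split at the midpoint, producing two child boundaries.

```java
public class MidRangeSplit {
    // Hedged sketch: split the hash range (formerValue, value] at its
    // midpoint, yielding two child boundaries: the new midpoint and the
    // original upper bound. Assumes formerValue < value (no wrap-around
    // across the ring's origin).
    public static int[] splitMiddle(int formerValue, int value) {
        int mid = formerValue + (value - formerValue) / 2;
        return new int[] { mid, value };
    }

    public static void main(String[] args) {
        int[] children = splitMiddle(0, 100);
        System.out.println(children[0] + "," + children[1]); // prints "50,100"
    }
}
```

Splitting at the midpoint is the simplest criterion; the TODO above notes that a statistics-driven split point could balance record counts between the children instead.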
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]