YuweiXiao commented on code in PR #4958:
URL: https://github.com/apache/hudi/pull/4958#discussion_r917394823
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIdentifier.java:
##########
@@ -88,17 +96,106 @@ protected ConsistentHashingNode getBucket(int hashValue) {
return tailMap.isEmpty() ? ring.firstEntry().getValue() :
tailMap.get(tailMap.firstKey());
}
+ /**
+ * Get the former node of the given node (inferred from file id).
+ */
+ public ConsistentHashingNode getFormerBucket(String fileId) {
+ return getFormerBucket(getBucketByFileId(fileId).getValue());
+ }
+
+ /**
+ * Get the former node of the given node (inferred from hash value).
+ */
+ public ConsistentHashingNode getFormerBucket(int hashValue) {
+    SortedMap<Integer, ConsistentHashingNode> headMap = ring.headMap(hashValue);
+    return headMap.isEmpty() ? ring.lastEntry().getValue() : headMap.get(headMap.lastKey());
+ }
+
+ public List<ConsistentHashingNode> mergeBucket(List<String> fileIds) {
+ // Get nodes using fileIds
+    List<ConsistentHashingNode> nodes = fileIds.stream().map(this::getBucketByFileId).collect(Collectors.toList());
Review Comment:
Yeah, sure. I will add the lower-bound check (i.e., at least two). The `min
bucket` check will be left to the caller (which has already been addressed),
because it is more of a global constraint.
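For illustration, a minimal sketch of the lower-bound check described above (illustrative names only, not the actual Hudi code): a merge of fewer than two buckets is meaningless, so it is rejected up front.

```java
import java.util.List;

public class MergeBucketCheck {
    // Hedged sketch: validate the merge input has at least two buckets
    // before any ring manipulation happens.
    public static void validateMergeInput(List<String> fileIds) {
        if (fileIds == null || fileIds.size() < 2) {
            throw new IllegalStateException("Merge requires at least two buckets");
        }
    }

    public static void main(String[] args) {
        validateMergeInput(List.of("f1", "f2")); // passes silently
        try {
            validateMergeInput(List.of("f1"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The global `min bucket` constraint mentioned in the comment stays with the caller, since only the caller sees the whole table's bucket count.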
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIdentifier.java:
##########
@@ -88,17 +96,106 @@ protected ConsistentHashingNode getBucket(int hashValue) {
return tailMap.isEmpty() ? ring.firstEntry().getValue() :
tailMap.get(tailMap.firstKey());
}
+ /**
+ * Get the former node of the given node (inferred from file id).
+ */
+ public ConsistentHashingNode getFormerBucket(String fileId) {
+ return getFormerBucket(getBucketByFileId(fileId).getValue());
+ }
+
+ /**
+ * Get the former node of the given node (inferred from hash value).
+ */
+ public ConsistentHashingNode getFormerBucket(int hashValue) {
+    SortedMap<Integer, ConsistentHashingNode> headMap = ring.headMap(hashValue);
+    return headMap.isEmpty() ? ring.lastEntry().getValue() : headMap.get(headMap.lastKey());
+ }
+
+ public List<ConsistentHashingNode> mergeBucket(List<String> fileIds) {
+ // Get nodes using fileIds
+    List<ConsistentHashingNode> nodes = fileIds.stream().map(this::getBucketByFileId).collect(Collectors.toList());
+
+ // Validate the input
+    for (int i = 0; i < nodes.size() - 1; ++i) {
+      ValidationUtils.checkState(getFormerBucket(nodes.get(i + 1).getValue()).getValue() == nodes.get(i).getValue(), "Cannot merge discontinuous hash range");
+    }
+
+    // Create child nodes with proper tag (keep the last one and delete other nodes)
+ List<ConsistentHashingNode> childNodes = new ArrayList<>(nodes.size());
+ for (int i = 0; i < nodes.size() - 1; ++i) {
+      childNodes.add(new ConsistentHashingNode(nodes.get(i).getValue(), null, ConsistentHashingNode.NodeTag.DELETE));
+ }
+    childNodes.add(new ConsistentHashingNode(nodes.get(nodes.size() - 1).getValue(), FSUtils.createNewFileIdPfx(), ConsistentHashingNode.NodeTag.REPLACE));
+ return childNodes;
+ }
+
+ public Option<List<ConsistentHashingNode>> splitBucket(String fileId) {
+ ConsistentHashingNode bucket = getBucketByFileId(fileId);
+    ValidationUtils.checkState(bucket != null, "FileId has no corresponding bucket");
+ return splitBucket(bucket);
+ }
+
+ /**
+ * Split bucket in the range middle, also generate the corresponding file ids
+ *
+   * TODO support different split criteria, e.g., distribute records evenly using statistics
Review Comment:
Sure!
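
For context, a minimal sketch of the mid-range split the quoted javadoc describes (illustrative names, not Hudi's API): a bucket owning the hash range (formerValue, value] is split at the midpoint, producing two child boundaries.

```java
public class MidRangeSplit {
    // Hedged sketch: split the hash range (formerValue, value] at its
    // midpoint, yielding two child boundaries: the new midpoint and the
    // original upper bound. Assumes formerValue < value (no wrap-around
    // across the ring's origin).
    public static int[] splitMiddle(int formerValue, int value) {
        int mid = formerValue + (value - formerValue) / 2;
        return new int[] { mid, value };
    }

    public static void main(String[] args) {
        int[] children = splitMiddle(0, 100);
        System.out.println(children[0] + "," + children[1]); // prints "50,100"
    }
}
```

Splitting at the midpoint is the simplest criterion; the TODO above notes that a statistics-driven split point could balance record counts between the children instead.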
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]