Davis-Zhang-Onehouse commented on code in PR #13489:
URL: https://github.com/apache/hudi/pull/13489#discussion_r2181732288
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -277,61 +268,291 @@ private Predicate
transformKeyPrefixesToPredicate(List<String> keyPrefixes) {
return Predicates.startsWithAny(null, right);
}
- @Override
- protected Map<String, HoodieRecord<HoodieMetadataPayload>>
getRecordsByKeys(List<String> keys, String partitionName) {
- if (keys.isEmpty()) {
- return Collections.emptyMap();
+ private HoodiePairData<String, HoodieRecord<HoodieMetadataPayload>>
doLookupWithMapping(
+ HoodieData<String> keys, String partitionName, List<FileSlice>
fileSlices, boolean isSecondaryIndex,
+ Option<SerializableFunction<String, String>> keyEncodingFn) {
+
+ final int numFileSlices = fileSlices.size();
+ if (numFileSlices == 1) {
+ TreeSet<String> distinctSortedKeys = new TreeSet<>(keys.collectAsList());
+ return lookupRecordsWithMapping(partitionName, new
ArrayList<>(distinctSortedKeys), fileSlices.get(0), !isSecondaryIndex,
keyEncodingFn);
}
- Map<String, HoodieRecord<HoodieMetadataPayload>> result;
+ SerializableBiFunction<String, Integer, Integer> mappingFunction =
HoodieTableMetadataUtil::mapRecordKeyToFileGroupIndex;
Review Comment:
Hi Danny, regarding this commit
https://github.com/apache/hudi/pull/13489/commits/d945e419cd07c0160dfe9ff60cca29a233eb9eea,
let's chat when you get a chance.
SI v2 hash func, unlike other indexes, needs to handle 2 types of input:
1. sec$rec (this is what write path will give, in func like tagLocation) -
we need to extract sec out and compute hash
2. sec (this is what read path will give, in func lie readSecondaryIndex) -
we can compute hash directly
3 factors (is SI, is v2, is case 1 or case 2)
This knowledge is known via either ways:
1. from the call site, the caller knows which case it is handling, so it can
choose to give hard coded "true"/"false" to indicate this fact
- this is [my previous
implementation](https://github.com/apache/hudi/pull/13489/commits/a16f8a4e456127711e87a0312f4684df4c917e30),
on write path, it knows it is case 1, so we only need to see if it is v2 and
it is SI. On read path, same for read path, we know it can only be case 2, so
we only need to check if it is v2 and SI.
2. or we explicitly checking each key, if there is an unescaped $, then it
is case 1, else it is case 2
- this incur some overhead on read path. That's why I didn't go with this
route.
There can never be a case where there is a mix keys from the 2 cases we need
to handle. So the caller only need to decide once before apply it over all the
given keys.
you commit
https://github.com/apache/hudi/pull/13489/commits/d945e419cd07c0160dfe9ff60cca29a233eb9eea
implement the above logic in a wrong way. As we only consider the first 2
factor (is SI, is v2), but the 3rd factor is missing
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java:
##########
@@ -277,61 +268,291 @@ private Predicate
transformKeyPrefixesToPredicate(List<String> keyPrefixes) {
return Predicates.startsWithAny(null, right);
}
- @Override
- protected Map<String, HoodieRecord<HoodieMetadataPayload>>
getRecordsByKeys(List<String> keys, String partitionName) {
- if (keys.isEmpty()) {
- return Collections.emptyMap();
+ private HoodiePairData<String, HoodieRecord<HoodieMetadataPayload>>
doLookupWithMapping(
+ HoodieData<String> keys, String partitionName, List<FileSlice>
fileSlices, boolean isSecondaryIndex,
+ Option<SerializableFunction<String, String>> keyEncodingFn) {
+
+ final int numFileSlices = fileSlices.size();
+ if (numFileSlices == 1) {
+ TreeSet<String> distinctSortedKeys = new TreeSet<>(keys.collectAsList());
+ return lookupRecordsWithMapping(partitionName, new
ArrayList<>(distinctSortedKeys), fileSlices.get(0), !isSecondaryIndex,
keyEncodingFn);
}
- Map<String, HoodieRecord<HoodieMetadataPayload>> result;
+ SerializableBiFunction<String, Integer, Integer> mappingFunction =
HoodieTableMetadataUtil::mapRecordKeyToFileGroupIndex;
Review Comment:
I will go with the "or we explicitly checking each key" route this time, so
it is not too much code change. Will push a commit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]