hudi-bot opened a new issue, #17070: URL: https://github.com/apache/hudi/issues/17070
https://issues.apache.org/jira/browse/HUDI-9505

For the new SI semantics, please refer to the above Jira and the PR description.

When doing the lookup, we have:
* keyToLookup: the raw value passed down by the caller. The value is not escaped and can be null.
* hfile: the flow is escape keyToLookup -> sort -> hfile lookup.

The hfile lookup involves some key matching rules, which as of today include:
* Full key lookup: RawKeyFromHfile.equals(keyToLookupEscaped)
* Prefix lookup: RawKeyFromHfile.startsWith(keyToLookupEscaped)

The new index lookup does not fall into either bucket, because what we are doing is extractUnescapedSecondaryKey(RawKeyFromHfile).equals(keyToLookupEscaped). This is SI-specific logic, and we should not use plain prefix lookup: the behavior is not the same and can cause correctness issues. We should extract a lambda function for key matching.

The hfile lookup involves 2 stages:
- Given keyToLookupEscaped, we need to locate the data block, which requires key order comparison. Here we need compare(keyToLookupEscaped, RawKeyFromHfile). Now it will become
{code:java}
compare(extractUnescapedSecondaryKey(keyToLookupEscaped), RawKeyFromHfile)
{code}
- Once the data blocks are located, we do a sequential scan to find the exact key. Previously it was RawKeyFromHfile.equals(keyToLookupEscaped) or RawKeyFromHfile.startsWith(keyToLookupEscaped); now it will become

## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-9551
- Type: Bug
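
To illustrate the proposal to extract the key-matching rule into a lambda, here is a minimal sketch. It is not Hudi's implementation. It assumes a toy stored-key layout of `escape(secondaryKey) + '$' + escape(recordKey)` with `\` as the escape character, and every name in it (`KeyMatchSketch`, `storedKey`, `secondaryKeyPart`, `scan`, and the three matchers) is hypothetical. Unlike the expression in the issue, the sketch compares in escaped form on both sides, and it does not handle a null keyToLookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class KeyMatchSketch {
    // Assumed toy layout, for illustration only.
    static final char SEP = '$';
    static final char ESC = '\\';

    // Escape separator and escape characters so SEP stays unambiguous.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == SEP || c == ESC) {
                sb.append(ESC);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Build a stored key from a secondary key and a record key.
    static String storedKey(String secondaryKey, String recordKey) {
        return escape(secondaryKey) + SEP + escape(recordKey);
    }

    // Return the (still escaped) secondary-key part of a stored key:
    // everything before the first unescaped separator.
    static String secondaryKeyPart(String rawKey) {
        for (int i = 0; i < rawKey.length(); i++) {
            char c = rawKey.charAt(i);
            if (c == ESC) {
                i++; // skip the escaped character
            } else if (c == SEP) {
                return rawKey.substring(0, i);
            }
        }
        return rawKey;
    }

    // Matching rules as pluggable lambdas: (rawKeyFromFile, escapedLookupKey) -> match?
    static final BiPredicate<String, String> FULL_KEY =
        (raw, lookup) -> raw.equals(lookup);
    static final BiPredicate<String, String> PREFIX =
        (raw, lookup) -> raw.startsWith(lookup);
    static final BiPredicate<String, String> SECONDARY_KEY =
        (raw, lookup) -> secondaryKeyPart(raw).equals(lookup);

    // Sequential scan that delegates the match decision to the supplied rule.
    static List<String> scan(List<String> rawKeys, String lookup,
                             BiPredicate<String, String> matcher) {
        List<String> hits = new ArrayList<>();
        for (String raw : rawKeys) {
            if (matcher.test(raw, lookup)) {
                hits.add(raw);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList(
            storedKey("a", "r1"),
            storedKey("a$b", "r3"),
            storedKey("ab", "r2"));
        String lookup = escape("a");
        // Plain prefix matching also returns keys whose secondary key merely starts with "a".
        System.out.println(scan(stored, lookup, PREFIX).size());        // prints 3
        // Matching on the extracted secondary key returns only the exact match.
        System.out.println(scan(stored, lookup, SECONDARY_KEY).size()); // prints 1
    }
}
```

Under these assumptions, the scan loop stays the same for all three lookup kinds and only the matcher changes, which is the separation the issue asks for. The block-location stage would need an analogous pluggable comparator, which this sketch does not cover.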
