Davis-Zhang-Onehouse opened a new pull request, #13523:
URL: https://github.com/apache/hudi/pull/13523

   ### Change Logs
   
   Today, to look up a key "secKey" in the secondary index, whose record keys have the format "secKey$recKey", we intend to match every index record whose secondary-key portion (everything before the "$") equals the given "secKey".
   
   Today this matching is done with prefix matching, which is not always correct. For example, given the index records
   
   "secKey1$recKey"
   "secKey2$recKey"
   "secKey3$recKey"
   
   and a lookup with "secKey", the lookup should match nothing, but prefix matching reports all 3 records as matches, which causes incorrect query results.
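A minimal Java sketch of this failure mode, using a plain list of strings as a stand-in for the HFile index records (the class and method names are illustrative, not Hudi code):

```java
import java.util.List;

// Illustrative only: a plain list of index keys stands in for HFile records.
public class PrefixMatchBug {
    // Counts the keys that naive prefix matching would return for a lookup key.
    static long prefixMatches(List<String> indexKeys, String lookupKey) {
        return indexKeys.stream().filter(k -> k.startsWith(lookupKey)).count();
    }

    public static void main(String[] args) {
        List<String> indexKeys = List.of("secKey1$recKey", "secKey2$recKey", "secKey3$recKey");
        // Prints 3, although no record has a secondary key exactly equal to "secKey".
        System.out.println(prefixMatches(indexKeys, "secKey"));
    }
}
```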
   
   To fix this, we need custom key-matching logic: given the lookup key "secKey" and an index record key "secKey$recKey", extract the secondary key ("secKey") from the record key and compare it with the lookup key using String.equals.
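A minimal sketch of the intended matching rule, assuming for simplicity that secondary keys contain no "$" (escaping is ignored here); `SecondaryKeyMatch`, `extractSecKey`, and `matches` are hypothetical names, not Hudi APIs:

```java
public class SecondaryKeyMatch {
    // Returns the secondary-key portion of "secKey$recKey": everything before the first '$'.
    // Simplification (hypothetical helper): assumes the secondary key itself contains no '$'.
    static String extractSecKey(String indexRecordKey) {
        int sep = indexRecordKey.indexOf('$');
        if (sep < 0) {
            throw new IllegalArgumentException("No '$' separator in " + indexRecordKey);
        }
        return indexRecordKey.substring(0, sep);
    }

    // Exact comparison instead of startsWith.
    static boolean matches(String indexRecordKey, String lookupKey) {
        return extractSecKey(indexRecordKey).equals(lookupKey);
    }

    public static void main(String[] args) {
        System.out.println(matches("secKey$recKey", "secKey"));  // true
        System.out.println(matches("secKey1$recKey", "secKey")); // false
    }
}
```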
   
   This PR extends the HFile record prefix-matching iterator to abstract away two things: how the seek key (the key passed to the HFile reader's seekTo) is generated, and how keys are matched. Based on the predicate type, we can then pick the iterator that uses the proper implementation underneath. Here is how it works:
   
   Before:
   A boolean flag "isFullKey" chooses between full-key matching and prefix matching.
   After:
   An Expression.Operator enum value specifies which type of HFile iterator to use.
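The abstraction can be sketched as a small strategy object that bundles the seek key with the match predicate. This is a hypothetical illustration: `LookupMode` stands in for the predicate type (it does not reproduce the actual `Expression.Operator` constants), `KeyMatcher`, `forMode`, and `matcher` are invented names, and escaping is ignored:

```java
import java.util.function.Predicate;

public class KeyMatcherSelection {
    // Hypothetical stand-in for the predicate type chosen by the caller.
    enum LookupMode { FULL_KEY, PREFIX, SECONDARY_KEY }

    // Hypothetical abstraction: the key to seek to, plus the rule for accepting a record key.
    interface KeyMatcher {
        String seekKey();
        boolean matches(String recordKey);
    }

    static KeyMatcher matcher(String seek, Predicate<String> test) {
        return new KeyMatcher() {
            public String seekKey() { return seek; }
            public boolean matches(String recordKey) { return test.test(recordKey); }
        };
    }

    static KeyMatcher forMode(LookupMode mode, String lookupKey) {
        switch (mode) {
            case FULL_KEY:
                return matcher(lookupKey, k -> k.equals(lookupKey));
            case PREFIX:
                return matcher(lookupKey, k -> k.startsWith(lookupKey));
            case SECONDARY_KEY:
                // Seek to "lookupKey$" and compare the part before the first '$' exactly.
                return matcher(lookupKey + "$", k -> {
                    int sep = k.indexOf('$');
                    return sep >= 0 && k.substring(0, sep).equals(lookupKey);
                });
        }
        throw new IllegalArgumentException("Unsupported mode: " + mode);
    }

    public static void main(String[] args) {
        KeyMatcher prefix = forMode(LookupMode.PREFIX, "secKey");
        KeyMatcher secondary = forMode(LookupMode.SECONDARY_KEY, "secKey");
        System.out.println(prefix.matches("secKey1$recKey"));    // true (false positive)
        System.out.println(secondary.matches("secKey1$recKey")); // false
        System.out.println(secondary.seekKey());                 // secKey$
    }
}
```

Keeping the seek key and the predicate together means the caller only chooses a mode; the iterator itself stays the same for every lookup type.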
   
   Before:
   In the HFile prefix reader iterator,
   - the seek key passed to reader.seekTo is exactly the lookup key in its escaped form
   - the match logic is String.startsWith
   Now:
   The prefix path supports two iterators: the old prefix iterator and a new secondary-key iterator. The latter
   - uses [escaped lookup key] + "$" as the seek key
   - matches with extractEscapedSecKey(index record key).equals(escaped lookup key)
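A sketch of the secondary-key iterator's seek and match steps. It assumes an escaping scheme in which "\" and "$" inside a secondary key are prefixed with a backslash (an assumption, not confirmed by this PR), uses a `TreeSet` as a stand-in for the sorted HFile and `tailSet` as a stand-in for `seekTo`, and gives hypothetical implementations of `escape` and `extractEscapedSecKey`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class SecondaryKeyIteratorSketch {
    // Assumed escaping scheme (hypothetical): prefix '\' and '$' with a backslash.
    static String escape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '$') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: everything before the first unescaped '$', still in escaped form.
    static String extractEscapedSecKey(String indexRecordKey) {
        for (int i = 0; i < indexRecordKey.length(); i++) {
            char c = indexRecordKey.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '$') {
                return indexRecordKey.substring(0, i);
            }
        }
        throw new IllegalArgumentException("No unescaped '$' in " + indexRecordKey);
    }

    // Seek to the first key >= escape(lookupKey) + "$", then collect keys while they match.
    static List<String> lookup(NavigableSet<String> sortedKeys, String lookupKey) {
        String escapedLookup = escape(lookupKey);
        String seekKey = escapedLookup + "$";
        List<String> result = new ArrayList<>();
        for (String key : sortedKeys.tailSet(seekKey, true)) {
            if (!extractEscapedSecKey(key).equals(escapedLookup)) {
                break; // matches are contiguous in sorted order, so stop at the first mismatch
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(List.of(
                "secKey$recKey1", "secKey$recKey2", "secKey1$recKey",
                "secKey2$recKey", escape("sec$Key") + "$recKey"));
        System.out.println(lookup(keys, "secKey"));  // [secKey$recKey1, secKey$recKey2]
        System.out.println(lookup(keys, "sec$Key")); // [sec\$Key$recKey]
    }
}
```

Under this escaping assumption, every matching key starts with the seek key, so the matches form one contiguous run in sorted order; that is why the loop can stop at the first mismatch instead of scanning the rest of the file.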
   
   ### Impact
   Secondary index lookups now return only records whose secondary key exactly equals the lookup key.
   
   ### Risk level (write none, low medium or high below)
   
   None
   ### Documentation Update
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

