Davis-Zhang-Onehouse commented on PR #13523:
URL: https://github.com/apache/hudi/pull/13523#issuecomment-3049806624
> Only took a 2 min skim. Extending the expressions sg, but should we do it
using sth custom? is there a standard relational expression that can used in
structure and naming
# Do I understand your idea correctly?
## Problem statement
We need two abstractions:
- [Customized seekKey] [nice to have] A customized way of building the seekKey
for reader.seekTo. To look up SI key "secKey", the seekKey should be "secKey$".
This boosts lookup performance because the reader can skip irrelevant records,
and even whole data blocks, especially when many records share a similar
prefix.
- [Customized index record matching] [must have] A customized way of key
matching. To match SI index record key "secKey$recKey" against lookup key
"secKey", we must do
`getUnescapedSecondaryKeyFromSecondaryIndexKey(recordKey).equals(lookupKey)`.
This is customized logic.
We need to decide how to provide these abstractions. This is an FG reader
interface-level change, so we should discuss it once and implement it only once.
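
To make the two abstractions concrete, here is a minimal sketch in plain Java. Every name in it (`SecondaryKeyLookupSketch`, `toSeekKey`, `matches`, `unescapedSecondaryKey`) is hypothetical, and the backslash escape character is an assumption; the helper is a simplified stand-in for `getUnescapedSecondaryKeyFromSecondaryIndexKey`, not the real implementation. A sorted `TreeSet` stands in for the reader, with `tailSet` emulating `reader.seekTo`.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sketch; these names are illustrative, not Hudi APIs. */
final class SecondaryKeyLookupSketch {
  static final char SEPARATOR = '$';
  static final char ESCAPE = '\\'; // assumption: backslash escapes '$' and itself

  /** [Customized seekKey]: escaped secondary key followed by the separator. */
  static String toSeekKey(String secondaryKey) {
    return escape(secondaryKey) + SEPARATOR;
  }

  /** [Customized index record matching]: compare the unescaped secondary key part. */
  static boolean matches(String indexRecordKey, String lookupKey) {
    return unescapedSecondaryKey(indexRecordKey).equals(lookupKey);
  }

  static String escape(String raw) {
    StringBuilder sb = new StringBuilder();
    for (char c : raw.toCharArray()) {
      if (c == SEPARATOR || c == ESCAPE) {
        sb.append(ESCAPE);
      }
      sb.append(c);
    }
    return sb.toString();
  }

  /** Returns the text before the first unescaped separator, with escapes removed. */
  static String unescapedSecondaryKey(String indexRecordKey) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < indexRecordKey.length(); i++) {
      char c = indexRecordKey.charAt(i);
      if (c == ESCAPE && i + 1 < indexRecordKey.length()) {
        sb.append(indexRecordKey.charAt(++i)); // keep the escaped character literally
      } else if (c == SEPARATOR) {
        break; // first unescaped separator ends the secondary key
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    TreeSet<String> sortedIndexKeys = new TreeSet<>(
        Arrays.asList("sec$rec0", "secKey$rec1", "secKey$rec2", "secKeyLonger$rec3"));
    String lookupKey = "secKey";
    // tailSet emulates seekTo: start at the first key >= seekKey.
    for (String key : sortedIndexKeys.tailSet(toSeekKey(lookupKey))) {
      if (matches(key, lookupKey)) {
        System.out.println("hit: " + key);
      }
    }
  }
}
```

In this sketch the seek step only narrows the scan; correctness still depends on `matches`, which is why the matching abstraction is the must-have and the seekKey one is an optimization.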
You seem to suggest something like the following, using an expression builder.
Instead of:
    transformKeysToPredicateByOperator(keys, SECONDARY_INDEX_KEY_MATCH)
Use:
    Predicates.in(
        Functions.substringBeforeSpliter(column, "$"),
        keys
    )
1. Remove custom operator: Instead of SECONDARY_INDEX_KEY_MATCH, use a
combination of:
- A string transformation function (e.g., SUBSTRING_BEFORE)
- Standard EQUALS operator
2. Introduce String Functions:
- Add a SUBSTRING_BEFORE_SPLITER(expr, delimiter) function, which is more
generic and replaces the customized getUnescapedSecondaryKeyFromSecondaryIndexKey
- Add proper handling for escaped delimiters
- This follows standard SQL function patterns
3. Transform at Expression Level:
Current: SECONDARY_INDEX_KEY_MATCH(column, ['key1', 'key2'])
New: SUBSTRING_BEFORE_SPLITER(indexRecordKey, '$') IN ['key1', 'key2']
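
As a rough illustration of the compound expression, the sketch below composes a string function with a standard IN predicate using only `java.util.function`. The class and method names (`SubstringBeforeSketch`, `substringBefore`, `in`) are hypothetical and do not correspond to Hudi's expression classes; the backslash escape is again an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of SUBSTRING_BEFORE_SPLITER(expr, delimiter) IN [keys]. */
final class SubstringBeforeSketch {

  /** Text before the first unescaped delimiter; escape sequences are unescaped. */
  static Function<String, String> substringBefore(char delimiter, char escape) {
    return value -> {
      StringBuilder out = new StringBuilder();
      for (int i = 0; i < value.length(); i++) {
        char c = value.charAt(i);
        if (c == escape && i + 1 < value.length()) {
          out.append(value.charAt(++i)); // escaped character is kept as a literal
        } else if (c == delimiter) {
          break; // stop at the first unescaped delimiter
        } else {
          out.append(c);
        }
      }
      // If no delimiter is present, the whole unescaped value is returned.
      return out.toString();
    };
  }

  /** Standard IN predicate evaluated over the result of an arbitrary expression. */
  static <T> Predicate<T> in(Function<T, String> expr, List<String> keys) {
    Set<String> keySet = new HashSet<>(keys);
    return row -> keySet.contains(expr.apply(row));
  }

  public static void main(String[] args) {
    Predicate<String> predicate =
        in(substringBefore('$', '\\'), Arrays.asList("key1", "key2"));
    for (String indexRecordKey : Arrays.asList("key1$recA", "key10$recB", "key2$recC")) {
      System.out.println(indexRecordKey + " -> " + predicate.test(indexRecordKey));
    }
  }
}
```

The point of the composition is that the custom operator disappears: the only secondary-index-specific piece is the delimiter and escape handling inside the string function.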
# What I need to think about
How to abstract "[Customized seekKey]" - need to think more, no plan yet.
How to abstract "[Customized index record matching]" - this I can explore
the compound expression approach above.