Davis-Zhang-Onehouse commented on PR #13414:
URL: https://github.com/apache/hudi/pull/13414#issuecomment-2977618510

   @yihua asked for PR owner @Davis-Zhang-Onehouse to work on the following AI:
   
   > Index definition looks better. sth to store as part of the table metadata, 
instead of implicitly inferring it. We should store something in the .index.def.
   
   # Proposed change
   
   Introducing a new attribute "partitionStrategy". Possible values:
   - HASH_ON_SECONDARY_KEY: exclusive to SI. It means we can determine SI file 
group by computing the hash value only using the secondary data column value.
   - HASH_ON_SECONDARY_KEY_RECORD_KEY: exclusive to SI. It means the old 
behavior.
   
   <img width="1695" alt="image" 
src="https://github.com/user-attachments/assets/c9f69b5c-1ecf-4899-a5c0-2cbe6290712c";
 />
   
   Also from the schema evolution perspective about this json. If we didn't 
find such field, depends on the index type, we infer the default strategy. For 
SI, in case we could not find the value, it means using 
`HASH_ON_SECONDARY_KEY_RECORD_KEY`.
   
   # Implementation details
   ## Read path
   Whenever we do lookup using indexes, we will read the index def file and add 
partitionStrategy value to the context. If it is
   HASH_ON_SECONDARY_KEY_RECORD_KEY, it will route to prefix lookup code. If it 
is HASH_ON_SECONDARY_KEY, it is route to the new code
   
   Also for all the other indexes, we need to do the same as they share the 
same code path. It is an index look up interface change so it impacts code 
shared by all indexes.
   
   ## Write path
   If it is creating a new interface, we always write partitionStrategy 
attribute in the strategy. For SI it will be of value 
HASH_ON_SECONDARY_KEY_RECORD_KEY. For others, we will come up name mapping to 
their existing behavior.
   
   ## Misc
   revert the table version related code change.
   
   
   @yihua please comment on the proposed plan when you get a chance. thanks
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to