Davis-Zhang-Onehouse commented on PR #13414:
URL: https://github.com/apache/hudi/pull/13414#issuecomment-2981141127

   @yihua @vinothchandar There are new items to factor in: backward and forward 
compatibility. I spotted major issues, and the PR is blocked on your feedback.
   
   # Compatibility
   ## Forward compatibility
   
   If the SI uses version 2 (hash partitioning on the data column value only) 
and the Hudi binary is old, the old binary has no concept of an index version 
and will treat the version 2 SI as if it were the old one. As a result:
   
   ### Read path
   It should be fine, since it uses prefix lookup, which is naturally 
compatible with the new partitioning strategy.
   
   ### Write path
   The write path is broken: the old Hudi binary will write to the version 2 
index using the old partitioning strategy.
   What makes things worse is that the index version is not updated, because 
the old binary has no such logic.
   So we end up with a corrupted version 2 hoodie index, because the old Hudi 
binary does not conform to the version 2 protocol for updating the index.
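
To make the mismatch concrete, here is a minimal sketch, not Hudi's actual code. It assumes the old strategy hashes a composite of the secondary key and the record key, while version 2 hashes the secondary key (data column value) only. The function names, the `$` separator, the CRC32 hash, and the file group count are all hypothetical.

```python
import zlib

NUM_FILE_GROUPS = 8  # hypothetical number of file groups in the SI partition


def file_group_v1(secondary_key: str, record_key: str) -> int:
    # Assumed old strategy: hash the composite (secondary key + record key).
    composite = f"{secondary_key}${record_key}"
    return zlib.crc32(composite.encode()) % NUM_FILE_GROUPS


def file_group_v2(secondary_key: str) -> int:
    # Version 2 as described above: hash the data column value only.
    return zlib.crc32(secondary_key.encode()) % NUM_FILE_GROUPS


def misplaced_entries(entries):
    """Entries an old writer would place in a file group other than the one
    a version 2 point lookup on the secondary key would consult."""
    return [
        (sk, rk)
        for sk, rk in entries
        if file_group_v1(sk, rk) != file_group_v2(sk)
    ]


# 100 records that share one secondary key value.
entries = [("city=sf", f"rec-{i}") for i in range(100)]
misplaced = misplaced_entries(entries)
print(f"{len(misplaced)} of {len(entries)} entries land in the wrong file group")
```

Under these assumptions, a version 2 reader that only consults `file_group_v2(secondary_key)` would miss every misplaced entry, while a prefix scan over all file groups would still find them, which matches the read path observation above.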
   
   ## Backward compatibility
   This should be fine, as the new Hudi binary will properly recognize the 
version (or the absence of a version) and adapt accordingly.
   
   
   # Fundamental limitation of the index version design
   
   Old Hudi binaries only recognize and respect the table version. Introducing 
an index version means users must run a binary that recognizes and honors it.
   
   In industry, the standard procedure is:
   - Introduce a "compatibility patch" that recognizes the index version and 
backs off properly if it is some future version (the old Hudi binary will 
choose not to use the index even if there is one).
   - Users must be aware of all readers/writers that touch a Hudi table. If 
the Hudi table uses SI version 2, users must make sure every Hudi binary is at 
or above the compatibility patch.
   - This places a burden on the user, and failing to do so means the SI is 
silently corrupted, causing correctness issues, which is not acceptable. We 
need a place to guide users, and this is far more cumbersome than a table 
version upgrade.
   - If all readers and writers are managed by a service provider, this might 
not be an issue: just introduce the compatibility patch and we are all good.
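
The back-off check in a compatibility patch could look roughly like the sketch below. It is illustrative only: the `version` metadata field, the constant, and the function names are hypothetical and do not come from the Hudi codebase. It assumes an index with no recorded version is treated as the original layout.

```python
MAX_SUPPORTED_SI_VERSION = 1  # hypothetical: highest SI version this binary knows


def resolve_index_version(index_metadata: dict) -> int:
    # Assumption: an index written before versioning existed carries no
    # version field and is treated as version 1.
    return int(index_metadata.get("version", 1))


def should_use_index(index_metadata: dict,
                     max_supported: int = MAX_SUPPORTED_SI_VERSION) -> bool:
    # Back off (do not use the index) when it was written with a version
    # newer than this binary understands.
    return resolve_index_version(index_metadata) <= max_supported


print(should_use_index({}))                              # unversioned index
print(should_use_index({"version": 2}))                  # future version
print(should_use_index({"version": 2}, max_supported=2)) # upgraded binary
```

A binary without such a check has no way to notice a version 2 index, which is why every reader and writer on the table would need to carry at least this patch.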


