shangxinli commented on PR #13674: URL: https://github.com/apache/hudi/pull/13674#issuecomment-3168584527
> Hi @shangxinli, thanks for this PR. Before I do a deep review, could you please help answer a few questions:
> 1. What is the difference between this strategy and the previous copier?
> 2. How do you solve the problem of clustering metadata fields such as `_hoodie_file_name`?
> 3. For the BloomFilter, how do you merge the Hoodie custom BloomFilter field in the Parquet footer?

Thanks @YuangZhang for reviewing it; this is great feedback! My previous commits ignored `_hoodie_file_name`, and it does need to be handled. In fact, that is the blocker to using the existing Parquet API for this. Given that, I am going to revert my implementation and reuse yours.

The second part of this PR is to avoid schema evolution. The reason is that schema evolution has caused outages before because of the complexity of the schemas themselves: some of them are deeply nested and contain complex data types such as maps and arrays. I added a flag to turn schema evolution on or off.
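To illustrate why `_hoodie_file_name` blocks a pure copy-based merge, here is a minimal sketch under simplifying assumptions. A "file" is modeled as a hypothetical dict of column name to values (not Hudi's or Parquet's real API). A merge that copies every column unchanged, as a byte-level row-group copy would, keeps the *source* file names in `_hoodie_file_name`, while the clustered output needs every record to point at the new target file. The helpers `naive_copy_merge`, `merge_with_meta_rewrite`, and `stale_rows`, and the sample file names, are illustrative only.

```python
from typing import Dict, List

# Hypothetical model: a file is column name -> list of values (illustration only).
FileColumns = Dict[str, List[str]]

META_FILE_NAME = "_hoodie_file_name"


def naive_copy_merge(files: List[FileColumns]) -> FileColumns:
    """Concatenate every column unchanged, like copying row groups byte for byte."""
    merged: FileColumns = {}
    for f in files:
        for col, values in f.items():
            merged.setdefault(col, []).extend(values)
    return merged


def merge_with_meta_rewrite(files: List[FileColumns], target_name: str) -> FileColumns:
    """Same merge, but regenerate _hoodie_file_name so it names the new target file."""
    merged = naive_copy_merge(files)
    if META_FILE_NAME in merged:
        merged[META_FILE_NAME] = [target_name] * len(merged[META_FILE_NAME])
    return merged


def stale_rows(merged: FileColumns, target_name: str) -> int:
    """Count records whose _hoodie_file_name does not point at target_name."""
    return sum(1 for name in merged.get(META_FILE_NAME, []) if name != target_name)


# Hypothetical clustering input: two small files merged into "c1.parquet".
a: FileColumns = {META_FILE_NAME: ["f1.parquet", "f1.parquet"], "id": ["1", "2"]}
b: FileColumns = {META_FILE_NAME: ["f2.parquet"], "id": ["3"]}

print(stale_rows(naive_copy_merge([a, b]), "c1.parquet"))  # all 3 records are stale
print(stale_rows(merge_with_meta_rewrite([a, b], "c1.parquet"), "c1.parquet"))  # 0
```

In a real Parquet file the data columns could still be copied as-is; the point of the sketch is only that the `_hoodie_file_name` column cannot be, since its values must change per output file.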
