[PR] [HUDI-9370] Unify logic of fetching files and file slices in the metadata table writer [hudi]

via GitHub Fri, 02 May 2025 19:47:26 -0700


yihua opened a new pull request, #13254:
URL: https://github.com/apache/hudi/pull/13254


   ### Change Logs
   
   This PR unifies the logic for fetching files and file slices in the metadata table writer, so that index initialization relies on only two kinds of information from the file system view:
   - (1) `partitionIdToAllFilesMap`: all files in the table, used by the `FILES`, `BLOOM_FILTERS`, and `COLUMN_STATS` partitions;
   - (2) `lazyLatestMergedPartitionFileSliceList` (evaluated lazily, only if needed): the latest merged file slices, used by the `RECORD_INDEX`, `EXPRESSION_INDEX`, `PARTITION_STATS`, and `SECONDARY_INDEX` partitions.
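The "lazily evaluated only if needed" behavior of item (2) can be illustrated with a memoizing supplier. The sketch below is hypothetical: the `Lazy` and `IndexInitInputs` classes, their methods, and the use of plain strings in place of file slices are assumptions for illustration, not Hudi's actual types or API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; names and types are illustrative, not Hudi's actual API.
class LazyIndexInputsSketch {

  // Memoizing supplier: the wrapped computation runs at most once, on the first get().
  static final class Lazy<T> implements Supplier<T> {
    private Supplier<T> initializer;
    private T value;
    private boolean evaluated;

    Lazy(Supplier<T> initializer) {
      this.initializer = initializer;
    }

    @Override
    public synchronized T get() {
      if (!evaluated) {
        value = initializer.get();
        evaluated = true;
        initializer = null; // drop the captured lambda once the value is cached
      }
      return value;
    }

    synchronized boolean isEvaluated() {
      return evaluated;
    }
  }

  // Hypothetical holder for the two inputs; file slices are modeled as strings here.
  static final class IndexInitInputs {
    final Map<String, List<String>> partitionIdToAllFilesMap;
    final Lazy<List<String>> lazyLatestMergedPartitionFileSliceList;

    IndexInitInputs(Map<String, List<String>> allFiles, Supplier<List<String>> sliceLoader) {
      this.partitionIdToAllFilesMap = allFiles;
      this.lazyLatestMergedPartitionFileSliceList = new Lazy<>(sliceLoader);
    }
  }

  static final AtomicInteger LOADS = new AtomicInteger();

  public static void main(String[] args) {
    Map<String, List<String>> allFiles = new HashMap<>();
    allFiles.put("2025/05/01", Arrays.asList("f1.parquet", "f1.log.1"));
    IndexInitInputs inputs = new IndexInitInputs(allFiles, () -> {
      LOADS.incrementAndGet(); // stands in for an expensive file system view scan
      return Arrays.asList("2025/05/01:f1");
    });

    // Indexes built from all files never force the file slice computation.
    System.out.println(inputs.partitionIdToAllFilesMap.get("2025/05/01").size()); // 2
    System.out.println(inputs.lazyLatestMergedPartitionFileSliceList.isEvaluated()); // false

    // The first consumer that needs file slices triggers a single load.
    inputs.lazyLatestMergedPartitionFileSliceList.get();
    inputs.lazyLatestMergedPartitionFileSliceList.get();
    System.out.println(LOADS.get()); // 1
  }
}
```

Caching the result behind a supplier keeps the cost of computing merged file slices off the path entirely when only file-based indexes are being initialized.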
   
   Note that these two could be unified further, but that is out of scope for this PR. Together they are sufficient for both kinds of indexes: those built from all files and those built from the latest merged file slices.
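The split between the two kinds of indexes can be pictured as a mapping from each metadata table partition type to the input it is built from. This is a hypothetical sketch: the `Source` and `MdtPartition` enums and the `needsLatestMergedFileSlices` helper are illustrative names, not Hudi's actual classes; only the grouping of partitions follows the list above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the grouping described in this PR; not Hudi's actual enum.
class IndexSourceSketch {
  enum Source { ALL_FILES, LATEST_MERGED_FILE_SLICES }

  enum MdtPartition {
    FILES(Source.ALL_FILES),
    BLOOM_FILTERS(Source.ALL_FILES),
    COLUMN_STATS(Source.ALL_FILES),
    RECORD_INDEX(Source.LATEST_MERGED_FILE_SLICES),
    EXPRESSION_INDEX(Source.LATEST_MERGED_FILE_SLICES),
    PARTITION_STATS(Source.LATEST_MERGED_FILE_SLICES),
    SECONDARY_INDEX(Source.LATEST_MERGED_FILE_SLICES);

    final Source source;

    MdtPartition(Source source) {
      this.source = source;
    }
  }

  // Merged file slices only need computing if some requested partition is built from them.
  static boolean needsLatestMergedFileSlices(Set<MdtPartition> toInitialize) {
    return toInitialize.stream().anyMatch(p -> p.source == Source.LATEST_MERGED_FILE_SLICES);
  }

  public static void main(String[] args) {
    System.out.println(needsLatestMergedFileSlices(
        EnumSet.of(MdtPartition.FILES, MdtPartition.COLUMN_STATS))); // false
    System.out.println(needsLatestMergedFileSlices(
        EnumSet.of(MdtPartition.FILES, MdtPartition.RECORD_INDEX))); // true
  }
}
```

Keeping the source on the enum constant makes the decision of whether to materialize file slices a single check over the set of partitions being initialized.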
   
   ### Impact
   
   Code simplification as part of the metadata table (MDT) writer refactoring.
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

