nsivabalan commented on code in PR #5274: URL: https://github.com/apache/hudi/pull/5274#discussion_r846685721
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -770,18 +774,6 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
     }
   }
 
-  private Set<String> getMetadataPartitionsToUpdate() {

Review Comment:
   I will try to explain to the best of my understanding, but will let @codope chime in as well.

   Case 1: An existing MDT from 0.10.0 gets upgraded to 0.11 without enabling any new partitions. On the first commit, after realizing the FILES partition is already initialized, we will update the table config with "FILES" as the completed MDT partitions.

   Case 2: An existing MDT from 0.10.0 gets upgraded to 0.11 with all partitions enabled (synchronous flow). On the first commit, we will realize 2 new partitions (col stats and bloom filter) have been enabled and will initialize them. At the end of it, we will update the table config with all 3 partitions as completed MDT partitions.

   Case 3: For a fresh table, the user wishes to enable async indexing for col stats and bloom filter. With the regular writer, async indexing has to be enabled for these 2 partitions. So the user is expected to schedule and execute the index building in a different process altogether. During scheduling, both partitions (col stats and bloom filter) will be added to the table config in the list of MDT partitions being built. Once this is updated, the regular writer process, when applying a data table commit to the MDT, will update all 3 partitions in the MDT (FILES as part of the completed MDT partitions and the other 2 as part of the MDT partitions being built out). This is the case where we need getMetadataPartitionsToUpdate(), so that writers know which partitions to update.

   I listed Case 1 and Case 2 just for completeness, but Case 3 is where we might need the partitions being built out.
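To make the three cases concrete, here is a rough sketch of the set a writer would need, assuming the table config exposes two lists: completed MDT partitions and MDT partitions being built. The class `MdtPartitionSketch`, the method `partitionsToUpdate`, and the partition name constants are hypothetical and only for illustration; this is not the actual body of getMetadataPartitionsToUpdate().

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Hudi codebase.
final class MdtPartitionSketch {

    // Illustrative labels for the three MDT partitions discussed above.
    static final String FILES = "files";
    static final String COLUMN_STATS = "column_stats";
    static final String BLOOM_FILTERS = "bloom_filters";

    private MdtPartitionSketch() {
    }

    /**
     * Returns the partitions a regular writer should update when a data table
     * commit is applied to the MDT: every completed partition plus every
     * partition that an async indexer is still building.
     */
    static Set<String> partitionsToUpdate(Set<String> completed, Set<String> beingBuilt) {
        // LinkedHashSet keeps completed partitions first and drops duplicates.
        Set<String> result = new LinkedHashSet<>(completed);
        result.addAll(beingBuilt);
        return Collections.unmodifiableSet(result);
    }
}
```

In Case 1 the second set is empty and the result is just FILES. In Case 2 all 3 partitions are already in the completed set. Only in Case 3 does the second set contribute anything, which is why the writer needs to read both lists.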