nsivabalan commented on code in PR #5274:
URL: https://github.com/apache/hudi/pull/5274#discussion_r846685721


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##########
@@ -770,18 +774,6 @@ private MetadataRecordsGenerationParams getRecordsGenerationParams() {
     }
   }
 
-  private Set<String> getMetadataPartitionsToUpdate() {

Review Comment:
   I will try to explain to the best of my understanding, but will let @codope chime in as well.
   
   Case 1:
   An existing MDT from 0.10.0 is upgraded to 0.11 without enabling any new partitions.
   On the first commit, after detecting that the FILES partition is already initialized, we will update the table config to list "FILES" as a completed MDT partition.
   
   Case 2: 
   An existing MDT from 0.10.0 is upgraded to 0.11 with all partitions enabled (synchronous flow).
   On the first commit, we will detect that 2 new partitions (col stats and bloom filter) have been enabled and will initialize them. At the end, we will update the table config to list all 3 partitions as completed MDT partitions.
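   To make Cases 1 and 2 concrete, here is a minimal Java sketch of the synchronous flow on the first commit after the upgrade. It is not Hudi's implementation: the class, the `completedAfterFirstCommit` and `initializePartition` helpers, and the lowercase partition names are hypothetical, and the table config is modeled as a plain set of completed partition names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SyncMdtInitSketch {

  // Hypothetical MDT partition names, used only for illustration.
  static final String FILES = "files";
  static final String COLUMN_STATS = "column_stats";
  static final String BLOOM_FILTERS = "bloom_filters";

  /**
   * Returns the MDT partitions to record as completed in the (modeled) table config
   * after the first commit following an upgrade, initializing any enabled partition
   * that does not exist yet (synchronous flow).
   */
  static Set<String> completedAfterFirstCommit(List<String> enabled, Set<String> alreadyInitialized) {
    Set<String> completed = new LinkedHashSet<>();
    for (String partition : enabled) {
      if (!alreadyInitialized.contains(partition)) {
        // Case 2: a newly enabled partition is initialized inline by the writer.
        initializePartition(partition);
      }
      // Cases 1 and 2: every enabled partition ends up in the completed list.
      completed.add(partition);
    }
    return completed;
  }

  // Hypothetical placeholder for bootstrapping a partition from the data table.
  static void initializePartition(String partition) {
    System.out.println("initializing MDT partition: " + partition);
  }

  public static void main(String[] args) {
    // Case 1: upgraded 0.10.0 table, only FILES enabled and already present.
    System.out.println(completedAfterFirstCommit(List.of(FILES), Set.of(FILES)));
    // prints [files]

    // Case 2: all three partitions enabled; only FILES exists before the first commit.
    System.out.println(completedAfterFirstCommit(
        List.of(FILES, COLUMN_STATS, BLOOM_FILTERS), Set.of(FILES)));
    // prints the two "initializing" lines, then [files, column_stats, bloom_filters]
  }
}
```

   In both cases the writer itself finishes initialization before recording the partitions, so nothing is left in a "being built" state.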
   
   Case 3:
   For a fresh table, the user wishes to enable async indexing for col stats and bloom filter. With the regular writer, async indexing has to be enabled for these 2 partitions. The user is then expected to schedule and execute the index building in a separate process altogether. During scheduling, both partitions (col stats and bloom filter) will be added to the table config's list of MDT partitions being built. Once this is updated, when a data table commit from the regular writer process is applied to the MDT, it will update all 3 partitions in the MDT (FILES as a completed MDT partition, and the other 2 as MDT partitions being built). This is the case where writers need getMetadataPartitionsToUpdate() to know which partitions to update.
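   A minimal Java sketch of what Case 3 asks of getMetadataPartitionsToUpdate(): the union of the completed MDT partitions and the partitions still being built. This is an illustration under assumptions, not the Hudi method shown in the diff; the comma-separated strings stand in for however the table config actually stores the two lists, and the class name and `parse` helper are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class MdtPartitionsToUpdateSketch {

  /** Splits a comma-separated value into trimmed, non-empty partition names. */
  static Set<String> parse(String csv) {
    Set<String> out = new LinkedHashSet<>();
    if (csv == null) {
      return out;
    }
    Arrays.stream(csv.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .forEach(out::add);
    return out;
  }

  /**
   * Partitions a regular writer updates when applying a data table commit:
   * completed MDT partitions plus those the async indexer is still building.
   */
  static Set<String> getMetadataPartitionsToUpdate(String completedCsv, String inflightCsv) {
    Set<String> toUpdate = parse(completedCsv);
    toUpdate.addAll(parse(inflightCsv));
    return toUpdate;
  }

  public static void main(String[] args) {
    // Case 3: FILES is completed; col stats and bloom filter were added as
    // "being built" when the index building was scheduled.
    Set<String> partitions = getMetadataPartitionsToUpdate("files", "column_stats,bloom_filters");
    System.out.println(partitions); // prints [files, column_stats, bloom_filters]
  }
}
```

   Without the inflight list, the writer in Case 3 would only update FILES, even though the indexer has already started building the other 2 partitions.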
   
   I listed Case 1 and Case 2 just for completeness, but Case 3 is where we need the list of partitions being built.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.