[ https://issues.apache.org/jira/browse/HUDI-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sagar Sumit updated HUDI-2488:
------------------------------
    Issue Type: Epic  (was: New Feature)

Support async metadata index creation while regular writers and table services are in progress
-----------------------------------------------------------------------------------------------

                Key: HUDI-2488
                URL: https://issues.apache.org/jira/browse/HUDI-2488
            Project: Apache Hudi
         Issue Type: Epic
           Reporter: sivabalan narayanan
           Assignee: Sagar Sumit
           Priority: Blocker
             Labels: pull-request-available
            Fix For: 0.11.0
        Attachments: image-2021-11-17-11-04-09-713.png

For now, we have only the FILES partition in the metadata table, and our suggestion is to stop all processes and then restart them one by one with the metadata table enabled. The first process to start back up will trigger bootstrapping of the metadata table.

But this may not work out well as we add more and more partitions to the metadata table. We need to support bootstrapping one or more partitions of the metadata table while regular writers and table services are in progress.

Penning down my thoughts/idea: I tried to find a way to get this done without adding an additional lock, but could not crack that. So here is one way to support async bootstrap.

Introduce a file called "available_partitions" in some special location under the metadata table. This file will contain the list of partitions that are ready to apply updates from the data table. That is, when we do synchronous updates from the data table to the metadata table and the metadata table has N partitions, we need to know which partitions are fully bootstrapped and ready to take updates; this file maintains that info. We can debate how to maintain this info (table props, a separate file, etc.), but for now let's say this file is the source of truth. The idea is that any async bootstrap process updates this file with the newly bootstrapped partition once its bootstrap is fully complete, so that all other writers know which partitions to update. And we need to introduce a metadata_lock as well.

Here is how writers and async bootstrap will pan out.

Regular writer or any async table service (compaction, etc.), when changes need to be applied to the metadata table (FYI, as of today this already happens within the data table lock):
    Take the metadata_lock.
    Read the contents of available_partitions.
    Prep records and apply updates to the metadata table.
    Release the lock.

Async bootstrap process:
    Start bootstrapping a given partition (e.g. files) in the metadata table.
    Do it in a loop: the first iteration of the bootstrap could take 10 mins, for example; then catch up the new commits that happened during those 10 mins, which could take 1 min; then go for another round. Whenever the total bootstrap time for a round is ~1 min or less, the next round can be the final iteration.
    During the final iteration, take the metadata_lock (this lock should not be held for more than a few secs):
        Apply any new commits that happened while the last iteration of the bootstrap was running.
        Update the "available_partitions" file with the partition that was fully bootstrapped.
        Release the lock.

metadata_lock: this ensures that when async bootstrap is in the final stage of bootstrapping, we do not miss any commits that are nearing completion, so we have to take a lock. Either async bootstrap will apply the update, or the actual writer itself will update the partition directly once bootstrap is fully complete. A sketch of both code paths follows.
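To make the handshake concrete, here is a minimal Java sketch of the two code paths, assuming hypothetical AvailablePartitionsFile and MetadataTable abstractions (these are not existing Hudi classes) and an in-process ReentrantLock standing in for metadata_lock, which in practice would be a cross-process lock provider:

{code:java}
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: AvailablePartitionsFile and MetadataTable are placeholder
// abstractions, and ReentrantLock stands in for the cross-process metadata_lock.
public class AsyncBootstrapSketch {

  interface AvailablePartitionsFile {
    List<String> read();         // partitions that are fully bootstrapped
    void add(String partition);  // publish a newly bootstrapped partition
  }

  interface MetadataTable {
    String fullBootstrap(String partition);               // long first pass, returns last synced instant
    String catchUp(String partition, String fromInstant); // replay commits since fromInstant, returns new instant
    void applyUpdates(String commitInstant, List<String> readyPartitions);
  }

  private final ReentrantLock metadataLock = new ReentrantLock();
  private final AvailablePartitionsFile availablePartitions;
  private final MetadataTable metadataTable;

  AsyncBootstrapSketch(AvailablePartitionsFile availablePartitions, MetadataTable metadataTable) {
    this.availablePartitions = availablePartitions;
    this.metadataTable = metadataTable;
  }

  /** Regular writer or async table service: apply one commit's changes to the metadata table. */
  void applyCommitToMetadata(String commitInstant) {
    metadataLock.lock();
    try {
      List<String> readyPartitions = availablePartitions.read(); // only fully bootstrapped partitions
      metadataTable.applyUpdates(commitInstant, readyPartitions);
    } finally {
      metadataLock.unlock();
    }
  }

  /** Async bootstrap of one metadata partition, e.g. "files". */
  void bootstrapPartition(String partition) {
    String lastSynced = metadataTable.fullBootstrap(partition); // e.g. ~10 mins
    while (true) {
      long start = System.currentTimeMillis();
      lastSynced = metadataTable.catchUp(partition, lastSynced); // catch up commits from the last round
      if (System.currentTimeMillis() - start <= 60_000) {
        break; // a round finished within ~1 min, so the next (locked) iteration is the final one
      }
    }
    metadataLock.lock(); // final iteration; held only for a few seconds
    try {
      metadataTable.catchUp(partition, lastSynced); // commits that landed during the last round
      availablePartitions.add(partition);           // publish the partition as ready for direct updates
    } finally {
      metadataLock.unlock();
    }
  }
}
{code}

The only invariant this sketch relies on is that available_partitions is read and written exclusively under the metadata_lock.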
Regarding "available_partitions": I was looking for a way to know which partitions are fully ready to take direct updates from regular writers, and hence chose this approach. We could also think about creating a temp partition (files_temp or something) while the bootstrap is in progress and then renaming it to the original partition name once the bootstrap is fully complete. If we can reliably rename these partitions (i.e., once the files partition is visible, it is fully ready to take direct updates), we can take this route as well. Here is how it might pan out with folder/partition renaming (see the sketch after the failure notes below).

Regular writer or any async table service (compaction, etc.), when changes need to be applied to the metadata table (FYI, as of today this already happens within the data table lock):
    Take the metadata_lock.
    List the partitions in the metadata table, ignoring temp partitions.
    Prep records and apply updates to the metadata table.
    Release the lock.

Async bootstrap process:
    Start bootstrapping a given partition (e.g. files) in the metadata table.
    Create a temp folder for the partition that is getting bootstrapped (e.g. files_temp).
    Do it in a loop: the first iteration of the bootstrap could take 10 mins, for example; then catch up the new commits that happened during those 10 mins, which could take 1 min; then go for another round. Whenever the total bootstrap time for a round is ~1 min or less, the next round can be the final iteration.
    During the final iteration, take the metadata_lock (this lock should not be held for more than a few secs):
        Apply any new commits that happened while the last iteration of the bootstrap was running.
        Rename files_temp to files.
        Release the lock.
Note: we just need to ensure that the folder rename is consistent. On a crash, either the new folder is fully intact or it does not exist at all; the contents of the old folder do not matter.

Failures:
a. If the bootstrap fails midway, then as long as "files" has not been created yet, we can delete files_temp and start all over again.
b. If the bootstrap fails just after the rename, we should again be fine; it is just that the lock may not have been released, and we need to ensure the metadata_lock gets released. To tackle this, if acquiring the metadata_lock from a regular writer fails, we just proceed with listing the partitions and applying updates.
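In the same spirit, here is a hedged sketch of the rename-based variant, assuming a plain java.nio.file layout purely for illustration; names such as TEMP_SUFFIX, readyPartitionsForUpdate, and finalizePartition are invented for the example, ReentrantLock again stands in for a cross-process metadata_lock, and the atomic move only holds on filesystems that support it (which is exactly the "consistent rename" assumption above):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only: paths, TEMP_SUFFIX, and method names are hypothetical,
// and ReentrantLock stands in for the cross-process metadata_lock.
public class RenameBootstrapSketch {

  private static final String TEMP_SUFFIX = "_temp";
  private final ReentrantLock metadataLock = new ReentrantLock();
  private final Path metadataBasePath;

  RenameBootstrapSketch(Path metadataBasePath) {
    this.metadataBasePath = metadataBasePath;
  }

  /** Writer path: list partitions, skip *_temp folders, and fall back if the lock cannot be taken. */
  List<Path> readyPartitionsForUpdate() throws IOException, InterruptedException {
    // Failure case (b): if the lock is stuck after a crashed bootstrap, proceed without it.
    boolean locked = metadataLock.tryLock(5, TimeUnit.SECONDS);
    try (Stream<Path> children = Files.list(metadataBasePath)) {
      return children
          .filter(Files::isDirectory)
          .filter(p -> !p.getFileName().toString().endsWith(TEMP_SUFFIX)) // ignore in-progress partitions
          .collect(Collectors.toList());
    } finally {
      if (locked) {
        metadataLock.unlock();
      }
    }
  }

  /** Final bootstrap step: apply the remaining commits, then promote files_temp to files under the lock. */
  void finalizePartition(String partition, Runnable applyRemainingCommits) throws IOException {
    Path tempDir = metadataBasePath.resolve(partition + TEMP_SUFFIX);
    Path finalDir = metadataBasePath.resolve(partition);
    metadataLock.lock(); // held only for a few seconds
    try {
      applyRemainingCommits.run();                                    // commits that landed during the last round
      Files.move(tempDir, finalDir, StandardCopyOption.ATOMIC_MOVE);  // folder appears fully or not at all
    } finally {
      metadataLock.unlock();
    }
  }
}
{code}

The tryLock fallback in readyPartitionsForUpdate mirrors failure case (b) above: a writer that cannot obtain the metadata_lock still makes progress by listing only the non-temp partitions, which are by construction fully bootstrapped.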