nsivabalan opened a new pull request #4420: URL: https://github.com/apache/hudi/pull/4420
## What is the purpose of the pull request

At the write client layer, inline and async have a very strict meaning. Enabling inline compaction means both scheduling and execution happen inline. The same applies to clustering. If the async config is enabled, both scheduling and execution have to be done async, and the regular writer will not do anything.

But any scheduling has to be done in a coordinated manner; in other words, there should not be any other operation inflight. For example, if someone has a separate process to schedule and execute compaction, they have to ensure no other writers are in progress while scheduling. If not, it might lead to data loss. This puts an extra burden on users to configure lock providers.

But Hudi already has the intelligence to do scheduling inline while execution is done async. It's just that the configs aren't lined up well. So, this PR adds two new configs named `hoodie.compact.schedule.async` and `hoodie.clustering.schedule.async`. When enabled, scheduling will happen inline by regular writers, and the expectation is that users will have a separate job to execute the already scheduled ones. This way, users don't have to configure lock providers. With the metadata table, this constraint might change, but without the metadata table, users should have a way to exploit async execution of table services.

To be discussed: I have not fixed the deltastreamer part and have not added tests yet.
- Wanted to hear opinions on whether the approach is good.
- Deltastreamer current state of things:
  a. By default, compaction is async for MOR tables unless explicitly disabled via configs.
  b. If async clustering is enabled, scheduling will happen explicitly (inline) and execution will happen in a different thread.

So, with this patch, wondering if we should leave the deltastreamer flow untouched.
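As a rough illustration of the proposed setup, the sketch below builds the properties a regular writer might pass when it should only schedule table services and leave execution to a separate job. The two `schedule.async` keys are the ones proposed in this PR. Setting `hoodie.compact.inline` and `hoodie.clustering.inline` to `false` is an assumption about how the writer would otherwise be configured, and the class and method names are hypothetical, not part of Hudi.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of the Hudi code base.
public class ScheduleInlineExecuteAsyncConfig {

    // Config keys proposed in this PR, named as in the description above.
    static final String COMPACT_SCHEDULE = "hoodie.compact.schedule.async";
    static final String CLUSTERING_SCHEDULE = "hoodie.clustering.schedule.async";

    /** Properties for the regular writer: schedule table services, execute none. */
    static Properties writerProps() {
        Properties props = new Properties();
        // Assumption: fully inline table services stay disabled on the writer.
        props.setProperty("hoodie.compact.inline", "false");
        props.setProperty("hoodie.clustering.inline", "false");
        // Proposed flags: the writer schedules compaction/clustering inline,
        // and a separate job executes the already scheduled plans.
        props.setProperty(COMPACT_SCHEDULE, "true");
        props.setProperty(CLUSTERING_SCHEDULE, "true");
        return props;
    }

    public static void main(String[] args) {
        // Print the resulting key/value pairs (iteration order is unspecified).
        writerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Under this arrangement only the regular writer creates the compaction and clustering plans, which is why, per the description above, users would not have to configure lock providers.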
## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.