nsivabalan opened a new pull request #4420:
URL: https://github.com/apache/hudi/pull/4420


   ## What is the purpose of the pull request
   
   At the write client layer, inline and async have very strict meanings. 
Enabling inline compaction means that both scheduling and execution happen 
inline; the same applies to clustering. If the async config is enabled, both 
scheduling and execution have to be done asynchronously, and the regular writer 
does neither. However, scheduling has to be coordinated: no other operation 
should be in flight while a plan is being scheduled. For example, if someone 
has a separate process that schedules and executes compaction, they have to 
ensure that no other writers are in progress while scheduling; otherwise it 
might lead to data loss. This puts an extra burden on users, who have to 
configure lock providers. Hudi is already capable of scheduling inline while 
executing asynchronously; the configs just aren't lined up for it. So this PR 
adds two new configs, `hoodie.compact.schedule.async` and 
`hoodie.clustering.schedule.async`. When enabled, scheduling happens inline in 
regular writers, and users are expected to run a separate job that executes the 
already scheduled plans. This way, users don't have to configure lock 
providers. With the metadata table this constraint might change, but otherwise 
users should have a way to take advantage of async execution of table 
services. 
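
   To make the intended semantics concrete, here is a minimal Python sketch of how the flags could map to "who schedules" and "who executes" for compaction. It is only an illustration, not Hudi's implementation: `resolve_compaction_plan` and `ServicePlan` are hypothetical names, `hoodie.compact.schedule.async` is the key proposed in this PR, and rejecting the combination of both flags is an assumption.

```python
from dataclasses import dataclass

# Config keys as named in this PR description. The resolver below is a
# hypothetical model of the described behaviour, not Hudi code.
INLINE_KEY = "hoodie.compact.inline"
SCHEDULE_ONLY_KEY = "hoodie.compact.schedule.async"


@dataclass(frozen=True)
class ServicePlan:
    scheduled_by: str          # who creates the compaction plan
    executed_by: str           # who runs the already scheduled plan
    needs_lock_provider: bool  # True when scheduling happens outside the writer


def _is_enabled(props: dict, key: str) -> bool:
    # Treat a missing key as "false", as a properties file would.
    return str(props.get(key, "false")).lower() == "true"


def resolve_compaction_plan(props: dict) -> ServicePlan:
    inline = _is_enabled(props, INLINE_KEY)
    schedule_only = _is_enabled(props, SCHEDULE_ONLY_KEY)

    # Assumption: the two modes are mutually exclusive.
    if inline and schedule_only:
        raise ValueError(f"{INLINE_KEY} and {SCHEDULE_ONLY_KEY} cannot both be enabled")

    if inline:
        # Fully inline: the regular writer schedules and executes.
        return ServicePlan("writer", "writer", needs_lock_provider=False)
    if schedule_only:
        # New mode: the writer schedules, a separate job executes.
        return ServicePlan("writer", "separate job", needs_lock_provider=False)
    # Fully async: a separate process schedules and executes, so scheduling
    # must be coordinated with in-flight writers.
    return ServicePlan("separate job", "separate job", needs_lock_provider=True)


if __name__ == "__main__":
    print(resolve_compaction_plan({SCHEDULE_ONLY_KEY: "true"}))
```

   The same shape would apply to clustering with `hoodie.clustering.schedule.async`.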
   
   To be discussed:
   I have not fixed the deltastreamer part and have not added tests yet. 
   - I wanted to hear opinions on whether the approach is good.
   - Current state of things in deltastreamer:
   a. By default, compaction is async for MOR tables unless explicitly disabled 
via configs. 
   b. If async clustering is enabled, scheduling happens explicitly (inline) 
and execution happens in a different thread.
   So, with this patch, I am wondering whether we should leave the 
deltastreamer flow untouched. 
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



