[GitHub] [hudi] bvaradar edited a comment on issue #2151: [SUPPORT] How to run Periodic Compaction? Multiple Tables - When no Upserts

GitBox Fri, 09 Oct 2020 01:04:24 -0700


bvaradar edited a comment on issue #2151:
URL: https://github.com/apache/hudi/issues/2151#issuecomment-706029681



   @tandonraghav : It should work as is with 0.6.0. You should be able to run 
spark.write() with inline compaction off. Based on the compaction schedule, this 
write will schedule compactions. You can then use your writeClient code to run 
async compactions.
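
   As a rough sketch of the "inline compaction off" write described above, 
the options below could be passed to the Spark DataFrame writer. The helper 
name, table name, and path are hypothetical, and the record key, precombine 
field, and partition settings your table needs are left out, so treat this as 
an illustration rather than a complete configuration.

```python
def hudi_write_options(table_name: str, inline_compaction: bool = False) -> dict:
    """Build a minimal set of Hudi writer options for a MERGE_ON_READ table.

    Hypothetical helper for illustration only; a real job also needs
    record key, precombine field, and partition path options.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        # Keep compaction out of the write path; it is run separately.
        "hoodie.compact.inline": str(inline_compaction).lower(),
    }


opts = hudi_write_options("my_table")  # "my_table" is a placeholder name

# With a real DataFrame `df` and a Hudi-enabled Spark session (assumed):
# df.write.format("hudi").options(**opts).mode("append").save("/path/to/table")
```

   The compaction itself would still be executed by your own writeClient 
code, as mentioned above; that part is not shown here.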
   
   Just so you are aware: inline compaction does not need to run every time 
you ingest data. You can set it to run every N commits, but when it does run 
it is inline (it blocks writing).
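
   For the "every N commits" case, a minimal sketch of the relevant writer 
options might look like this. The helper name and the value 10 are only 
examples, not recommendations; the two option keys are the Hudi compaction 
settings for enabling inline compaction and for the number of delta commits 
between compactions.

```python
def inline_compaction_options(every_n_commits: int) -> dict:
    """Options to run compaction inline once every N delta commits.

    Hypothetical helper for illustration; merge the result into the
    rest of your Hudi writer options.
    """
    if every_n_commits < 1:
        raise ValueError("every_n_commits must be at least 1")
    return {
        # Compaction runs as part of the write, so it blocks the writer.
        "hoodie.compact.inline": "true",
        # Number of delta commits to accumulate before compacting.
        "hoodie.compact.inline.max.delta.commits": str(every_n_commits),
    }


inline_opts = inline_compaction_options(10)  # 10 is an arbitrary example
```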
   
   We usually have folks running async compaction in delta-streamer continuous 
mode and, more recently, in structured streaming. Async compaction with a Spark 
DF write or in deltastreamer run-once mode is generally not done, because users 
need to set up a separate compaction job. Let me open a JIRA to run compaction 
alone using spark.write() to make it easier...
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
