tandonraghav edited a comment on issue #2151:
URL: https://github.com/apache/hudi/issues/2151#issuecomment-706003688


   @bvaradar Thanks for the answer.
   Below is how our setup looks:
   
   - We have client-level Mongo collections and write the various clients' Mongo oplogs into one topic.
   - We write into this one topic in JSON format.
   - We read from this topic (grouping by client) and then apply the schema at the client level for each batch read. A batch read from the topic can contain a mix of clients' JSON, but every client has a specific schema.
   - We transform and apply the schema for every client before saving into the Hudi table.
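
   The group-then-apply-schema step above could be sketched roughly as follows. This is a minimal plain-Python illustration (no Spark), and the `client` field name and the per-client schemas are hypothetical placeholders, not taken from the actual setup:

```python
import json
from collections import defaultdict

# Hypothetical per-client schemas (field -> type coercion);
# names are illustrative only.
CLIENT_SCHEMAS = {
    "client_a": {"_id": str, "amount": float},
    "client_b": {"_id": str, "name": str},
}

def group_by_client(raw_records):
    """Group a mixed batch of JSON oplog records by client id."""
    batches = defaultdict(list)
    for raw in raw_records:
        rec = json.loads(raw)
        batches[rec["client"]].append(rec)  # assumes a 'client' field exists
    return batches

def apply_schema(client, records):
    """Coerce each record's fields to the client's specific schema."""
    schema = CLIENT_SCHEMAS[client]
    return [{field: typ(rec[field]) for field, typ in schema.items()}
            for rec in records]

# One mixed batch read from the topic:
batch = [
    '{"client": "client_a", "_id": "1", "amount": "9.5"}',
    '{"client": "client_b", "_id": "2", "name": "x"}',
]
grouped = group_by_client(batch)
typed = {c: apply_schema(c, recs) for c, recs in grouped.items()}
```

   In the real pipeline the same idea would run over Spark DataFrames per client before the Hudi write, rather than over plain dicts.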
   
   Again, I want to stress that we save via a Spark DataFrame. We do not want to compact inline due to the volume of records.
   Instead, we compact via HoodieClient in a separate Spark job for multiple tables every X minutes, because I have not found a way to run only compaction through the Spark DataFrame API.
   
   **Do you see any issue with saving via DataFrame and concurrently running compaction via HoodieClient?**
   
   Very soon we will be running performance tests on our setup with Hudi, and will keep you posted.

