tandonraghav commented on issue #2151: URL: https://github.com/apache/hudi/issues/2151#issuecomment-706003688
@bvaradar Thanks for the answer. Below is how our setup looks:

- We have client-level Mongo collections and write the various clients' Mongo oplogs into one Kafka topic, in JSON format.
- We read from this topic (grouped by client) and then apply the schema at the client level for each batch read. A batch can contain a mix of client JSON records, but every client has a specific schema.
- We transform and apply the schema for every client before saving into the Hudi table.

Again, I want to stress that we save via a Spark DataFrame. We do not want to compact inline due to the volume of records; instead we compact via HoodieClient in a separate Spark job for multiple tables every X minutes, since I could not find a way to run only compaction using the Spark DataFrame API.

Do you see any issue in saving via DataFrame while concurrently running compaction via HoodieClient?

Very soon we will be doing perf testing of Apache Hudi. Will keep you posted.
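For context, the pattern above (DataFrame writes with inline compaction disabled, compaction run out of band) can be sketched roughly as below. This is a minimal sketch, not our exact code: the table name and base path are placeholders, the option and flag names are taken from Hudi documentation of this era and may differ between versions, and the standalone compaction is shown via Hudi's `HoodieCompactor` utility rather than driving `HoodieWriteClient` directly — check the utilities bundle's `--help` for the exact flags in your version.

```shell
# Writer job (Scala, sketched here as comments): MERGE_ON_READ table
# with inline compaction disabled, so delta commits accumulate until
# the separate compaction job runs.
#
#   df.write.format("hudi")
#     .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
#     .option("hoodie.compact.inline", "false")
#     .option("hoodie.table.name", "client_table")      // placeholder
#     .mode("append")
#     .save("s3://bucket/hudi/client_table")            // placeholder path

# Separate compaction job, scheduled every X minutes per table.
# Paths, jar name, and parallelism are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.HoodieCompactor \
  hudi-utilities-bundle.jar \
  --base-path s3://bucket/hudi/client_table \
  --table-name client_table \
  --schema-file /path/to/client_table_schema.avsc \
  --parallelism 100
```

The design question in the comment still stands: whether the DataFrame writer and this out-of-band compaction can safely overlap on the same table.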
