Hello, I've just implemented a pipeline based on Apache Flink to synchronize data between MySQL and Hive (transactional, bucketed tables) on an HDP cluster. The Flink jobs run on YARN and write ORC files, but without ACID properties. We then created external tables over the HDFS directories that contain these delta ORC files, and MERGE INTO queries are executed periodically to merge the data into the target Hive table. This works pretty well, but we want to avoid these MERGE queries. How can I update the ORC files directly from my Flink job?
Thanks, David