Hello,

I've just implemented a pipeline based on Apache Flink to synchronize
data between MySQL and Hive (transactional and bucketed tables) on an HDP
cluster. The Flink jobs run on YARN.
The job writes ORC files, but without ACID properties.
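To give an idea of this part, here is a simplified sketch of how such delta
ORC files can be written from a Flink job, using the flink-orc
OrcBulkWriterFactory with a StreamingFileSink; the UserRecord type, the
(id bigint, name string) schema and the HDFS path are placeholders, not our
actual code:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.core.fs.Path;
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.flink.orc.writer.OrcBulkWriterFactory;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class OrcDeltaSinkJob {

    /** Placeholder row type standing in for a record replicated from MySQL. */
    public static class UserRecord {
        public long id;
        public String name;

        public UserRecord() {}

        public UserRecord(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Copies a UserRecord into the ORC row batch (schema: id bigint, name string). */
    public static class UserRecordVectorizer extends Vectorizer<UserRecord> {
        public UserRecordVectorizer(String schema) {
            super(schema);
        }

        @Override
        public void vectorize(UserRecord element, VectorizedRowBatch batch) throws IOException {
            LongColumnVector idCol = (LongColumnVector) batch.cols[0];
            BytesColumnVector nameCol = (BytesColumnVector) batch.cols[1];
            int row = batch.size++;
            idCol.vector[row] = element.id;
            nameCol.setVal(row, element.name.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // bulk-format part files roll on checkpoints

        // In the real pipeline this stream comes from the MySQL change-capture source.
        DataStream<UserRecord> changes = env.fromElements(
                new UserRecord(1L, "alice"),
                new UserRecord(2L, "bob"));

        OrcBulkWriterFactory<UserRecord> writerFactory =
                new OrcBulkWriterFactory<>(new UserRecordVectorizer("struct<id:bigint,name:string>"));

        // Writes plain (non-ACID) delta ORC files under the HDFS path backing the external table.
        StreamingFileSink<UserRecord> sink = StreamingFileSink
                .forBulkFormat(new Path("hdfs:///data/delta/users"), writerFactory)
                .build();

        changes.addSink(sink);
        env.execute("mysql-to-orc-delta");
    }
}

(With bulk formats the sink rolls a new part file on every checkpoint, so each
checkpoint interval produces a new batch of delta files.)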
We then created external tables over the HDFS directories that contain
these delta ORC files, and MERGE INTO queries are run periodically to merge
the data into the Hive target table.
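The merge step is essentially a Hive MERGE INTO statement submitted to
HiveServer2; a simplified sketch over JDBC, with placeholder connection
string, table and column names:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PeriodicHiveMerge {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL and credentials.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver2-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Merge the external delta table into the transactional, bucketed target table.
            stmt.execute(
                "MERGE INTO target_table AS t "
                + "USING delta_table AS d "
                + "ON t.id = d.id "
                + "WHEN MATCHED THEN UPDATE SET name = d.name "
                + "WHEN NOT MATCHED THEN INSERT VALUES (d.id, d.name)");
        }
    }
}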
It works pretty well, but we would like to avoid these MERGE queries.
How can I update ORC files directly from my Flink job?

Thanks,
David
