Thanks Alan. Yes, the problem in fact was that this streaming API does not
handle updates and deletes.
So far I've used native ORC files, and the next step I've planned is to use
the ACID support described here: https://orc.apache.org/docs/acid.html
INSERT/UPDATE/DELETE seem to be implemented; the spec serializes the
operations as follows:

OPERATION   SERIALIZATION
INSERT      0
UPDATE      1
DELETE      2

Do you think this approach is suitable?
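To check my understanding of the format, here is a minimal sketch of how I
picture writing a single insert event into an ACID delta file with the
orc-core Java API. The five-field envelope comes from the spec above; the
user columns (id, name), the transaction id and the delta path are made-up
placeholders, and I am ignoring the ACID v2 bucket encoding (BucketCodec)
for simplicity:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class AcidDeltaSketch {

  public static void main(String[] args) throws Exception {
    // Every row in an ACID file is wrapped in this envelope; the actual
    // table columns live in the nested "row" struct (placeholders here).
    TypeDescription schema = TypeDescription.fromString(
        "struct<operation:int,originalTransaction:bigint,bucket:int,"
            + "rowId:bigint,currentTransaction:bigint,"
            + "row:struct<id:bigint,name:string>>");

    // Readers discover events through the directory layout, so the
    // delta_minTxn_maxTxn/bucket_N naming is part of the contract.
    Writer writer = OrcFile.createWriter(
        new Path("/warehouse/t/delta_0000101_0000101/bucket_00000"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();
    LongColumnVector operation = (LongColumnVector) batch.cols[0];
    LongColumnVector origTxn = (LongColumnVector) batch.cols[1];
    LongColumnVector bucket = (LongColumnVector) batch.cols[2];
    LongColumnVector rowId = (LongColumnVector) batch.cols[3];
    LongColumnVector currTxn = (LongColumnVector) batch.cols[4];
    StructColumnVector row = (StructColumnVector) batch.cols[5];
    LongColumnVector id = (LongColumnVector) row.fields[0];
    BytesColumnVector name = (BytesColumnVector) row.fields[1];

    int r = batch.size++;
    operation.vector[r] = 0;  // 0 = INSERT, 1 = UPDATE, 2 = DELETE
    origTxn.vector[r] = 101;  // for an insert, same as currentTransaction
    bucket.vector[r] = 0;
    rowId.vector[r] = 0;      // unique within (originalTransaction, bucket)
    currTxn.vector[r] = 101;
    id.vector[r] = 1;
    name.setVal(r, "david".getBytes(StandardCharsets.UTF_8));

    writer.addRowBatch(batch);
    writer.close();
  }
}

If I read the spec correctly, an update would carry operation 1 plus the
row id it replaces, and a delete carries operation 2 with the "row" struct
set to null, so the compactor can reconcile the events later.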
On Tue, Mar 12, 2019 at 7:30 PM Alan Gates <alanfga...@gmail.com> wrote:

> Have you looked at Hive's streaming ingest?
> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
> It is designed for this case, though it only handles insert (not update),
> so if you need updates you'd have to do the merge as you are currently
> doing.
>
> Alan.
>
> On Mon, Mar 11, 2019 at 2:09 PM David Morin <morin.david....@gmail.com>
> wrote:
>
>> Hello,
>>
>> I've just implemented a pipeline based on Apache Flink to synchronize
>> data between MySQL and Hive (transactional + bucketed) on an HDP cluster.
>> The Flink jobs run on YARN.
>> I've used ORC files, but without ACID properties.
>> We then created external tables on the HDFS directories that contain
>> these delta ORC files, and MERGE INTO queries are executed periodically
>> to merge the data into the Hive target table.
>> It works pretty well, but we want to avoid these MERGE queries.
>> How can I update ORC files directly from my Flink job?
>>
>> Thanks,
>> David
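For reference, the insert-only path mentioned above looks roughly like this
with the Hive 3 streaming API (the V2 org.apache.hive.streaming package;
the wiki page in the quote documents the older HCatalog variant). Database
and table names are placeholders:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StreamingConnection;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class StreamingIngestSketch {

  public static void main(String[] args) throws Exception {
    // Parses delimited text records into the target table's columns.
    StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
        .withFieldDelimiter(',')
        .build();

    // The target table must be ACID (transactional).
    StreamingConnection connection = HiveStreamingConnection.newBuilder()
        .withDatabase("db")             // placeholder
        .withTable("target_table")      // placeholder
        .withAgentInfo("flink-sync-job")
        .withRecordWriter(writer)
        .withHiveConf(new HiveConf())
        .connect();

    // Each committed transaction writes insert events into new delta
    // files; there is no way to emit update or delete events here.
    connection.beginTransaction();
    connection.write("1,david".getBytes(StandardCharsets.UTF_8));
    connection.write("2,alan".getBytes(StandardCharsets.UTF_8));
    connection.commitTransaction();
    connection.close();
  }
}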