Yes, I use HDP 2.6.5, so I still have to deal with Hive 2. The migration to HDP 3 is planned, but only in a couple of months. So, thanks for your reply; I'll dig deeper into the ACID support for ORC in Hive 2.
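For reference, below is a minimal sketch of what a single event in an ACID v1 (Hive 2) delta file looks like when written with the ORC core writer API, following the row layout described at https://orc.apache.org/docs/acid.html. The table columns, transaction ids, bucket/rowId values and file path are illustrative assumptions; a real delta is normally produced by Hive's OrcRecordUpdater (which also adds extra file metadata), so treat this as an illustration of the encoding rather than a drop-in way to produce deltas Hive will read.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class AcidDeltaSketch {
  public static void main(String[] args) throws Exception {
    // ACID v1 event layout from https://orc.apache.org/docs/acid.html:
    // (operation, originalTransaction, bucket, rowId, currentTransaction, row)
    // operation: 0 = INSERT, 1 = UPDATE, 2 = DELETE.
    // The "row" struct mirrors the target table's columns (id/name assumed here).
    TypeDescription schema = TypeDescription.fromString(
        "struct<operation:int,originalTransaction:bigint,bucket:int,"
            + "rowId:bigint,currentTransaction:bigint,"
            + "row:struct<id:bigint,name:string>>");

    // Delta directory / bucket file naming is illustrative only.
    Writer writer = OrcFile.createWriter(
        new Path("delta_0000101_0000101/bucket_00000"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();
    int r = batch.size++;
    ((LongColumnVector) batch.cols[0]).vector[r] = 2L;    // operation: DELETE
    ((LongColumnVector) batch.cols[1]).vector[r] = 95L;   // originalTransaction of the deleted row
    ((LongColumnVector) batch.cols[2]).vector[r] = 0L;    // bucket
    ((LongColumnVector) batch.cols[3]).vector[r] = 42L;   // rowId within that transaction/bucket
    ((LongColumnVector) batch.cols[4]).vector[r] = 101L;  // currentTransaction (this delta)
    StructColumnVector row = (StructColumnVector) batch.cols[5];
    row.noNulls = false;
    row.isNull[r] = true;                                 // row payload is null for a delete

    writer.addRowBatch(batch);
    writer.close();
  }
}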
On Tue, Mar 12, 2019 at 22:51, Alan Gates <alanfga...@gmail.com> wrote:

> That's the old (Hive 2) version of ACID. In the newer version (Hive 3)
> there's no update, just insert and delete (update is insert + delete). If
> you're working against Hive 2, what you have is what you want. If you're
> working against Hive 3, you'll need the newer stuff.
>
> Alan.
>
> On Tue, Mar 12, 2019 at 12:24 PM David Morin <morin.david....@gmail.com> wrote:
>
>> Thanks, Alan.
>> Yes, the problem in fact was that this streaming API does not handle
>> update and delete.
>> I've used native ORC files, and the next step I've planned is to use the
>> ACID support as described here: https://orc.apache.org/docs/acid.html
>> INSERT/UPDATE/DELETE seem to be implemented:
>>
>>   Operation   Serialization
>>   INSERT      0
>>   UPDATE      1
>>   DELETE      2
>>
>> Do you think this approach is suitable?
>>
>> On Tue, Mar 12, 2019 at 19:30, Alan Gates <alanfga...@gmail.com> wrote:
>>
>>> Have you looked at Hive's streaming ingest?
>>> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
>>> It is designed for this case, though it only handles insert (not
>>> update), so if you need updates you'd have to do the merge as you are
>>> currently doing.
>>>
>>> Alan.
>>>
>>> On Mon, Mar 11, 2019 at 2:09 PM David Morin <morin.david....@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I've just implemented a pipeline based on Apache Flink to synchronize data
>>>> between MySQL and Hive (transactional + bucketed) on an HDP cluster.
>>>> The Flink jobs run on YARN.
>>>> I've used ORC files, but without ACID properties.
>>>> We then created external tables on the HDFS directories that contain
>>>> these delta ORC files.
>>>> MERGE INTO queries are then executed periodically to merge the data into
>>>> the Hive target table.
>>>> It works pretty well, but we want to avoid these MERGE queries.
>>>> How can I update ORC files directly from my Flink job?
>>>>
>>>> Thanks,
>>>> David
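For completeness, here is a minimal sketch of the Hive 2 streaming ingest API (org.apache.hive.hcatalog.streaming) that Alan points to above. It covers inserts only; the metastore URI, database, table and column names are assumptions, and the target table must be transactional, bucketed and stored as ORC.

import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingIngestSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative endpoint: metastore URI, database and table are assumptions.
    // The last argument is the partition values (null = unpartitioned table).
    HiveEndPoint endPoint = new HiveEndPoint(
        "thrift://metastore-host:9083", "mydb", "mytable", null);
    StreamingConnection connection = endPoint.newConnection(true);

    // Column order must match the target table definition.
    String[] fieldNames = {"id", "name"};
    DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", endPoint);

    // Fetch a batch of transactions and write a couple of rows (insert only).
    TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
    txnBatch.beginNextTransaction();
    txnBatch.write("1,alice".getBytes());
    txnBatch.write("2,bob".getBytes());
    txnBatch.commit();

    txnBatch.close();
    connection.close();
  }
}

For the update/delete path on Hive 2 this API does not help, so the options remain the periodic MERGE INTO queries or writing the ACID delta layout sketched earlier in this thread.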