Yes, I use HDP 2.6.5, so I still have to deal with Hive 2. The migration to HDP 3 is planned, but only in a couple of months. So, thanks for your reply; I'll dig deeper into the ACID support for ORC in Hive 2.
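For reference, below is a minimal sketch of what a single event in an ACID v1 (Hive 2) delta file looks like when written with the ORC core writer API, following the row layout described at https://orc.apache.org/docs/acid.html. The table columns, transaction ids, bucket/rowId values and file path are illustrative assumptions; a real delta is normally produced by Hive's OrcRecordUpdater (which also adds extra file metadata), so treat this as an illustration of the encoding rather than a drop-in way to produce deltas Hive will read.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.StructColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class AcidDeltaSketch {
  public static void main(String[] args) throws Exception {
    // ACID v1 event layout from https://orc.apache.org/docs/acid.html:
    // (operation, originalTransaction, bucket, rowId, currentTransaction, row)
    // operation: 0 = INSERT, 1 = UPDATE, 2 = DELETE.
    // The "row" struct mirrors the target table's columns (id/name assumed here).
    TypeDescription schema = TypeDescription.fromString(
        "struct<operation:int,originalTransaction:bigint,bucket:int,"
            + "rowId:bigint,currentTransaction:bigint,"
            + "row:struct<id:bigint,name:string>>");

    // Delta directory / bucket file naming is illustrative only.
    Writer writer = OrcFile.createWriter(
        new Path("delta_0000101_0000101/bucket_00000"),
        OrcFile.writerOptions(new Configuration()).setSchema(schema));

    VectorizedRowBatch batch = schema.createRowBatch();
    int r = batch.size++;
    ((LongColumnVector) batch.cols[0]).vector[r] = 2L;    // operation: DELETE
    ((LongColumnVector) batch.cols[1]).vector[r] = 95L;   // originalTransaction of the deleted row
    ((LongColumnVector) batch.cols[2]).vector[r] = 0L;    // bucket
    ((LongColumnVector) batch.cols[3]).vector[r] = 42L;   // rowId within that transaction/bucket
    ((LongColumnVector) batch.cols[4]).vector[r] = 101L;  // currentTransaction (this delta)
    StructColumnVector row = (StructColumnVector) batch.cols[5];
    row.noNulls = false;
    row.isNull[r] = true;                                 // row payload is null for a delete

    writer.addRowBatch(batch);
    writer.close();
  }
}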
On Tue, Mar 12, 2019 at 22:51, Alan Gates <alanfga...@gmail.com> wrote:

> That's the old (Hive 2) version of ACID. In the newer version (Hive 3)
> there's no update, just insert and delete (update is insert + delete). If
> you're working against Hive 2, what you have is what you want. If you're
> working against Hive 3, you'll need the newer stuff.
>
> Alan.
>
> On Tue, Mar 12, 2019 at 12:24 PM David Morin <morin.david....@gmail.com> wrote:
>
>> Thanks, Alan.
>> Yes, the problem in fact was that this streaming API does not handle
>> update and delete.
>> I've used native ORC files, and the next step I've planned is to use the
>> ACID support as described here: https://orc.apache.org/docs/acid.html
>> INSERT/UPDATE/DELETE seem to be implemented:
>>
>>   Operation   Serialization
>>   INSERT      0
>>   UPDATE      1
>>   DELETE      2
>>
>> Do you think this approach is suitable?
>>
>> On Tue, Mar 12, 2019 at 19:30, Alan Gates <alanfga...@gmail.com> wrote:
>>
>>> Have you looked at Hive's streaming ingest?
>>> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
>>> It is designed for this case, though it only handles insert (not
>>> update), so if you need updates you'd have to do the merge as you are
>>> currently doing.
>>>
>>> Alan.
>>>
>>> On Mon, Mar 11, 2019 at 2:09 PM David Morin <morin.david....@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I've just implemented a pipeline based on Apache Flink to synchronize data
>>>> between MySQL and Hive (transactional + bucketed) on an HDP cluster.
>>>> The Flink jobs run on YARN.
>>>> I've used ORC files, but without ACID properties.
>>>> We then created external tables on the HDFS directories that contain
>>>> these delta ORC files.
>>>> MERGE INTO queries are then executed periodically to merge the data into
>>>> the Hive target table.
>>>> It works pretty well, but we want to avoid these MERGE queries.
>>>> How can I update ORC files directly from my Flink job?
>>>>
>>>> Thanks,
>>>> David
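For completeness, here is a minimal sketch of the Hive 2 streaming ingest API (org.apache.hive.hcatalog.streaming) that Alan points to above. It covers inserts only; the metastore URI, database, table and column names are assumptions, and the target table must be transactional, bucketed and stored as ORC.

import org.apache.hive.hcatalog.streaming.DelimitedInputWriter;
import org.apache.hive.hcatalog.streaming.HiveEndPoint;
import org.apache.hive.hcatalog.streaming.StreamingConnection;
import org.apache.hive.hcatalog.streaming.TransactionBatch;

public class StreamingIngestSketch {
  public static void main(String[] args) throws Exception {
    // Illustrative endpoint: metastore URI, database and table are assumptions.
    // The last argument is the partition values (null = unpartitioned table).
    HiveEndPoint endPoint = new HiveEndPoint(
        "thrift://metastore-host:9083", "mydb", "mytable", null);
    StreamingConnection connection = endPoint.newConnection(true);

    // Column order must match the target table definition.
    String[] fieldNames = {"id", "name"};
    DelimitedInputWriter writer = new DelimitedInputWriter(fieldNames, ",", endPoint);

    // Fetch a batch of transactions and write a couple of rows (insert only).
    TransactionBatch txnBatch = connection.fetchTransactionBatch(10, writer);
    txnBatch.beginNextTransaction();
    txnBatch.write("1,alice".getBytes());
    txnBatch.write("2,bob".getBytes());
    txnBatch.commit();

    txnBatch.close();
    connection.close();
  }
}

For the update/delete path on Hive 2 this API does not help, so the options remain the periodic MERGE INTO queries or writing the ACID delta layout sketched earlier in this thread.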