Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
Yes, I use HDP 2.6.5, so I still have to deal with Hive 2. The migration to HDP 3 is planned, but only in a couple of months. Thanks for your reply; I will investigate ACID support for ORC in Hive 2 further. On Tue, Mar 12, 2019 at 22:51, Alan Gates wrote: > That's the old (Hive

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread Alan Gates
That's the old (Hive 2) version of ACID. In the newer version (Hive 3) there's no update, just insert and delete (update is insert + delete). If you're working against Hive 2 what you have is what you want. If you're working against Hive 3 you'll need the newer stuff. Alan. On Tue, Mar 12, 201
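
To illustrate the "update is insert + delete" point, here is a minimal Python sketch. It is a toy model only, not Hive's implementation or on-disk format: the names `Event`, `apply_update` and `snapshot`, and the idea of a flat event log keyed by a row id, are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One entry in a hypothetical insert/delete-only change log."""
    kind: str                    # "insert" or "delete"
    row_id: int                  # hypothetical row identifier
    value: Optional[str] = None  # payload, only set for inserts


def apply_update(log: List[Event], old_row_id: int,
                 new_row_id: int, new_value: str) -> None:
    """Record an update as a delete of the old row plus an insert of the new one."""
    log.append(Event("delete", old_row_id))
    log.append(Event("insert", new_row_id, new_value))


def snapshot(log: List[Event]) -> Dict[int, Optional[str]]:
    """Replay the log: an inserted row is visible unless it was deleted."""
    deleted = {e.row_id for e in log if e.kind == "delete"}
    return {e.row_id: e.value
            for e in log
            if e.kind == "insert" and e.row_id not in deleted}


log = [Event("insert", 1, "a"), Event("insert", 2, "b")]
apply_update(log, old_row_id=1, new_row_id=3, new_value="a2")
print(snapshot(log))
```

The log never contains an "update" event; a reader only has to understand inserts and deletes, which is the simplification described in the message.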

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread David Morin
Thanks Alan. Yes, the problem was in fact that this streaming API does not handle update and delete. I've used native ORC files, and my next planned step is to use the ACID support described here: https://orc.apache.org/docs/acid.html INSERT/UPDATE/DELETE seem to be implemented:

Re: Read Hive ACID tables in Spark or Pig

2019-03-12 Thread Alan Gates
If you want to read those tables directly in something other than Hive, yes, you need to get the valid writeid list for each table you're reading from the metastore. If you want to avoid merging data in, take a look at Hive's streaming ingest, which allows you to ingest data into Hive without merg
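
As a conceptual sketch of why an external reader needs the valid write ID list, the Python below filters rows by write ID. It does not use any Hive or metastore API. The class `WriteIdSnapshot`, the function `visible_rows`, and the representation of the list as a high-water mark plus a set of excluded IDs (for example open or aborted writes) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Hypothetical stand-in for a table's valid write ID list."""
    high_watermark: int                   # highest write ID the reader may see
    excluded: FrozenSet[int] = frozenset()  # assumed open or aborted write IDs

    def is_valid(self, write_id: int) -> bool:
        """A write is visible if it is at or below the mark and not excluded."""
        return write_id <= self.high_watermark and write_id not in self.excluded


def visible_rows(rows: Iterable[Tuple[int, str]],
                 snap: WriteIdSnapshot) -> List[str]:
    """Keep only the payloads whose write ID is valid for this snapshot."""
    return [payload for write_id, payload in rows if snap.is_valid(write_id)]


# Each row is (write_id, payload); write 2 is excluded, write 5 is too new.
rows = [(1, "r1"), (2, "r2"), (3, "r3"), (5, "r5")]
snap = WriteIdSnapshot(high_watermark=4, excluded=frozenset({2}))
print(visible_rows(rows, snap))
```

A reader that skipped this filtering would also return the rows from writes 2 and 5, that is, data from writes that are excluded or not yet visible to it.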

Re: How to update Hive ACID tables in Flink

2019-03-12 Thread Alan Gates
Have you looked at Hive's streaming ingest? https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest It is designed for this case, though it only handles insert (not update), so if you need updates you'd have to do the merge as you are currently doing. Alan. On Mon, Mar 11, 2019 at

Re: out of memory using Union operator and array column type

2019-03-12 Thread Patrick Duin
set hive.map.aggr=false; worked for me. Slow and steady wins the race :) Many thanks all! Patrick On Tue, Mar 12, 2019 at 03:23, Gopal Vijayaraghavan wrote: > > > I'll try the simplest query I can reduce it to with loads of memory and > see if that gets anywhere. Other pointers are much appre

Re: Running Hive on Spark

2019-03-12 Thread Daniel Mateus Pires
Hi Rajesh, I'm trying to further my understanding of the various interactions and set-ups for Hive + Spark. My understanding so far is that running queries against the SparkThriftServer uses the SparkSQL engine whereas the HiveServer2 + Hive + Spark execution engine uses Hive primitives and only u