Re: out of memory using Union operator and array column type

2019-03-12 Thread Patrick Duin
set hive.map.aggr=false; worked for me. Slow and steady wins the race :) Many thanks, all! Patrick On Tue, 12 Mar 2019 at 03:23, Gopal Vijayaraghavan wrote: > > > I'll try the simplest query I can reduce it to with loads of memory and > see if that gets anywhere. Other pointers are much appre
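
A minimal sketch of how the setting above might be applied, assuming it is set per session right before the failing statement. `hive.map.aggr` controls map-side hash aggregation; turning it off moves the aggregation work for DISTINCT / GROUP BY to the reducers, which is slower but avoids building the hash table in mapper memory. The `SELECT DISTINCT * FROM delta` statement is only a stand-in borrowed from elsewhere in the thread, not the full original query.

```sql
-- Session-level only; this does not change hive-site.xml.
SET hive.map.aggr=false;

-- Stand-in for the failing statement (the table name is a placeholder).
SELECT DISTINCT * FROM delta;
```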

Re: out of memory using Union operator and array column type

2019-03-11 Thread Gopal Vijayaraghavan
> I'll try the simplest query I can reduce it to  with loads of memory and see > if that gets anywhere. Other pointers are much appreciated. Looks like something I'm testing right now (to make the memory setting cost-based). https://issues.apache.org/jira/browse/HIVE-21399 A less "cost-based

Re: out of memory using Union operator and array column type

2019-03-11 Thread Devopam Mittra
hi Patrick, Usually a DISTINCT is preferred on primary key columns instead of the entire table - something typically referred to as SKEWNESS in the traditional RDBMS world. Doing it on an array will typically add further to the woes. A workaround I have used in the past is to fall back to

Re: out of memory using Union operator and array column type

2019-03-11 Thread Patrick Duin
Venkatesh: Increasing the memory: I've tried even bigger settings; that made the error appear after twice as much time. Dev: So I know which table is giving the issue; following your previous suggestion I did a SELECT DISTINCT * FROM DELTA, which causes the same issue, so I think the DISTINCT is

Re: out of memory using Union operator and array column type

2019-03-11 Thread Devopam Mittra
hi Patrick, If it sounds worth trying please do the same: 1. Create physical table from table 1. (with filter clause) 2. Create physical table from table 2. (with filter clause) 3. Create interim table 2_1 with the DISTINCT clause. 4. Create interim table 2_2 with the UNION clause. 5. Do an INSERT

Re: out of memory using Union operator and array column type

2019-03-11 Thread Venkatesh Selvaraj
Patrick, Can you bump up the mapper memory and see if it helps? SET mapreduce.map.memory.mb=3072; SET mapreduce.map.java.opts=-Xmx2560m; Regards, Venkatesh On Mon, Mar 11, 2019 at 7:29 AM Patrick Duin wrote: > Hi, > > I'm running into oom issue trying to do a Union all on a bunch of AVRO > fi
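
A sketch of the same two settings with the relationship between them spelled out, assuming the MapReduce execution engine (the thread does not say which engine is in use; other engines size containers through different properties). The JVM heap (`-Xmx`) is kept below the container size so that non-heap memory still fits inside the container. The values are the ones suggested above, not tuned recommendations.

```sql
-- Container size requested for each map task, in MB.
SET mapreduce.map.memory.mb=3072;

-- JVM heap for the map task; kept below the container size
-- (2560 MB of 3072 MB here) to leave room for non-heap memory.
SET mapreduce.map.java.opts=-Xmx2560m;
```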

Re: out of memory using Union operator and array column type

2019-03-11 Thread Patrick Duin
Very good question. Yes, that does give the same problem. On Mon, 11 Mar 2019 at 16:28, Devopam Mittra wrote: > Can you please try doing SELECT DISTINCT * FROM DELTA into a physical > table first ? > regards > Dev > > > On Mon, Mar 11, 2019 at 7:59 PM Patrick Duin wrote: > >> Hi, >> >> I'm runni

Re: out of memory using Union operator and array column type

2019-03-11 Thread Devopam Mittra
Can you please try doing SELECT DISTINCT * FROM DELTA into a physical table first ? regards Dev On Mon, Mar 11, 2019 at 7:59 PM Patrick Duin wrote: > Hi, > > I'm running into oom issue trying to do a Union all on a bunch of AVRO > files. > > The query is something like this: > > with gold as (

out of memory using Union operator and array column type

2019-03-11 Thread Patrick Duin
Hi, I'm running into an OOM issue trying to do a UNION ALL on a bunch of AVRO files. The query is something like this: with gold as ( select * from table1 where local_date='2019-01-01'), delta as ( select * from table2 where local_date='2019-01-01') insert overwrite table3 PARTITION ('local_date'