Can you please try doing SELECT DISTINCT * FROM DELTA into a physical table
first ?
regards
Dev


On Mon, Mar 11, 2019 at 7:59 PM Patrick Duin <patd...@gmail.com> wrote:

> Hi,
>
> I'm running into oom issue trying to do a Union all on a bunch of AVRO
> files.
>
> The query is something like this:
>
> with gold  as ( select * from table1 where local_date=2019-01-01),
>      delta ss ( select * from table2 where local_date=2019-01-01)
> insert overwrite table3 PARTITION ('local_date')
> select * from gold
> union distinct
> select * from delta;
>
> UNION ALL works. The data size is in the low gigabytes and I'm running on
> 6 16 GB Nodes (I've tried larger and set memory settings higher but that
> just postpones the error).
>
> Mappers fail with erros (stacktraces not all the same)
>
> 2019-03-11 13:37:22,381 ERROR [main] org.apache.hadoop.mapred.YarnChild: 
> Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
>       at org.apache.hadoop.io.Text.setCapacity(Text.java:268)
>       at org.apache.hadoop.io.Text.set(Text.java:224)
>       at org.apache.hadoop.io.Text.set(Text.java:214)
>       at org.apache.hadoop.io.Text.<init>(Text.java:93)
>       at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.copyObject(WritableStringObjectInspector.java:36)
>       at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:418)
>       at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:442)
>       at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:428)
>       at 
> org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:152)
>       at 
> org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:144)
>       at 
> org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:121)
>       at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:805)
>       at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:719)
>       at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:787)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>       at 
> org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:148)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>       at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>       at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
>       at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
>       at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
>       at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
>
> I've tried Hive 2.3.2 and Hive 2.3.4, both tez and mr engines.
>
> I've tried running with more and less mappers, always hitting oom.
>
> I'm running similar query on different (much larger) data without issues so 
> suspect something with the actual data.
>
> The table schema is this:
> c1    string                  
> c2    bigint                  
> c3    array<map<string,string>>       
> local_date  string
>
>
> I've narrowed it down and (not surprisingly) the 3rd column seems to be the 
> cause of the issue, If I remove that the union works again just fine.
>
> Anyone has similar experiences? Perhaps any pointers on how to tackle this?
>
> Kind regards,
>
>  Patrick
>
>
>

-- 
Devopam Mittra
Life and Relations are not binary

Reply via email to