Hi, I'm running into oom issue trying to do a Union all on a bunch of AVRO files.
The query is something like this: with gold as ( select * from table1 where local_date=2019-01-01), delta ss ( select * from table2 where local_date=2019-01-01) insert overwrite table3 PARTITION ('local_date') select * from gold union distinct select * from delta; UNION ALL works. The data size is in the low gigabytes and I'm running on 6 16 GB Nodes (I've tried larger and set memory settings higher but that just postpones the error). Mappers fail with erros (stacktraces not all the same) 2019-03-11 13:37:22,381 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.hadoop.io.Text.setCapacity(Text.java:268) at org.apache.hadoop.io.Text.set(Text.java:224) at org.apache.hadoop.io.Text.set(Text.java:214) at org.apache.hadoop.io.Text.<init>(Text.java:93) at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.copyObject(WritableStringObjectInspector.java:36) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:418) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:442) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:428) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:152) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:144) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:121) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:805) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:719) at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:787) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) at org.apache.hadoop.hive.ql.exec.UnionOperator.process(UnionOperator.java:148) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169) I've tried Hive 2.3.2 and Hive 2.3.4, both tez and mr engines. I've tried running with more and less mappers, always hitting oom. I'm running similar query on different (much larger) data without issues so suspect something with the actual data. The table schema is this: c1 string c2 bigint c3 array<map<string,string>> local_date string I've narrowed it down and (not surprisingly) the 3rd column seems to be the cause of the issue, If I remove that the union works again just fine. Anyone has similar experiences? Perhaps any pointers on how to tackle this? Kind regards, Patrick