Thanks for the query and the Hive options. It looks like the JVM heap space for the Hive CLI itself is running out of memory, as described in the EMR documentation: https://aws.amazon.com/premiumsupport/knowledge-center/emr-hive-outofmemoryerror-heap-space/
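A minimal sketch of what that implies, assuming the OOM really is in the Hive CLI's own JVM rather than in the YARN containers. `HADOOP_HEAPSIZE` and `HADOOP_CLIENT_OPTS` are standard Hadoop client environment knobs (not settings taken from this thread), and the `4096` value is just an illustrative number:

```shell
# Assumption: the "Loading data to table ... partition" step OOMs in the
# Hive CLI's local JVM. That JVM is sized by the Hadoop client environment,
# not by the mapreduce.* / yarn.* --hiveconf flags (those only size the
# YARN containers, which is why raising them didn't help).
export HADOOP_HEAPSIZE=4096                 # client-side heap, in MB
export HADOOP_CLIENT_OPTS="-Xmx4096m"       # some setups read this instead
echo "client heap set to ${HADOOP_HEAPSIZE} MB"
# then re-run the same job, e.g.:
#   hive -f /tmp/populateTempTable6388054392973078671.hql --verbose ...
```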
On Wed, Jan 8, 2020 at 11:38 AM Patrick Duin <patd...@gmail.com> wrote:

> The query is rather large and won't tell you much (it's generated).
> It comes down to this:
>
> WITH gold AS (SELECT * FROM table1),
> delta AS (SELECT * FROM table2)
> INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
> PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`)
> SELECT * FROM gold
> UNION DISTINCT
> SELECT * FROM delta
> DISTRIBUTE BY c_date, c_hour, c_b, c_p
>
> We run it with this:
>
> hive -f /tmp/populateTempTable6388054392973078671.hql --verbose \
>   --hiveconf hive.exec.dynamic.partition='true' \
>   --hiveconf hive.exec.dynamic.partition.mode='nonstrict' \
>   --hiveconf hive.exec.max.dynamic.partitions.pernode='5000' \
>   --hiveconf hive.exec.max.dynamic.partitions='50000' \
>   --hiveconf parquet.compression='SNAPPY' \
>   --hiveconf hive.execution.engine='mr' \
>   --hiveconf mapreduce.map.java.opts='-Xmx4608m' \
>   --hiveconf mapreduce.map.memory.mb='5760' \
>   --hiveconf mapreduce.reduce.java.opts='-Xmx10400m' \
>   --hiveconf mapreduce.reduce.memory.mb='13000' \
>   --hiveconf hive.optimize.sort.dynamic.partition='false' \
>   --hiveconf hive.blobstore.optimizations.enabled='false' \
>   --hiveconf hive.map.aggr='false' \
>   --hiveconf yarn.app.mapreduce.am.resource.mb=15000
>
> We run on EMR m5.2xlarge nodes (32 GB of memory). As I said, the M/R part
> runs fine and the job is listed as succeeded in the ResourceManager; the
> error somehow appears after that.
>
> On Wed, Jan 8, 2020 at 17:22, Suresh Kumar Sethuramaswamy <rock...@gmail.com> wrote:
>
>> Could you please post your insert query snippet along with the SET
>> statements?
>>
>> On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin <patd...@gmail.com> wrote:
>>
>>> Hi,
>>> I have a query that produces about 3000 partitions, which we load
>>> dynamically (on Hive 2.3.5).
>>> At the end of this query (it runs on M/R, which works fine) the M/R job
>>> is finished and we see this on the Hive CLI:
>>>
>>> Loading data to table my_db.temp__v1_2019_12_03_182627 partition
>>> (c_date=null, c_hour=null, c_b=null, c_p=null)
>>>
>>> Time taken to load dynamic partitions: 540.025 seconds
>>> Time taken for adding to write entity : 0.329 seconds
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> # Executing /bin/sh -c "kill -9 19644"...
>>> os::fork_and_exec failed: Cannot allocate memory (12)
>>> MapReduce Jobs Launched:
>>> Stage-Stage-1: Map: 387 Reduce: 486 Cumulative CPU: 110521.05 sec
>>>   HDFS Read: 533411354 HDFS Write: 262054898296 SUCCESS
>>> Stage-Stage-2: Map: 973 Reduce: 1009 Cumulative CPU: 48710.45 sec
>>>   HDFS Read: 262126094987 HDFS Write: 70666472011 SUCCESS
>>> Total MapReduce CPU Time Spent: 1 days 20 hours 13 minutes 51 seconds
>>>   500 msec
>>> OK
>>>
>>> Where is this OutOfMemoryError coming from, and which heap space am I
>>> supposed to increase? We've tried increasing
>>> 'yarn.app.mapreduce.am.resource.mb', but that didn't seem to help.
>>> I know we probably shouldn't have this many partitions, but this is a
>>> one-off and we'd like it to just work.
>>>
>>> Thanks for any pointers,
>>> Patrick
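If raising the client heap isn't enough, a hypothetical workaround (not proposed in the thread) is to batch the insert by `c_date` so each Hive CLI run only has to register a fraction of the ~3000 partitions. The dates and the `gold_plus_delta` view name below are made up for illustration; this sketch only generates the per-batch statements:

```shell
# Hypothetical batching sketch: emit one INSERT per c_date value, so each
# hive invocation loads far fewer dynamic partitions and the client-side
# "load dynamic partitions" step holds fewer partition objects in memory.
# (gold_plus_delta stands in for the UNION DISTINCT of table1 and table2.)
for d in 2019_12_01 2019_12_02 2019_12_03; do   # example dates, not real ones
  cat <<EOF
INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
PARTITION (c_date, c_hour, c_b, c_p)
SELECT * FROM gold_plus_delta WHERE c_date = '${d}'
DISTRIBUTE BY c_date, c_hour, c_b, c_p;
EOF
done
# Run each generated statement with the same --hiveconf flags as before,
# e.g. hive -f <batch.hql> --hiveconf hive.exec.dynamic.partition='true' ...
```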