The query is rather large and won't tell you much (it's generated). It comes
down to this:

  WITH gold AS (SELECT * FROM table1),
       delta AS (SELECT * FROM table2)
  INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
    PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`)
  SELECT * FROM gold
  UNION DISTINCT
  SELECT * FROM delta
  DISTRIBUTE BY c_date, c_hour, c_b, c_p

We run it with this:

  hive -f /tmp/populateTempTable6388054392973078671.hql --verbose \
    --hiveconf hive.exec.dynamic.partition='true' \
    --hiveconf hive.exec.dynamic.partition.mode='nonstrict' \
    --hiveconf hive.exec.max.dynamic.partitions.pernode='5000' \
    --hiveconf hive.exec.max.dynamic.partitions='50000' \
    --hiveconf parquet.compression='SNAPPY' \
    --hiveconf hive.execution.engine='mr' \
    --hiveconf mapreduce.map.java.opts='-Xmx4608m' \
    --hiveconf mapreduce.map.memory.mb='5760' \
    --hiveconf mapreduce.reduce.java.opts='-Xmx10400m' \
    --hiveconf mapreduce.reduce.memory.mb='13000' \
    --hiveconf hive.optimize.sort.dynamic.partition='false' \
    --hiveconf hive.blobstore.optimizations.enabled='false' \
    --hiveconf hive.map.aggr='false' \
    --hiveconf yarn.app.mapreduce.am.resource.mb='15000'

We run on EMR m5.2xlarge nodes (32 GB of memory). As I said, the M/R part
runs fine: the job is listed as succeeded in the ResourceManager, and the
error somehow comes after that.
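For what it's worth, the pid being killed (19644) is the Hive CLI process
itself, not a YARN container, so our working assumption is that the
dynamic-partition load runs in the client JVM and that is the heap that
fills up. A sketch of what we plan to try next, assuming that's right (the
8g heap is a guess, not a properly sized value):

  # Raise the heap of the hive CLI JVM itself; the mapreduce.* and yarn.*
  # settings above only affect the containers, not this process.
  export HADOOP_CLIENT_OPTS="-Xmx8g"
  hive -f /tmp/populateTempTable6388054392973078671.hql --verbose \
    ... # same --hiveconf settings as above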
On Wed, Jan 8, 2020 at 17:22, Suresh Kumar Sethuramaswamy <rock...@gmail.com>
wrote:

> Could you please post your insert query snippet along with the SET
> statements?
>
> On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin <patd...@gmail.com> wrote:
>
>> Hi,
>> I've got a query that produces about 3000 partitions, which we load
>> dynamically (on Hive 2.3.5).
>> At the end of this query the M/R job finishes fine, and then we see this
>> on the Hive CLI:
>>
>> Loading data to table my_db.temp__v1_2019_12_03_182627 partition
>> (c_date=null, c_hour=null, c_b=null, c_p=null)
>>
>> Time taken to load dynamic partitions: 540.025 seconds
>> Time taken for adding to write entity : 0.329 seconds
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 19644"...
>> os::fork_and_exec failed: Cannot allocate memory (12)
>> MapReduce Jobs Launched:
>> Stage-Stage-1: Map: 387  Reduce: 486  Cumulative CPU: 110521.05 sec
>> HDFS Read: 533411354  HDFS Write: 262054898296  SUCCESS
>> Stage-Stage-2: Map: 973  Reduce: 1009  Cumulative CPU: 48710.45 sec
>> HDFS Read: 262126094987  HDFS Write: 70666472011  SUCCESS
>> Total MapReduce CPU Time Spent: 1 days 20 hours 13 minutes 51 seconds
>> 500 msec
>> OK
>>
>> Where is this OutOfMemoryError coming from, and which heap space am I
>> supposed to increase? We've tried increasing
>> 'yarn.app.mapreduce.am.resource.mb', but that didn't seem to help.
>> I know we should probably not have this many partitions, but this is a
>> one-off and we'd like it to just work.
>>
>> Thanks for any pointers,
>> Patrick
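P.S. If raising the client heap doesn't pan out, the fallback we're
considering is splitting the load into date-range batches, so that each
statement creates fewer dynamic partitions at once. A sketch (the date
ranges are made up; a dynamic-partition INSERT OVERWRITE only replaces the
partitions a batch actually writes, so the batches don't clobber each
other):

  WITH gold AS (SELECT * FROM table1),
       delta AS (SELECT * FROM table2)
  INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
    PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`)
  SELECT * FROM gold WHERE c_date BETWEEN '2019-12-01' AND '2019-12-07'
  UNION DISTINCT
  SELECT * FROM delta WHERE c_date BETWEEN '2019-12-01' AND '2019-12-07'
  DISTRIBUTE BY c_date, c_hour, c_b, c_p;
  -- repeat with the next date range until all dates are loaded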