That's awesome. Thanks Suresh Sethuramaswamy
On Thu, Jan 9, 2020 at 2:00 PM Patrick Duin <patd...@gmail.com> wrote:

> Thanks Suresh, changing the heap was our first guess as well, actually. I
> think we were on the right track there. The weird thing is that our jobs now
> seem to run fine (all partitions are added) despite still giving this error.
> Weird, but it seems to be OK now.
>
> Thanks for the help.
>
> On Wed, Jan 8, 2020 at 19:54, Suresh Kumar Sethuramaswamy <rock...@gmail.com> wrote:
>
>> Thanks for the query and the Hive options.
>>
>> It looks like the JVM heap space for the Hive CLI is running out of memory,
>> as per the EMR documentation:
>> https://aws.amazon.com/premiumsupport/knowledge-center/emr-hive-outofmemoryerror-heap-space/
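
The fix described in that EMR article comes down to giving the Hive CLI JVM itself a larger heap before launching the script. A minimal sketch of the two usual ways to do that, assuming the stock EMR layout with hive-env.sh under /etc/hive/conf and an illustrative 8 GB heap (check both the path and the size against your own cluster):

  # Option 1: one-off, before re-running the hive -f command quoted further down.
  # The hive/hadoop launcher scripts pass HADOOP_CLIENT_OPTS through to the client JVM.
  export HADOOP_CLIENT_OPTS="-Xmx8g"

  # Option 2: persist it on the master node; HADOOP_HEAPSIZE is in MB.
  echo 'export HADOOP_HEAPSIZE=8192' | sudo tee -a /etc/hive/conf/hive-env.sh

Either way, it is the client-side heap that matters here; the mapreduce.*.java.opts settings in the job only size the map and reduce task JVMs.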
>>
>> On Wed, Jan 8, 2020 at 11:38 AM Patrick Duin <patd...@gmail.com> wrote:
>>
>>> The query is rather large and won't tell you much (it's generated).
>>>
>>> It comes down to this:
>>>
>>> WITH gold AS (SELECT * FROM table1),
>>>      delta AS (SELECT * FROM table2)
>>> INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
>>> PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`)
>>> SELECT * FROM gold
>>> UNION DISTINCT
>>> SELECT * FROM delta
>>> DISTRIBUTE BY c_date, c_hour, c_b, c_p
>>>
>>> We run it with this:
>>>
>>> hive -f /tmp/populateTempTable6388054392973078671.hql --verbose \
>>>   --hiveconf hive.exec.dynamic.partition='true' \
>>>   --hiveconf hive.exec.dynamic.partition.mode='nonstrict' \
>>>   --hiveconf hive.exec.max.dynamic.partitions.pernode='5000' \
>>>   --hiveconf hive.exec.max.dynamic.partitions='50000' \
>>>   --hiveconf parquet.compression='SNAPPY' \
>>>   --hiveconf hive.execution.engine='mr' \
>>>   --hiveconf mapreduce.map.java.opts='-Xmx4608m' \
>>>   --hiveconf mapreduce.map.memory.mb='5760' \
>>>   --hiveconf mapreduce.reduce.java.opts='-Xmx10400m' \
>>>   --hiveconf mapreduce.reduce.memory.mb='13000' \
>>>   --hiveconf hive.optimize.sort.dynamic.partition='false' \
>>>   --hiveconf hive.blobstore.optimizations.enabled='false' \
>>>   --hiveconf hive.map.aggr='false' \
>>>   --hiveconf yarn.app.mapreduce.am.resource.mb='15000'
>>>
>>> We run on EMR m5.2xlarge nodes (32 GB of memory). As I said, the M/R part
>>> runs fine and the job is listed as succeeded in the ResourceManager; we get
>>> the error afterwards, somehow.
>>>
>>> On Wed, Jan 8, 2020 at 17:22, Suresh Kumar Sethuramaswamy <rock...@gmail.com> wrote:
>>>
>>>> Could you please post your insert query snippet along with the SET
>>>> statements?
>>>>
>>>> On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin <patd...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I have a query that produces about 3000 partitions, which we load
>>>>> dynamically (on Hive 2.3.5).
>>>>> At the end of this query (running on M/R, which runs fine) the M/R job
>>>>> finishes and we see this on the Hive CLI:
>>>>>
>>>>> Loading data to table my_db.temp__v1_2019_12_03_182627 partition
>>>>> (c_date=null, c_hour=null, c_b=null, c_p=null)
>>>>>
>>>>> Time taken to load dynamic partitions: 540.025 seconds
>>>>> Time taken for adding to write entity : 0.329 seconds
>>>>> #
>>>>> # java.lang.OutOfMemoryError: Java heap space
>>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>>> #   Executing /bin/sh -c "kill -9 19644"...
>>>>> os::fork_and_exec failed: Cannot allocate memory (12)
>>>>> MapReduce Jobs Launched:
>>>>> Stage-Stage-1: Map: 387  Reduce: 486  Cumulative CPU: 110521.05 sec
>>>>>   HDFS Read: 533411354  HDFS Write: 262054898296  SUCCESS
>>>>> Stage-Stage-2: Map: 973  Reduce: 1009  Cumulative CPU: 48710.45 sec
>>>>>   HDFS Read: 262126094987  HDFS Write: 70666472011  SUCCESS
>>>>> Total MapReduce CPU Time Spent: 1 days 20 hours 13 minutes 51 seconds 500 msec
>>>>> OK
>>>>>
>>>>> Where is this OutOfMemoryError coming from, and which heap space am I
>>>>> supposed to increase? We've tried increasing
>>>>> 'yarn.app.mapreduce.am.resource.mb', but that didn't seem to help.
>>>>> I know we probably shouldn't have this many partitions, but this is a
>>>>> one-off and I'd just like it to work.
>>>>>
>>>>> Thanks for any pointers,
>>>>> Patrick
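
One closing observation for anyone who hits the same trace: the OnOutOfMemoryError / kill -9 lines come from the local Hive CLI process (pid 19644 above), not from a YARN container, which is why raising yarn.app.mapreduce.am.resource.mb made no difference. A quick way to confirm which heap the CLI is actually running with, sketched with standard JDK tools (<pid> is a placeholder for the pid reported by jps):

  # List client-side JVMs with their arguments; the Hive CLI is the one whose
  # command line mentions the hive-cli jar / CliDriver main class.
  jps -lvm | grep -i hive

  # Print that process's effective maximum heap (reported in bytes).
  jinfo -flag MaxHeapSize <pid>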