Thanks Suresh, changing the heap was our first guess as well, actually, so I
think we were on the right track there. The weird thing is that our jobs now
seem to run fine (all partitions are added) despite still producing this
error. Strange, but it seems to be OK now.

Thanks for the help.

On Wed, Jan 8, 2020 at 7:54 PM Suresh Kumar Sethuramaswamy <
rock...@gmail.com> wrote:

> Thanks for the query and the Hive options.
>
> It looks like the JVM heap space for the Hive CLI is running out of
> memory, as described in the EMR documentation:
> https://aws.amazon.com/premiumsupport/knowledge-center/emr-hive-outofmemoryerror-heap-space/
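>
> The Hive CLI client JVM takes its heap from the hive/hadoop launcher
> scripts rather than from the mapreduce.* job settings, which is why the
> job can succeed while the client still runs out of memory. A minimal
> sketch of raising the client heap for a single run (the 8 GB figure is
> only an illustration, assuming a standard hive wrapper that reads
> HADOOP_HEAPSIZE / HADOOP_CLIENT_OPTS):
>
>   # raise the Hive CLI heap for this shell session only
>   export HADOOP_HEAPSIZE=8192        # in MB, read by the hive launcher script
>   # or pass JVM options to the client directly
>   export HADOOP_CLIENT_OPTS="-Xmx8g"
>   hive -f /tmp/populateTempTable6388054392973078671.hql   # plus the --hiveconf flags as before
>
> On EMR the same setting can also be applied cluster-wide through the
> hive-env configuration classification.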
>
> On Wed, Jan 8, 2020 at 11:38 AM Patrick Duin <patd...@gmail.com> wrote:
>
>> The query is rather large and won't tell you much (it's generated).
>>
>> It comes down to this:
>> WITH gold AS (SELECT * FROM table1),
>> delta AS (SELECT * FROM table2)
>> INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
>> PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`)
>> SELECT * FROM gold
>> UNION DISTINCT
>> SELECT * FROM delta
>> DISTRIBUTE BY c_date, c_hour, c_b, c_p
>>
>> We run it with this command:
>>
>>  hive -f /tmp/populateTempTable6388054392973078671.hql --verbose
>> --hiveconf hive.exec.dynamic.partition='true' --hiveconf
>> hive.exec.dynamic.partition.mode='nonstrict' --hiveconf
>> hive.exec.max.dynamic.partitions.pernode='5000' --hiveconf
>> hive.exec.max.dynamic.partitions='50000' --hiveconf
>> parquet.compression='SNAPPY' --hiveconf hive.execution.engine='mr'
>> --hiveconf mapreduce.map.java.opts='-Xmx4608m' --hiveconf
>> mapreduce.map.memory.mb='5760' --hiveconf
>> mapreduce.reduce.java.opts='-Xmx10400m' --hiveconf
>> mapreduce.reduce.memory.mb='13000' --hiveconf
>> hive.optimize.sort.dynamic.partition='false' --hiveconf
>> hive.blobstore.optimizations.enabled='false' --hiveconf
>> hive.map.aggr='false' --hiveconf yarn.app.mapreduce.am.resource.mb='15000'
>>
>> We run on EMR m5.2xlarge nodes (32 GB of memory). As I said, the M/R part
>> runs fine and the job is listed as succeeded in the ResourceManager, yet
>> we somehow still get the error afterwards.
>>
>> On Wed, Jan 8, 2020 at 5:22 PM Suresh Kumar Sethuramaswamy <
>> rock...@gmail.com> wrote:
>>
>>> Could you please post your insert query snippet along with the SET
>>> statements?
>>>
>>> On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin <patd...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> I have a query that produces about 3000 partitions, which we load
>>>> dynamically (on Hive 2.3.5).
>>>> The query runs on M/R and the M/R job itself finishes fine, but at the
>>>> end of it we see this on the Hive CLI:
>>>>
>>>> Loading data to table my_db.temp__v1_2019_12_03_182627 partition
>>>> (c_date=null, c_hour=null, c_b=null, c_p=null)
>>>>
>>>>
>>>>          Time taken to load dynamic partitions: 540.025 seconds
>>>>          Time taken for adding to write entity : 0.329 seconds
>>>> #
>>>> # java.lang.OutOfMemoryError: Java heap space
>>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>> #   Executing /bin/sh -c "kill -9 19644"...
>>>> os::fork_and_exec failed: Cannot allocate memory (12)
>>>> MapReduce Jobs Launched:
>>>> Stage-Stage-1: Map: 387  Reduce: 486   Cumulative CPU: 110521.05 sec
>>>> HDFS Read: 533411354 HDFS Write: 262054898296 SUCCESS
>>>> Stage-Stage-2: Map: 973  Reduce: 1009   Cumulative CPU: 48710.45 sec
>>>> HDFS Read: 262126094987 HDFS Write: 70666472011 SUCCESS
>>>> Total MapReduce CPU Time Spent: 1 days 20 hours 13 minutes 51 seconds
>>>> 500 msec
>>>> OK
>>>>
>>>> Where is this OutOfMemoryError coming from, and which heap space am I
>>>> supposed to increase? We've tried increasing
>>>> 'yarn.app.mapreduce.am.resource.mb', but that didn't seem to help.
>>>> I know we should probably not have this many partitions, but this is a
>>>> one-off and we'd just like it to work.
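>>>>
>>>> The "kill -9 %p" handler in the output suggests it is the CLI client
>>>> JVM itself that hit the OOM, not a task. In case it helps pinpoint
>>>> what fills the heap, here is a sketch of what we could run to capture
>>>> a heap dump from the client (assuming the hive wrapper honours
>>>> HADOOP_CLIENT_OPTS, which I believe it does):
>>>>
>>>>   # dump the client heap on OOM so it can be inspected afterwards
>>>>   # (e.g. with jmap or Eclipse MAT)
>>>>   export HADOOP_CLIENT_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
>>>>   hive -f /tmp/populateTempTable6388054392973078671.hql   # same flags as before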
>>>>
>>>> Thanks for any pointers,
>>>>  Patrick
>>>>
