Thanks for the query and the Hive options.

It looks like the JVM heap space for the Hive CLI is running out of memory, as
described in the EMR documentation:
https://aws.amazon.com/premiumsupport/knowledge-center/emr-hive-outofmemoryerror-heap-space/
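
As a rough sketch (the 10240 MB figure is only an example value, and this
assumes Hadoop 2.x semantics where HADOOP_HEAPSIZE is interpreted in MB), one
way to give the Hive CLI JVM a larger heap on the master node before rerunning
the same script would be:

  # Raise the client-side JVM heap for this shell session only;
  # the map/reduce settings passed via --hiveconf are unaffected.
  export HADOOP_HEAPSIZE=10240
  hive -f /tmp/populateTempTable6388054392973078671.hql --verbose ...

Assuming the stock EMR layout, the same export can be made persistent in
/etc/hive/conf/hive-env.sh, or via the hive-env configuration classification
when the cluster is launched.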




On Wed, Jan 8, 2020 at 11:38 AM Patrick Duin <patd...@gmail.com> wrote:

> The query is rather large and won't tell you much (it's generated).
>
> It comes down to this:
> WITH gold AS ( select * from table1),
> delta AS (select * from table2)
> INSERT OVERWRITE TABLE
>    my_db.temp__v1_2019_12_03_182627
> PARTITION (`c_date`,`c_hour`,`c_b`,`c_p`)
>   SELECT * FROM gold
>   UNION DISTINCT
>   SELECT * FROM delta
>  DISTRIBUTE BY c_date, c_hour, c_b, c_p
>
> We run with this:
>
>  hive -f /tmp/populateTempTable6388054392973078671.hql --verbose
> --hiveconf hive.exec.dynamic.partition='true' --hiveconf
> hive.exec.dynamic.partition.mode='nonstrict' --hiveconf
> hive.exec.max.dynamic.partitions.pernode='5000' --hiveconf
> hive.exec.max.dynamic.partitions='50000' --hiveconf
> parquet.compression='SNAPPY' --hiveconf hive.execution.engine='mr'
> --hiveconf mapreduce.map.java.opts='-Xmx4608m' --hiveconf
> mapreduce.map.memory.mb='5760' --hiveconf
> mapreduce.reduce.java.opts='-Xmx10400m' --hiveconf
> mapreduce.reduce.memory.mb='13000' --hiveconf
> hive.optimize.sort.dynamic.partition='false' --hiveconf
> hive.blobstore.optimizations.enabled='false' --hiveconf
> hive.map.aggr='false' --hiveconf yarn.app.mapreduce.am.resource.mb='15000'
>
> We run on EMR m5.2xlarge nodes (32 GB of memory). As I said, the M/R part
> runs fine and the job is listed as succeeded in the ResourceManager, yet we
> still somehow get the error afterwards.
>
>
>
> On Wed, 8 Jan 2020 at 17:22, Suresh Kumar Sethuramaswamy <
> rock...@gmail.com> wrote:
>
>> Could you please post your insert query snippet along with the SET
>> statements?
>>
>> On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin <patd...@gmail.com> wrote:
>>
>>> Hi,
>>> I've got a query that produces about 3000 partitions, which we load
>>> dynamically (on Hive 2.3.5).
>>> At the end of the query (running on M/R, which runs fine) the M/R job
>>> finishes and we see this on the Hive CLI:
>>>
>>> Loading data to table my_db.temp__v1_2019_12_03_182627 partition
>>> (c_date=null, c_hour=null, c_b=null, c_p=null)
>>>
>>>
>>>          Time taken to load dynamic partitions: 540.025 seconds
>>>          Time taken for adding to write entity : 0.329 seconds
>>> #
>>> # java.lang.OutOfMemoryError: Java heap space
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>> #   Executing /bin/sh -c "kill -9 19644"...
>>> os::fork_and_exec failed: Cannot allocate memory (12)
>>> MapReduce Jobs Launched:
>>> Stage-Stage-1: Map: 387  Reduce: 486   Cumulative CPU: 110521.05 sec
>>> HDFS Read: 533411354 HDFS Write: 262054898296 SUCCESS
>>> Stage-Stage-2: Map: 973  Reduce: 1009   Cumulative CPU: 48710.45 sec
>>> HDFS Read: 262126094987 HDFS Write: 70666472011 SUCCESS
>>> Total MapReduce CPU Time Spent: 1 days 20 hours 13 minutes 51 seconds
>>> 500 msec
>>> OK
>>>
>>> Where is this OutOfMemoryError coming from, and which heap space am I
>>> supposed to increase? We've tried increasing
>>> 'yarn.app.mapreduce.am.resource.mb', but that didn't seem to help.
>>> I know we probably shouldn't have this many partitions, but this is a
>>> one-off and we'd just like it to work.
>>>
>>> Thanks for any pointers,
>>>  Patrick