The query is rather large and won't tell you much (it's generated).

It comes down to this:
WITH gold AS (SELECT * FROM table1),
delta AS (SELECT * FROM table2)
INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
PARTITION (`c_date`, `c_hour`, `c_b`, `c_p`)
SELECT * FROM gold
UNION DISTINCT
SELECT * FROM delta
DISTRIBUTE BY c_date, c_hour, c_b, c_p
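
One note on the SELECT *: Hive requires the dynamic partition columns to be
the last columns in the select list (in the same order as in the PARTITION
clause), and both source tables end in c_date, c_hour, c_b, c_p. Written out
with explicit columns it comes to something like this, where `payload` is
just a made-up stand-in for the real data columns:

-- payload stands in for the actual non-partition columns of table1/table2
WITH gold AS (SELECT payload, c_date, c_hour, c_b, c_p FROM table1),
delta AS (SELECT payload, c_date, c_hour, c_b, c_p FROM table2)
INSERT OVERWRITE TABLE my_db.temp__v1_2019_12_03_182627
PARTITION (`c_date`, `c_hour`, `c_b`, `c_p`)
SELECT * FROM gold
UNION DISTINCT
SELECT * FROM delta
DISTRIBUTE BY c_date, c_hour, c_b, c_p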

We run with this:

 hive -f /tmp/populateTempTable6388054392973078671.hql --verbose \
   --hiveconf hive.exec.dynamic.partition='true' \
   --hiveconf hive.exec.dynamic.partition.mode='nonstrict' \
   --hiveconf hive.exec.max.dynamic.partitions.pernode='5000' \
   --hiveconf hive.exec.max.dynamic.partitions='50000' \
   --hiveconf parquet.compression='SNAPPY' \
   --hiveconf hive.execution.engine='mr' \
   --hiveconf mapreduce.map.java.opts='-Xmx4608m' \
   --hiveconf mapreduce.map.memory.mb='5760' \
   --hiveconf mapreduce.reduce.java.opts='-Xmx10400m' \
   --hiveconf mapreduce.reduce.memory.mb='13000' \
   --hiveconf hive.optimize.sort.dynamic.partition='false' \
   --hiveconf hive.blobstore.optimizations.enabled='false' \
   --hiveconf hive.map.aggr='false' \
   --hiveconf yarn.app.mapreduce.am.resource.mb='15000'
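
Suresh: the script itself has no SET statements; everything is passed on the
command line via --hiveconf. If it's easier to read, the equivalent SET block
at the top of the script would be:

 SET hive.exec.dynamic.partition=true;
 SET hive.exec.dynamic.partition.mode=nonstrict;
 SET hive.exec.max.dynamic.partitions.pernode=5000;
 SET hive.exec.max.dynamic.partitions=50000;
 SET parquet.compression=SNAPPY;
 SET hive.execution.engine=mr;
 SET mapreduce.map.java.opts=-Xmx4608m;
 SET mapreduce.map.memory.mb=5760;
 SET mapreduce.reduce.java.opts=-Xmx10400m;
 SET mapreduce.reduce.memory.mb=13000;
 SET hive.optimize.sort.dynamic.partition=false;
 SET hive.blobstore.optimizations.enabled=false;
 SET hive.map.aggr=false;
 SET yarn.app.mapreduce.am.resource.mb=15000;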

We run on EMR m5.2xlarge nodes (32 GB of memory). As I said, the M/R part
runs fine and the job is listed as succeeded in the ResourceManager; the
error somehow shows up only after that.
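
A hunch, in case it matters: the "Loading data to table ... dynamic
partitions" step runs in the Hive CLI's own JVM rather than in a YARN
container, so I suspect it's the client heap that's too small. If I read the
launch scripts correctly, that heap is set via HADOOP_CLIENT_OPTS (or
HADOOP_HEAPSIZE), not via any of the mapreduce.* settings, e.g.:

 # assumption: the Hive CLI inherits its -Xmx from HADOOP_CLIENT_OPTS
 export HADOOP_CLIENT_OPTS="-Xmx8g $HADOOP_CLIENT_OPTS"
 hive -f /tmp/populateTempTable6388054392973078671.hql --verbose ...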



On Wed, 8 Jan 2020 at 17:22, Suresh Kumar Sethuramaswamy <
rock...@gmail.com> wrote:

> Could you please post your insert query snippet along with the SET
> statements?
>
> On Wed, Jan 8, 2020 at 11:17 AM Patrick Duin <patd...@gmail.com> wrote:
>
>> Hi,
>> I've got a query that produces about 3000 partitions, which we load
>> dynamically (on Hive 2.3.5).
>> At the end of this query the M/R job has finished (the M/R part itself
>> runs fine), and then we see this on the Hive CLI:
>>
>> Loading data to table my_db.temp__v1_2019_12_03_182627 partition
>> (c_date=null, c_hour=null, c_b=null, c_p=null)
>>
>>
>>          Time taken to load dynamic partitions: 540.025 seconds
>>          Time taken for adding to write entity : 0.329 seconds
>> #
>> # java.lang.OutOfMemoryError: Java heap space
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>> #   Executing /bin/sh -c "kill -9 19644"...
>> os::fork_and_exec failed: Cannot allocate memory (12)
>> MapReduce Jobs Launched:
>> Stage-Stage-1: Map: 387  Reduce: 486   Cumulative CPU: 110521.05 sec
>> HDFS Read: 533411354 HDFS Write: 262054898296 SUCCESS
>> Stage-Stage-2: Map: 973  Reduce: 1009   Cumulative CPU: 48710.45 sec
>> HDFS Read: 262126094987 HDFS Write: 70666472011 SUCCESS
>> Total MapReduce CPU Time Spent: 1 days 20 hours 13 minutes 51 seconds 500
>> msec
>> OK
>>
>> Where is this OutOfMemoryError coming from, and which heap space am I
>> supposed to increase? We've tried increasing
>> 'yarn.app.mapreduce.am.resource.mb', but that didn't seem to help.
>> I know we should probably not have this many partitions, but this is a
>> one-off and we'd just like it to work.
>>
>> Thanks for any pointers,
>>  Patrick
>>
