Did you try with hive.optimize.sort.dynamic.partition set to true?

Thanks,
Prasanth
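P.S. A minimal sketch of the idea, assuming the INSERT from your message below; exact behavior depends on your Hive 1.2 build:

set hive.optimize.sort.dynamic.partition=true;

-- With this optimization on, Hive sorts rows by the dynamic partition
-- value before the write, so each reducer keeps only one partition's ORC
-- writer (and its per-column buffers) open at a time instead of one
-- writer per partition, which cuts heap pressure for very wide tables.
INSERT INTO TABLE table_partitioned PARTITION (partition_key)
SELECT ...;  -- same select as in the original INSERT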
On Wed, May 24, 2017 at 11:07 PM -0700, "Revin Chalil" <rcha...@expedia.com> wrote:

Forgot to mention that we use Hive 1.2.

From: Revin Chalil [mailto:rcha...@expedia.com]
Sent: Wednesday, May 24, 2017 10:53 PM
To: user@hive.apache.org
Subject: wide ORC partitioned table insert causes "GC Overhead Limit Exceeded" error

I get "GC Overhead Limit Exceeded" and "java heap space" errors when trying to insert from a stage table into a wide (~2500 columns) ORC partitioned table with dynamic partitioning. It consistently fails with Tez but succeeds with less data on the MR execution engine. We have been trying to adjust the settings below with MR, but it still fails if there is data for more than one day.

set mapreduce.map.memory.mb=16288;
set mapreduce.map.java.opts=-Xmx10192m;
set mapreduce.reduce.memory.mb=16288;
set mapreduce.reduce.java.opts=-Xmx10192m;

The insert statement uses a very simple select, as below.

INSERT INTO TABLE table_partitioned PARTITION (partition_key)
SELECT 999999
     , *
     , IF(Date <> 'None', SUBSTR(Date, 1, 10), '1970-01-01') AS partition_key
FROM stg_table;

The ORC table uses Snappy compression. This cluster does not have anything else running. Are there settings to reduce the GC / memory footprint of this insert, so that it may succeed with more data?

Thanks,
Revin
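A related, hedged sketch: each open ORC writer holds a compression buffer per column, so with ~2500 columns the 256 KB default buffer alone can pin down on the order of hundreds of MB per writer. These settings exist in Hive 1.2 and are sometimes lowered alongside the sort optimization; the values here are illustrative, not recommendations.

-- Shrink the per-column compression buffer (default 262144 bytes).
set hive.exec.orc.default.buffer.size=32768;
-- Cap the fraction of heap ORC writers may use before flushing stripes
-- (default 0.5); a lower cap trades smaller stripes for less heap use.
set hive.exec.orc.memory.pool=0.3;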