I get "GC overhead limit exceeded" and "Java heap space" errors when trying 
to insert from a staging table into a wide (~2500 columns) ORC partitioned 
table with dynamic partitioning. It consistently fails with Tez but succeeds 
with less data on the MR execution engine. We have been trying to adjust the 
settings below with MR, but it still fails when there is data for more than 
one day.

set mapreduce.map.memory.mb=16288;
set mapreduce.map.java.opts=-Xmx10192m;
set mapreduce.reduce.memory.mb=16288;
set mapreduce.reduce.java.opts=-Xmx10192m;

The insert statement uses a very simple select, as below.

INSERT INTO TABLE table_partitioned PARTITION (partition_key)
  SELECT
   999999
  ,stg_table.*
  ,IF(Date<>'None', SUBSTR(Date,1,10), '1970-01-01') AS partition_key
  FROM stg_table;

The ORC table uses Snappy compression. Nothing else is running on this 
cluster. Are there settings to reduce the GC / memory footprint of this 
insert, so that it may succeed with more data?
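For reference, the settings we are considering trying next, based on reading about ORC writer memory with very wide tables (untested on our side; property names assume Hive 0.13 or later):

```sql
-- Sort rows by partition key before writing, so each task keeps only one
-- ORC writer open at a time instead of one per partition:
set hive.optimize.sort.dynamic.partition=true;

-- Shrink the per-column ORC compression buffer (default 262144 bytes);
-- with ~2500 columns, per-column buffers dominate writer memory:
set hive.exec.orc.default.buffer.size=65536;
```

The idea is that ORC writer memory scales roughly with (open writers) x (columns) x (buffer size), so limiting open writers and buffer size should matter more here than raising the heap further.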

Thanks,
Revin
