Carter Shanklin created HIVE-12682:
--------------------------------------

Summary: Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay
Key: HIVE-12682
URL: https://issues.apache.org/jira/browse/HIVE-12682
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: Carter Shanklin
Attachments: reducer.png
I tested this on Hive 1.2.1, but it looks like it still applies to 2.0. I ran this query:

{code}
create table flights (
  …
)
PARTITIONED BY (Year int)
CLUSTERED BY (Month) SORTED BY (DayofMonth) into 12 buckets
STORED AS ORC
TBLPROPERTIES("orc.bloom.filter.columns"="*")
;
{code}

(Taken from here: https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)

I profiled just the reduce phase and noticed something odd: the attached graph (reducer.png) shows where time was spent during the reduce phase. The problem seems to relate to https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903

/cc [~gopalv]
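For context, in Hadoop 2.x `Configuration.set()` goes through the private, synchronized, lazily initialized `getOverlay()` accessor. A profile dominated by `getOverlay` therefore usually means something is mutating a Configuration far more often than expected, for example once per row instead of once per partition writer. Whether that is what happens at the linked FileSinkOperator line is an assumption, not something this report establishes. The sketch below does not use Hadoop or Hive. `SketchConf`, the key `sketch.current.partition`, and the helper methods are all hypothetical. It counts accessor calls for the two patterns on 1,000 rows spread over 4 `Year` partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class OverlayHotspotSketch {

    // Hypothetical stand-in for a Hadoop-style Configuration: every set()
    // goes through a synchronized, lazily initialized overlay accessor.
    static class SketchConf {
        private Properties overlay;
        private final Properties props = new Properties();
        int overlayCalls = 0; // counts accessor invocations for illustration only

        private synchronized Properties getOverlay() {
            overlayCalls++;
            if (overlay == null) {
                overlay = new Properties();
            }
            return overlay;
        }

        void set(String name, String value) {
            getOverlay().setProperty(name, value);
            props.setProperty(name, value);
        }
    }

    // Builds n rows whose partition value (Year) cycles over `partitions` values.
    public static List<Integer> sampleRows(int n, int partitions) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(2000 + (i % partitions));
        }
        return rows;
    }

    // Pattern A (hypothetical hot path): mutate the shared conf for every row.
    public static int perRow(List<Integer> rowPartitions) {
        SketchConf conf = new SketchConf();
        for (int year : rowPartitions) {
            conf.set("sketch.current.partition", "Year=" + year);
        }
        return conf.overlayCalls;
    }

    // Pattern B: touch the conf only when a new partition writer is opened.
    public static int perPartition(List<Integer> rowPartitions) {
        SketchConf conf = new SketchConf();
        Map<Integer, String> writers = new HashMap<>();
        for (int year : rowPartitions) {
            writers.computeIfAbsent(year, y -> {
                conf.set("sketch.current.partition", "Year=" + y);
                return "writer-" + y;
            });
        }
        return conf.overlayCalls;
    }

    public static void main(String[] args) {
        List<Integer> rows = sampleRows(1000, 4);
        System.out.println("per-row overlay calls:       " + perRow(rows));
        System.out.println("per-partition overlay calls: " + perPartition(rows));
    }
}
```

Run as written, this prints 1000 accessor calls for the per-row pattern and 4 for the per-partition pattern. With real reducers the per-row variant also serializes on the monitor of the shared conf object, which is why hoisting such calls out of the row loop is the usual remedy when this pattern is the cause.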