Carter Shanklin created HIVE-12682:
--------------------------------------

             Summary: Reducers in dynamic partitioning job spend a lot of time running hadoop.conf.Configuration.getOverlay
                 Key: HIVE-12682
                 URL: https://issues.apache.org/jira/browse/HIVE-12682
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 1.2.1
            Reporter: Carter Shanklin
         Attachments: reducer.png

I tested this on Hive 1.2.1, but it looks like the issue still applies to 2.0.

I ran this query:
{code}
create table flights (
…
)
PARTITIONED BY (Year int)
CLUSTERED BY (Month)
SORTED BY (DayofMonth) into 12 buckets
STORED AS ORC
TBLPROPERTIES("orc.bloom.filter.columns"="*")
;
{code}

(Taken from https://github.com/t3rmin4t0r/all-airlines-data/blob/master/ddl/orc.sql)

I profiled only the reduce phase and noticed something odd: the attached graph (reducer.png) shows where time was spent during the reduce phase.

The problem seems to be related to FileSinkOperator.java (branch-2.0, line 903):
https://github.com/apache/hive/blob/branch-2.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L903

/cc [~gopalv]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)