[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashutosh Chauhan updated HIVE-5298:
-----------------------------------
    Status: Open  (was: Patch Available)

Canceling the patch until we have a better understanding of the problem.

> AvroSerde performance problem caused by HIVE-3833
> -------------------------------------------------
>
>                 Key: HIVE-5298
>                 URL: https://issues.apache.org/jira/browse/HIVE-5298
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-5298.1.patch, HIVE-5298.patch
>
>
> HIVE-3833 fixed the targeted problem and made Hive use partition-level
> metadata to initialize the object inspector. In doing so, however, it goes
> through every file under the table to access the partition metadata, which is
> very inefficient, especially when a partition contains multiple files. This
> causes more problems for AvroSerde because AvroSerde initialization accesses
> the schema, which is located on the file system. As a result, before Hive can
> process any data, it needs to access every file for a table, which can take
> long enough to cause job failure because of lack of job progress.
> An improvement can be made so that partition metadata is accessed only once
> per partition.
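
For illustration only, the "once per partition" idea in the description could be sketched as a small memoizing lookup keyed by partition directory. This is not the attached patch and not Hive code: the class `PartitionMetadataCache`, its methods, and the assumption that a file's parent directory identifies its partition are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names are Hive APIs.
// Assumes the parent directory of a file identifies its partition.
class PartitionMetadataCache<M> {
    private final Map<String, M> byPartition = new HashMap<>();
    private final Function<String, M> loader; // the expensive lookup, e.g. reading a schema
    private int loadCount = 0;

    PartitionMetadataCache(Function<String, M> loader) {
        this.loader = loader;
    }

    // Returns the metadata for the partition that contains filePath.
    // The loader runs only the first time a partition directory is seen
    // (a loader that returns null would be retried on the next call).
    M forFile(String filePath) {
        String partitionDir = parentOf(filePath);
        M cached = byPartition.get(partitionDir);
        if (cached == null) {
            cached = loader.apply(partitionDir);
            loadCount++;
            byPartition.put(partitionDir, cached);
        }
        return cached;
    }

    // Number of times the expensive loader was actually invoked.
    int loadCount() {
        return loadCount;
    }

    // Strips the last path component; returns "" when there is no '/'.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }
}
```

With a cache like this, looking up metadata for many files in the same partition triggers a single load, so the cost scales with the number of partitions rather than the number of files.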