[ https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778980#comment-13778980 ]
Ashutosh Chauhan commented on HIVE-5298:
----------------------------------------

Can you provide more detail? I think pathToPartInfo is really returning the partition directory (I think the variable oneFile is misnamed there). If so, it seems the for loop will have the same number of iterations before and after the patch. I don't see where the perf advantage is coming from.

> AvroSerde performance problem caused by HIVE-3833
> -------------------------------------------------
>
> Key: HIVE-5298
> URL: https://issues.apache.org/jira/browse/HIVE-5298
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.11.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5298.1.patch, HIVE-5298.patch
>
>
> HIVE-3833 fixed the targeted problem and made Hive use partition-level
> metadata to initialize the object inspector. In doing so, however, it goes
> through every file under the table to access the partition metadata, which
> is very inefficient, especially when there are multiple files per partition.
> This is a bigger problem for AvroSerde because AvroSerde initialization
> accesses the schema, which is located on the file system. As a result, before
> Hive can process any data, it needs to access every file for a table, which
> can take long enough to cause job failure for lack of job progress.
> The improvement is to access partition metadata only once per partition.
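
For illustration only, here is a minimal Java sketch of the difference the issue description claims: initializing once per file versus once per distinct partition directory. It is not taken from either patch. The class PartitionInitSketch and the methods initPartition, parentDir, initPerFile and initPerPartition are hypothetical names, and initPartition is just a stand-in for an expensive SerDe initialization that reads a schema from the file system. Whether the pre-patch loop in Hive really iterates per file, rather than per partition directory, is exactly the question raised in the comment above, so this sketch only shows the intended effect under the description's assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Hive code base.
final class PartitionInitSketch {

    // Counts how often the (assumed expensive) initialization runs.
    static int initCalls = 0;

    // Stand-in for SerDe initialization that would read a schema from the file system.
    static String initPartition(String partitionDir) {
        initCalls++;
        return "inspector:" + partitionDir;
    }

    // Parent directory of a file path, e.g. "/t/p=1/f0" -> "/t/p=1".
    static String parentDir(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash <= 0 ? "/" : filePath.substring(0, slash);
    }

    // Per-file approach: one initialization for every file under the table.
    static Map<String, String> initPerFile(List<String> files) {
        Map<String, String> byFile = new LinkedHashMap<>();
        for (String f : files) {
            byFile.put(f, initPartition(parentDir(f)));
        }
        return byFile;
    }

    // Per-partition approach: initialize only the first time a partition directory is seen.
    static Map<String, String> initPerPartition(List<String> files) {
        Map<String, String> byPartition = new LinkedHashMap<>();
        for (String f : files) {
            byPartition.computeIfAbsent(parentDir(f), PartitionInitSketch::initPartition);
        }
        return byPartition;
    }
}
```

With three files spread over two partition directories, initPerFile runs the initialization three times and initPerPartition runs it twice. If the input paths are already partition directories, as the comment suspects, both loops do the same amount of work.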