[ 
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5298:
------------------------

    Fix Version/s:     (was: 0.13.0)
                   0.14.0

> AvroSerde performance problem caused by HIVE-3833
> -------------------------------------------------
>
>                 Key: HIVE-5298
>                 URL: https://issues.apache.org/jira/browse/HIVE-5298
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.14.0
>
>         Attachments: HIVE-5298.1.patch, HIVE-5298.patch
>
>
> HIVE-3833 fixed the targeted problem and made Hive use partition-level
> metadata to initialize the object inspector. In doing so, however, it goes
> through every file under the table to access the partition metadata, which is
> very inefficient, especially when a partition contains multiple files. This
> causes more problems for AvroSerde because AvroSerde initialization reads the
> schema, which is located on the file system. As a result, before Hive can
> process any data, it needs to access every file for a table, which can take
> long enough to cause job failure because of lack of job progress.
> An improvement can be made so that partition metadata is accessed only once
> per partition.
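
The idea of accessing partition metadata once per partition can be illustrated
with a small memoizing lookup. This is only a sketch and not the code in the
attached patches: the class name PartitionMetadataCache, the loader function,
and the rule that a file's parent directory identifies its partition are all
assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: caches metadata per partition so that the expensive
// loader (for example, one that reads an Avro schema from the file system)
// runs once per partition instead of once per file.
public class PartitionMetadataCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> loader;
    private int loads = 0; // number of times the loader actually ran

    public PartitionMetadataCache(Function<String, String> loader) {
        this.loader = loader;
    }

    // Assumption for this sketch: the parent directory of a file is its partition.
    static String partitionOf(String filePath) {
        int slash = filePath.lastIndexOf('/');
        return slash < 0 ? "" : filePath.substring(0, slash);
    }

    // Returns the metadata for the partition that contains filePath,
    // invoking the loader only on the first request for that partition.
    public String metadataFor(String filePath) {
        String partition = partitionOf(filePath);
        String metadata = cache.get(partition);
        if (metadata == null) {
            loads++;
            metadata = loader.apply(partition);
            cache.put(partition, metadata);
        }
        return metadata;
    }

    public int loadCount() {
        return loads;
    }
}
```

With this shape, looking up metadata for many files in the same partition
triggers a single load, so the cost grows with the number of partitions rather
than the number of files.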



--
This message was sent by Atlassian JIRA
(v6.2#6252)
