[ 
https://issues.apache.org/jira/browse/HIVE-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-16889:
---------------------------------

    Assignee:     (was: Peter Vary)

> Improve Performance Of VARCHAR
> ------------------------------
>
>                 Key: HIVE-16889
>                 URL: https://issues.apache.org/jira/browse/HIVE-16889
>             Project: Hive
>          Issue Type: Improvement
>          Components: Types
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: David Mollitor
>            Priority: Major
>
> Oftentimes, organizations use tools that create table schemas on the fly, and 
> these tools specify VARCHAR columns with an explicit precision.  In these 
> scenarios performance suffers, even though one might expect it to be better: 
> the size of the data is known in advance, so buffers could be set up more 
> efficiently than in the case where no such knowledge exists.
>
> Most of the performance cost appears to come from reading a STRING from a 
> file into a byte buffer, checking the length of the STRING, truncating the 
> STRING if needed, and then serializing the STRING back into bytes again.
>
> In the code, I have identified several areas where developers left notes 
> about possible later improvements:
> # org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
> # org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
> # org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
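The decode-truncate-re-encode pattern the description criticizes, and a byte-level alternative, can be sketched as follows. This is an illustrative standalone example, not code from the Hive classes listed above; the method names and the code-point-based length check are assumptions (Hive's actual VARCHAR length semantics live in its serde layer).

```java
import java.nio.charset.StandardCharsets;

public class VarcharTruncate {

    // Costly pattern described in the issue: decode the whole buffer to a
    // String, check/truncate, then re-encode back to bytes.
    static byte[] truncateViaString(byte[] utf8, int maxChars) {
        String s = new String(utf8, StandardCharsets.UTF_8);
        if (s.length() > maxChars) {
            s = s.substring(0, maxChars);
        }
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical cheaper alternative: walk UTF-8 lead bytes and find the
    // byte length of the first maxChars code points, with no decode or copy.
    static int truncatedByteLen(byte[] utf8, int maxChars) {
        int chars = 0, i = 0;
        while (i < utf8.length && chars < maxChars) {
            int b = utf8[i] & 0xFF;
            if (b < 0x80) i += 1;       // 1-byte (ASCII)
            else if (b < 0xE0) i += 2;  // 2-byte sequence
            else if (b < 0xF0) i += 3;  // 3-byte sequence
            else i += 4;                // 4-byte sequence
            chars++;
        }
        return Math.min(i, utf8.length);
    }

    public static void main(String[] args) {
        byte[] data = "héllo world".getBytes(StandardCharsets.UTF_8);
        // Decode/re-encode path: "héllo" (5 chars, 6 bytes since é is 2 bytes)
        System.out.println(new String(truncateViaString(data, 5),
                                      StandardCharsets.UTF_8));
        // Byte-walk path: same cut point, found without decoding
        System.out.println(truncatedByteLen(data, 5));
    }
}
```

The byte-walking version avoids the intermediate String allocation entirely; note that it counts code points while String.length() counts UTF-16 code units, so the two only agree for characters in the Basic Multilingual Plane.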



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
