BELUGA BEHR created HIVE-16889:
----------------------------------

             Summary: Improve Performance Of VARCHAR
                 Key: HIVE-16889
                 URL: https://issues.apache.org/jira/browse/HIVE-16889
             Project: Hive
          Issue Type: Improvement
          Components: Types
    Affects Versions: 2.1.1, 3.0.0
            Reporter: BELUGA BEHR
Oftentimes, organizations use tools that create table schemas on the fly, and these tools specify a VARCHAR column with precision. In these scenarios, performance suffers even though one could assume it should improve: there is pre-existing knowledge about the size of the data, so buffers could be set up more efficiently than in the case where no such knowledge exists. Most of the performance cost appears to come from reading a STRING from a file into a byte buffer, checking the length of the STRING, truncating the STRING if needed, and then serializing the STRING back into bytes again.

From the code, I have identified several areas where developers left notes about later improvements:

# org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
# org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
# org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
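For illustration, the decode / length-check / truncate / re-encode cycle described above can be sketched roughly as follows. This is a minimal standalone example of the pattern, not Hive's actual implementation; the class name and method here are hypothetical stand-ins (Hive's real entry point is HiveVarcharWritable.enforceMaxLength(int)):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the VARCHAR enforcement cycle: the raw bytes are
// fully decoded to a String, the precision is checked in characters, and
// the (possibly truncated) value is serialized back to bytes again.
public class VarcharTruncateDemo {

    // Enforce a VARCHAR(maxLength) precision on UTF-8 encoded bytes.
    static byte[] enforceMaxLength(byte[] raw, int maxLength) {
        // Step 1: decode the entire byte buffer into a String.
        String s = new String(raw, StandardCharsets.UTF_8);
        // Step 2: check the length in code points, not bytes.
        int chars = s.codePointCount(0, s.length());
        if (chars <= maxLength) {
            return raw; // already within precision; no truncation needed
        }
        // Step 3: truncate to maxLength code points.
        int end = s.offsetByCodePoints(0, maxLength);
        // Step 4: re-serialize the truncated String back into bytes.
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] in = "hello world".getBytes(StandardCharsets.UTF_8);
        byte[] out = enforceMaxLength(in, 5);
        System.out.println(new String(out, StandardCharsets.UTF_8)); // hello
    }
}
```

The round trip through String is the cost the issue points at: even when the data already fits within the declared precision, every value is decoded and length-checked, and truncated values are encoded a second time.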