[ https://issues.apache.org/jira/browse/HIVE-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Vary reassigned HIVE-16889:
---------------------------------

    Assignee:     (was: Peter Vary)

> Improve Performance Of VARCHAR
> ------------------------------
>
>                 Key: HIVE-16889
>                 URL: https://issues.apache.org/jira/browse/HIVE-16889
>             Project: Hive
>          Issue Type: Improvement
>          Components: Types
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: David Mollitor
>            Priority: Major
>
> Oftentimes, organizations use tools that create table schemas on the fly and specify a VARCHAR column with a precision. In these scenarios, performance suffers, even though one could assume it should be better: there is pre-existing knowledge about the size of the data, so buffers could be set up more efficiently than in the case where no such knowledge exists.
>
> Most of the performance cost seems to come from reading a STRING from a file into a byte buffer, checking the length of the STRING, truncating the STRING if needed, and then serializing the STRING back into bytes again; a sketch of a byte-level alternative follows below.
>
> From the code, I have identified several areas where developers left notes about later improvements:
> # org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
> # org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
> # org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
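A minimal sketch of such a byte-level approach, assuming well-formed UTF-8 input and counting one character per code point: enforce the VARCHAR precision directly on the encoded bytes, so no String is ever decoded and nothing is re-serialized. The class and method names below are hypothetical illustrations, not Hive's actual API.

{code:java}
import java.nio.charset.StandardCharsets;

public final class Utf8Truncate {

  /**
   * Hypothetical helper: returns how many leading bytes of the given
   * well-formed UTF-8 span cover at most maxChars code points. Works
   * directly on the encoded bytes, so the caller never has to decode
   * to a String and re-encode after truncation.
   */
  public static int truncatedByteLength(byte[] bytes, int start, int length, int maxChars) {
    int pos = start;
    int end = start + length;
    int chars = 0;
    while (pos < end && chars < maxChars) {
      int lead = bytes[pos] & 0xFF;
      if (lead < 0x80) {
        pos += 1;   // 1-byte (ASCII) sequence
      } else if (lead < 0xE0) {
        pos += 2;   // 2-byte sequence
      } else if (lead < 0xF0) {
        pos += 3;   // 3-byte sequence
      } else {
        pos += 4;   // 4-byte sequence (one supplementary code point)
      }
      chars++;
    }
    return Math.min(pos, end) - start;
  }

  public static void main(String[] args) {
    byte[] utf8 = "héllo wörld".getBytes(StandardCharsets.UTF_8);
    // Keep at most 5 characters without materializing the full String first.
    int keep = truncatedByteLength(utf8, 0, utf8.length, 5);
    System.out.println(new String(utf8, 0, keep, StandardCharsets.UTF_8)); // prints "héllo"
  }
}
{code}

When the returned byte count equals the full span length, the value already fits the declared precision and the original buffer can be reused untouched, which would be the common fast path for data that respects its VARCHAR bound.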