[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656493#comment-13656493 ]
Sushanth Sowmyan commented on HIVE-4551:
----------------------------------------

Also, a few more notes:
a) With my patch that fixes this bug, HCatRecordSerDe is still doing the promotion, so HCatRecord does have the promoted data when reading off it, and promotion is still configurable in the current way. I intend to refactor this out in a new patch (details below).
b) Only the HCatSchema has been made "pure", in that it reflects the underlying data.

--

My eventual goal, post-bugfix, to clean this up is as follows:
a) HCatRecord and HCatSchema reflect the underlying raw data and do no promotions.
b) Introduce a ConversionImpl, which defines various datatype conversion functions that all default to returning the input, along with a config that lets a user choose which conversions are applied.
c) Introduce a PromotedHCatRecord & PromotedHCatSchema that wrap HCatRecord/HCatSchema and use a ConversionImpl.
d) Implement a PigLoaderConversionImpl/PigStorerConversionImpl in hcat-pig-adapter, which implements the following: Byte->Int promotion, Short->Int promotion, Boolean->Int promotion.
e) Have HCatLoader/HCatStorer use the promoted versions of HCatRecord/HCatSchema, which use the PigConversionImpl.
f) Remove the current HCatContext promotion parameters and make them HCatLoader/HCatStorer parameters.
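The eventual plan above (items a to f) can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not HCatalog code: `SimpleRecord` is a hypothetical stand-in for HCatRecord, the `Conversion` enum stands in for the proposed per-conversion config, and `ConversionImpl`, `PromotedRecord` and `pigLoaderConversions()` only borrow names from the proposal. Mapping `true`/`false` to 1/0 for Boolean->Int is also an assumption. The point it shows is that the raw record is never modified; promotion happens only when a consumer reads through the wrapper, so the raw data and a schema describing it cannot disagree.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class PromotionSketch {

    /** Hypothetical stand-in for HCatRecord: positional access to raw column values. */
    interface SimpleRecord {
        Object get(int pos);
        int size();
    }

    /** Record backed by a list; values are exactly what the storage layer returned. */
    static final class ListRecord implements SimpleRecord {
        private final List<Object> values;
        ListRecord(List<Object> values) { this.values = values; }
        public Object get(int pos) { return values.get(pos); }
        public int size() { return values.size(); }
    }

    /** Conversions a consumer may opt into (hypothetical stand-in for the proposed config). */
    enum Conversion { BYTE_TO_INT, SHORT_TO_INT, BOOLEAN_TO_INT }

    /** Sketch of the proposed ConversionImpl: every conversion defaults to returning its input. */
    static class ConversionImpl {
        private final Set<Conversion> enabled;
        ConversionImpl(Set<Conversion> enabled) { this.enabled = enabled; }

        Object convert(Object value) {
            if (value instanceof Byte && enabled.contains(Conversion.BYTE_TO_INT)) {
                return ((Byte) value).intValue();
            }
            if (value instanceof Short && enabled.contains(Conversion.SHORT_TO_INT)) {
                return ((Short) value).intValue();
            }
            // Assumed mapping for Boolean->Int: true -> 1, false -> 0.
            if (value instanceof Boolean && enabled.contains(Conversion.BOOLEAN_TO_INT)) {
                return ((Boolean) value) ? 1 : 0;
            }
            return value; // identity by default: raw data passes through unchanged
        }
    }

    /** Sketch of PromotedHCatRecord: wraps a raw record and converts values on read. */
    static final class PromotedRecord implements SimpleRecord {
        private final SimpleRecord raw;
        private final ConversionImpl conversions;
        PromotedRecord(SimpleRecord raw, ConversionImpl conversions) {
            this.raw = raw;
            this.conversions = conversions;
        }
        public Object get(int pos) { return conversions.convert(raw.get(pos)); }
        public int size() { return raw.size(); }
    }

    /** Sketch of a Pig-side ConversionImpl enabling the three promotions listed in (d). */
    static ConversionImpl pigLoaderConversions() {
        return new ConversionImpl(EnumSet.allOf(Conversion.class));
    }

    public static void main(String[] args) {
        // Raw values as a tinyint/smallint/boolean/string row might arrive, unpromoted.
        SimpleRecord raw = new ListRecord(Arrays.<Object>asList((byte) 7, (short) 300, true, "x"));
        SimpleRecord forPig = new PromotedRecord(raw, pigLoaderConversions());

        for (int i = 0; i < raw.size(); i++) {
            Object r = raw.get(i);
            Object p = forPig.get(i);
            System.out.println(r.getClass().getSimpleName() + " " + r
                + " -> " + p.getClass().getSimpleName() + " " + p);
        }
    }
}
```

Keeping the conversions in a separate object, rather than inside the record or SerDe, is what would let HCatLoader/HCatStorer own the promotion settings (item f) while other consumers see the raw types.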
> HCatLoader smallint/tinyint promotions to Int have issues with ORC integration
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-4551
>                 URL: https://issues.apache.org/jira/browse/HIVE-4551
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: 4551.patch
>
>
> This was initially reported from an e2e test run, with the following E2E test:
> {code}
> {
>   'name' => 'Hadoop_ORC_Write',
>   'tests' => [
>     {
>       'num' => 1
>       ,'hcat_prep'=>q\
> drop table if exists hadoop_orc;
> create table hadoop_orc (
>   t tinyint,
>   si smallint,
>   i int,
>   b bigint,
>   f float,
>   d double,
>   s string)
> stored as orc;\
>       ,'hadoop' => q\
> jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
>       ,'result_table' => 'hadoop_orc'
>       ,'sql' => q\select * from all100k;\
>       ,'floatpostprocess' => 1
>       ,'delimiter' => ' '
>     },
>   ],
> },
> {code}
> This fails with the following error:
> {code}
> 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
>         at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>         at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
>         at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
>         at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
>         at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
>         at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
>         at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
>         at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
>         at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
>         ... 12 more
> 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira