[ https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656384#comment-13656384 ]
Sushanth Sowmyan commented on HIVE-4551:
----------------------------------------
The problem here is that the raw data encapsulated by HCatRecord and HCatSchema
are out of sync, which was one of my worries back in HCATALOG-425:
https://issues.apache.org/jira/browse/HCATALOG-425?focusedCommentId=13439652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13439652
Basically, the raw data in the smallint/tinyint columns consists of raw shorts
and bytes, but we try to read it as an int. In the rcfile case the underlying
raw data is also stored as an IntWritable for smallint and tinyint; in the orc
case it is not. This leads to the following calls in the rcfile case and in the
orc case:
RCFILE:
{noformat}
13/05/11 02:56:10 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean}
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:-3
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:9001
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:86400
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyLong:4294967297
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
==> org.apache.hadoop.hive.serde2.lazy.LazyFloat:34.532
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
==> org.apache.hadoop.hive.serde2.lazy.LazyDouble:2.184239842983489E15
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
==> org.apache.hadoop.hive.serde2.lazy.LazyBoolean:true
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyInteger:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector:int
==> org.apache.hadoop.hive.serde2.lazy.LazyLong:0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyLongObjectInspector:bigint
==> org.apache.hadoop.hive.serde2.lazy.LazyFloat:0.0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyFloatObjectInspector:float
==> org.apache.hadoop.hive.serde2.lazy.LazyDouble:0.0
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyDoubleObjectInspector:double
==> org.apache.hadoop.hive.serde2.lazy.LazyBoolean:false
==> org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyBooleanObjectInspector:boolean
{noformat}
ORC:
{noformat}
13/05/11 02:56:16 INFO mapreduce.InternalUtil: Initializing org.apache.hadoop.hive.ql.io.orc.OrcSerde with properties {transient_lastDdlTime=1368266162, serialization.null.format=\N, columns=ti,si,i,bi,f,d,b, serialization.format=1, columns.types=int,int,int,bigint,float,double,boolean}
==> org.apache.hadoop.hive.serde2.io.ByteWritable:-3
==> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector:int
13/05/11 02:56:16 WARN mapred.LocalJobRunner: job_local_0003
org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
        at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
        at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
        at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:292)
        at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
        at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
        at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
        at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
        at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
        ... 8 more
{noformat}
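The mismatch is easy to reproduce outside of HCatalog. Here is a minimal sketch (my own illustration, not code from either project; the class name PromotionMismatch is made up) of the failing cast: the ObjectInspector is picked from the promoted schema (int), while the raw datum ORC hands back is still a ByteWritable:
{code}
import org.apache.hadoop.hive.serde2.io.ByteWritable;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector;

public class PromotionMismatch {
  public static void main(String[] args) {
    // What ORC hands back for a tinyint column holding -3.
    Object rawOrcDatum = new ByteWritable((byte) -3);
    // What the promoted (tinyint -> int) schema selects as the inspector.
    WritableIntObjectInspector intOI =
        PrimitiveObjectInspectorFactory.writableIntObjectInspector;
    // Throws java.lang.ClassCastException: ByteWritable cannot be cast to
    // IntWritable, matching the trace above.
    intOI.getPrimitiveJavaObject(rawOrcDatum);
  }
}
{code}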
(There is also an additional bug in how these values are read for promotion:
the reading code assumes a Java Byte where the datum is actually a
ByteWritable, and so on. A sketch of a promotion-aware read follows below.)
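One possible shape for that promotion-aware read (a sketch of the idea only, not the eventual patch; the helper name asPromotedInt is hypothetical) is to inspect the raw Writable and widen tinyint/smallint values explicitly instead of casting through WritableIntObjectInspector:
{code}
import org.apache.hadoop.hive.serde2.io.ByteWritable;
import org.apache.hadoop.hive.serde2.io.ShortWritable;
import org.apache.hadoop.io.IntWritable;

public final class IntPromotion {
  // Hypothetical helper: widen a tinyint/smallint Writable to a Java Integer.
  static Integer asPromotedInt(Object raw) {
    if (raw == null) {
      return null;
    }
    if (raw instanceof ByteWritable) {        // tinyint -> int
      return (int) ((ByteWritable) raw).get();
    }
    if (raw instanceof ShortWritable) {       // smallint -> int
      return (int) ((ShortWritable) raw).get();
    }
    if (raw instanceof IntWritable) {         // already an int
      return ((IntWritable) raw).get();
    }
    throw new IllegalArgumentException("unexpected raw type: " + raw.getClass());
  }
}
{code}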
> ORC - HCatLoader integration has issues with smallint/tinyint promotions to Int
> -------------------------------------------------------------------------------
>
> Key: HIVE-4551
> URL: https://issues.apache.org/jira/browse/HIVE-4551
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Reporter: Sushanth Sowmyan
> Assignee: Sushanth Sowmyan
>
> This was initially reported from an e2e test run, with the following e2e test:
> {code}
> {
>   'name' => 'Hadoop_ORC_Write',
>   'tests' => [
>     {
>       'num' => 1
>       ,'hcat_prep' => q\
> drop table if exists hadoop_orc;
> create table hadoop_orc (
>     t tinyint,
>     si smallint,
>     i int,
>     b bigint,
>     f float,
>     d double,
>     s string)
> stored as orc;\
>       ,'hadoop' => q\
> jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
>       ,'result_table' => 'hadoop_orc'
>       ,'sql' => q\select * from all100k;\
>       ,'floatpostprocess' => 1
>       ,'delimiter' => ' '
>     },
>   ],
> },
> {code}
> This fails with the following error:
> {code}
> 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
>         at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>         at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
>         at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
>         at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
>         at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
>         at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
>         at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
>         at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
>         at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
>         ... 12 more
> 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}