[ 
https://issues.apache.org/jira/browse/HIVE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13656493#comment-13656493
 ] 

Sushanth Sowmyan commented on HIVE-4551:
----------------------------------------

Also, a few more notes : 

a) With my patch that fixes this bug, HCatRecordSerDe still performs the 
promotion, so HCatRecord still contains the promoted data when read from, and 
promotion remains configurable in the current way. I intend to refactor this 
out in a new patch (details below).
b) Only the HCatSchema has been made "pure", in that it now reflects the 
underlying data.

--

My eventual goal, post-bugfix, to clean this up is as follows:

a) HCatRecord and HCatSchema reflect underlying raw data and do no promotions.
b) Introduce a ConversionImpl, which defines various datatype conversion 
functions that all default to returning their input, with a config that lets 
a user choose which conversions are applied.
c) Introduce a PromotedHCatRecord & PromotedHCatSchema that wrap 
HCatRecord/HCatSchema and use a ConversionImpl (see the sketch after this 
list).
d) Implement a PigLoaderConversionImpl/PigStorerConversionImpl in 
hcat-pig-adapter, which implement the following: Byte->Int promotion, 
Short->Int promotion, Boolean->Int promotion.
e) Have HCatLoader/HCatStorer use the promoted versions of 
HCatRecord/HCatSchema, which use these Pig ConversionImpls.
f) Remove the current HCatContext promotion parameters and make them 
HCatLoader/HCatStorer parameters.
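
To make the direction concrete, here is a rough sketch of (b)-(e). None of 
these classes exist yet; all names and signatures are placeholders, not a 
final design:

{code}
// Sketch only -- hypothetical classes; each would live in its own file.

// (b) Conversion hooks; the base implementation is identity, i.e. returns its input.
class ConversionImpl {
  public Object convertTinyInt(Byte val)    { return val; }
  public Object convertSmallInt(Short val)  { return val; }
  public Object convertBoolean(Boolean val) { return val; }
}

// (c) A wrapper that applies a ConversionImpl on read; the underlying
// HCatRecord keeps the raw, unpromoted data.
class PromotedHCatRecord {
  private final org.apache.hcatalog.data.HCatRecord raw;
  private final ConversionImpl conv;

  PromotedHCatRecord(org.apache.hcatalog.data.HCatRecord raw, ConversionImpl conv) {
    this.raw = raw;
    this.conv = conv;
  }

  public Object get(int pos) {
    Object val = raw.get(pos);
    if (val instanceof Byte)    { return conv.convertTinyInt((Byte) val); }
    if (val instanceof Short)   { return conv.convertSmallInt((Short) val); }
    if (val instanceof Boolean) { return conv.convertBoolean((Boolean) val); }
    return val;
  }
}

// (d) Pig-side conversions: Pig has no byte/short/boolean, so promote to int.
class PigLoaderConversionImpl extends ConversionImpl {
  public Object convertTinyInt(Byte val)    { return val == null ? null : Integer.valueOf(val.intValue()); }
  public Object convertSmallInt(Short val)  { return val == null ? null : Integer.valueOf(val.intValue()); }
  public Object convertBoolean(Boolean val) { return val == null ? null : Integer.valueOf(val.booleanValue() ? 1 : 0); }
}
{code}

The point being that HCatRecord/HCatSchema stay faithful to the stored data, 
and only the Pig adapter opts into promotion.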

                
> HCatLoader smallint/tinyint promotions to Int have issues with ORC integration
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-4551
>                 URL: https://issues.apache.org/jira/browse/HIVE-4551
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: 4551.patch
>
>
> This was initially reported from an e2e test run, with the following E2E test:
> {code}
>                 {
>                         'name' => 'Hadoop_ORC_Write',
>                         'tests' => [
>                                 {
>                                  'num' => 1
>                                 ,'hcat_prep'=>q\
> drop table if exists hadoop_orc;
> create table hadoop_orc (
>             t tinyint,
>             si smallint,
>             i int,
>             b bigint,
>             f float,
>             d double,
>             s string)
>         stored as orc;\
>                                 ,'hadoop' => q\
> jar :FUNCPATH:/testudf.jar org.apache.hcatalog.utils.WriteText -libjars :HCAT_JAR: :THRIFTSERVER: all100k hadoop_orc\,
>                                 ,'result_table' => 'hadoop_orc'
>                                 ,'sql' => q\select * from all100k;\
>                                 ,'floatpostprocess' => 1
>                                 ,'delimiter' => '       '
>                                 },
>                        ],
>                 },
> {code}
> This fails with the following error:
> {code}
> 2013-04-26 00:26:07,437 WARN org.apache.hadoop.mapred.Child: Error running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error converting read value to tuple
>       at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
>       at org.apache.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:53)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
>       at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>       at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1195)
>       at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.ByteWritable cannot be cast to org.apache.hadoop.io.IntWritable
>       at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.getPrimitiveJavaObject(WritableIntObjectInspector.java:45)
>       at org.apache.hcatalog.data.HCatRecordSerDe.serializePrimitiveField(HCatRecordSerDe.java:290)
>       at org.apache.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:192)
>       at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
>       at org.apache.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
>       at org.apache.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:203)
>       at org.apache.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
>       ... 12 more
> 2013-04-26 00:26:07,440 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> {code}
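
For reference, the ClassCastException at the bottom of that trace can be 
reproduced in isolation: the ObjectInspector is built from the promoted (int) 
schema, while ORC hands back the raw tinyint writable. A minimal standalone 
sketch, assuming only the Hive serde2 classes on the classpath (not part of 
the patch):

{code}
import org.apache.hadoop.hive.serde2.io.ByteWritable;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector;

public class CastRepro {
  public static void main(String[] args) {
    // Inspector chosen from the promoted schema, which says the column is an int...
    WritableIntObjectInspector intOI =
        PrimitiveObjectInspectorFactory.writableIntObjectInspector;

    // ...but the ORC reader returns the column's raw tinyint writable.
    ByteWritable raw = new ByteWritable((byte) 1);

    // Throws ClassCastException: ByteWritable cannot be cast to IntWritable,
    // matching the failure in HCatRecordSerDe.serializePrimitiveField above.
    intOI.getPrimitiveJavaObject(raw);
  }
}
{code}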

