[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harish Butani updated HIVE-6414: -------------------------------- Fix Version/s: (was: 0.14.0) 0.13.0 > ParquetInputFormat provides data values that do not match the object > inspectors > ------------------------------------------------------------------------------- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers > Affects Versions: 0.13.0 > Reporter: Remus Rusanu > Assignee: Justin Coffey > Labels: Parquet > Fix For: 0.13.0 > > Attachments: HIVE-6414.2.patch, HIVE-6414.3.patch, HIVE-6414.3.patch, > HIVE-6414.3.patch, HIVE-6414.patch > > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I though fine, and I worked my way around it. But I see now that > the issue trigger failuers in other places, eg. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)