[ https://issues.apache.org/jira/browse/HIVE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justin Coffey updated HIVE-6414: -------------------------------- Fix Version/s: 0.13.0 Affects Version/s: 0.13.0 Status: Patch Available (was: Open) the patch was developed against this commit: b05004a863b09cbe5f4b734c5474092f328f0c41 unit tests and qtests run fine against this commit. the latest commit (as of today): 1a3608d8b1f8cf41e9ba2fc7e9bacdecf271bb92 Appears to have broken qtests (none will run) and so I can't verify the patch specific qtest. Unit tests, however, execute without error. > ParquetInputFormat provides data values that do not match the object > inspectors > ------------------------------------------------------------------------------- > > Key: HIVE-6414 > URL: https://issues.apache.org/jira/browse/HIVE-6414 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers > Affects Versions: 0.13.0 > Reporter: Remus Rusanu > Assignee: Justin Coffey > Labels: Parquet > Fix For: 0.13.0 > > > While working on HIVE-5998 I noticed that the ParquetRecordReader returns > IntWritable for all 'int like' types, in disaccord with the row object > inspectors. I though fine, and I worked my way around it. But I see now that > the issue trigger failuers in other places, eg. in aggregates: > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > {"cint":528534767,"ctinyint":31,"csmallint":4963,"cfloat":31.0,"cdouble":4963.0,"cstring1":"cvLH6Eat2yFsyy7p"} > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177) > ... 8 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast > to java.lang.Short > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:808) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:790) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524) > ... 9 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable > cannot be cast to java.lang.Short > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaShortObjectInspector.get(JavaShortObjectInspector.java:41) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:671) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.compare(ObjectInspectorUtils.java:631) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.merge(GenericUDAFMin.java:109) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMin$GenericUDAFMinEvaluator.iterate(GenericUDAFMin.java:96) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:183) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803) > ... 15 more > {noformat} > My test is (I'm writing a test .q from HIVE-5998, but the repro does not > involve vectorization): > {noformat} > create table if not exists alltypes_parquet ( > cint int, > ctinyint tinyint, > csmallint smallint, > cfloat float, > cdouble double, > cstring1 string) stored as parquet; > insert overwrite table alltypes_parquet > select cint, > ctinyint, > csmallint, > cfloat, > cdouble, > cstring1 > from alltypesorc; > explain select * from alltypes_parquet limit 10; select * from > alltypes_parquet limit 10; > explain select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > select ctinyint, > max(cint), > min(csmallint), > count(cstring1), > avg(cfloat), > stddev_pop(cdouble) > from alltypes_parquet > group by ctinyint; > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)