[ https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sathish updated HIVE-7850:
--------------------------
Description:
* Created a Parquet file from an Avro file that has one array data type; the rest are primitive types. Avro schema of the array field, e.g. (a minimal writer sketch follows the description):
{code}
{ "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, "null" ] }
{code}
* Created an external Hive table with the array type as below:
{code}
create external table paraArray (action array<string>)
partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as
  inputformat 'parquet.hive.MapredParquetInputFormat'
  outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';

alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Run the following query (select action from paraArray limit 10); the MapReduce jobs fail with the exception below (a defensive-inspector sketch also follows the description):
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row [Error getting row data with exception java.lang.ClassCastException: parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to org.apache.hadoop.io.ArrayWritable
  at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
  at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
  at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
  at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
  at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
  at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
  at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
  at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
  at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
  ... 8 more
{code}
This issue was posted long ago on the Parquet issues list. Since it is related to the Parquet Hive SerDe, I have created the Hive issue here; the details and history are at https://github.com/Parquet/parquet-mr/issues/281.
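For reproduction, here is a minimal writer sketch (not from the original report) that produces such a file: it writes one Avro record whose nullable array-of-strings field matches the schema above, using the pre-rename parquet.avro.AvroParquetWriter from parquet-mr 1.x. The class name, record values, and output path are illustrative.
{code}
import java.util.Arrays;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;

import parquet.avro.AvroParquetWriter;

public class WriteParaArray {
  // Record schema wrapping the nullable array<string> field from the report.
  private static final String SCHEMA_JSON =
      "{ \"type\": \"record\", \"name\": \"ParaArray\", \"fields\": ["
      + " { \"name\": \"action\","
      + "   \"type\": [ { \"type\": \"array\", \"items\": \"string\" }, \"null\" ] }"
      + " ] }";

  public static void main(String[] args) throws Exception {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

    GenericRecord record = new GenericData.Record(schema);
    record.put("action", Arrays.asList("click", "view"));

    // AvroParquetWriter derives the Parquet schema from the Avro schema,
    // so the array lands in the file the same way the report describes.
    AvroParquetWriter<GenericRecord> writer = new AvroParquetWriter<GenericRecord>(
        new Path("/testPara/part-00000.parquet"), schema);
    try {
      writer.write(record);
    } finally {
      writer.close();
    }
  }
}
{code}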
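The trace shows ParquetHiveArrayInspector.getList casting the raw datum straight to ArrayWritable, which fails when a single dictionary-encoded binary (the DicBinaryWritable above) arrives instead. Below is a sketch of the kind of defensive check an array inspector could apply; it only illustrates the failure mode and is not the actual HIVE-7850 patch.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Writable;

public final class ArrayInspectorSketch {

  // Stand-in for the cast that fails at ParquetHiveArrayInspector.getList:125.
  public static List<Object> getList(Object data) {
    if (data == null) {
      return null;
    }
    if (data instanceof ArrayWritable) {
      Writable[] values = ((ArrayWritable) data).get();
      if (values == null) {
        return null;
      }
      // Some writer paths add an extra ArrayWritable wrapper around the
      // repeated group; unwrap it before building the list.
      if (values.length == 1 && values[0] instanceof ArrayWritable) {
        values = ((ArrayWritable) values[0]).get();
      }
      return new ArrayList<Object>(Arrays.asList(values));
    }
    // A lone element (e.g. a dictionary-encoded binary) becomes a one-item
    // list instead of triggering the ClassCastException from the report.
    List<Object> single = new ArrayList<Object>();
    single.add(data);
    return single;
  }
}
{code}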
> Hive Query failed if the data type is array<string> with parquet files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7850
>                 URL: https://issues.apache.org/jira/browse/HIVE-7850
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sathish
>              Labels: parquet, serde
--
This message was sent by Atlassian JIRA
(v6.2#6252)