[jira] [Updated] (HIVE-7850) Hive Query failed if the data type is array<string> with parquet files

Sathish (JIRA) Fri, 22 Aug 2014 02:32:30 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sathish updated HIVE-7850:
--------------------------

    Description: 
* Created a Parquet file from an Avro file that has one array-typed field; the 
rest are primitive types. The Avro schema of the array field is, e.g.: 
{code}
{ "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
"null" ] }
{code}
* Created an external Hive table with the array type, as below:
{code}
create external table paraArray (action array<string>)
partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as inputformat 'parquet.hive.MapredParquetInputFormat'
outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Ran the following query (select action from paraArray limit 10); the 
MapReduce jobs fail with the following exception:
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ClassCastException: 
parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
org.apache.hadoop.io.ArrayWritable
at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
{code}
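
To illustrate the failure mode in the trace above, here is a minimal, self-contained sketch. The classes BinaryValue and ArrayValue and the method getListUnchecked are hypothetical stand-ins, not the real parquet.hive or Hadoop types. The sketch assumes only what the trace itself shows: getList casts the column value to an array container, and the cast fails when the value it receives is a single binary value instead.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayCastSketch {
    // Hypothetical stand-ins for the Writable types named in the stack trace.
    interface Writable {}

    static final class BinaryValue implements Writable {
        final String value;
        BinaryValue(String value) { this.value = value; }
    }

    static final class ArrayValue implements Writable {
        final Writable[] items;
        ArrayValue(Writable... items) { this.items = items; }
    }

    // Mimics an inspector that assumes the column value is always an array container.
    static List<Writable> getListUnchecked(Object data) {
        ArrayValue array = (ArrayValue) data; // ClassCastException if data is a BinaryValue
        List<Writable> out = new ArrayList<>();
        for (Writable w : array.items) {
            out.add(w);
        }
        return out;
    }

    public static void main(String[] args) {
        // Works when the value really is an array container.
        System.out.println(getListUnchecked(new ArrayValue(new BinaryValue("click"))).size());
        // Fails when a bare binary value arrives where an array was expected.
        try {
            getListUnchecked(new BinaryValue("click"));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The sketch only reproduces the shape of the error; it does not claim to show where in the SerDe or in the file layout the mismatch originates.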


This issue was posted on the Parquet issues list a long time ago. Since it is 
related to the Parquet Hive SerDe, I have created a Hive issue here. The details 
and history of the issue are at 
https://github.com/Parquet/parquet-mr/issues/281.

  was:
* Created a Parquet file from an Avro file that has one array-typed field; the 
rest are primitive types. The Avro schema of the array field is, e.g.: 
{code}
{ "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
"null" ] }
{code}
* Created an external Hive table with the array type, as below:
{code}
create external table paraArray (action array<string>)
partitioned by (partitionid int)
row format serde 'parquet.hive.serde.ParquetHiveSerDe'
stored as inputformat 'parquet.hive.MapredParquetInputFormat'
outputformat 'parquet.hive.MapredParquetOutputFormat'
location '/testPara';
alter table paraArray add partition(partitionid=1) location '/testPara';
{code}
* Ran the following query (select action from paraArray limit 10); the 
MapReduce jobs fail with the following exception:
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ClassCastException: 
parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
org.apache.hadoop.io.ArrayWritable
at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
]
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
... 8 more
{code}


> Hive Query failed if the data type is array<string> with parquet files
> ----------------------------------------------------------------------
>
>                 Key: HIVE-7850
>                 URL: https://issues.apache.org/jira/browse/HIVE-7850
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Sathish
>              Labels: parquet, serde
>
> * Created a Parquet file from an Avro file that has one array-typed field; the 
> rest are primitive types. The Avro schema of the array field is, e.g.: 
> {code}
> { "name" : "action", "type" : [ { "type" : "array", "items" : "string" }, 
> "null" ] }
> {code}
> * Created an external Hive table with the array type, as below:
> {code}
> create external table paraArray (action array<string>)
> partitioned by (partitionid int)
> row format serde 'parquet.hive.serde.ParquetHiveSerDe'
> stored as inputformat 'parquet.hive.MapredParquetInputFormat'
> outputformat 'parquet.hive.MapredParquetOutputFormat'
> location '/testPara';
> alter table paraArray add partition(partitionid=1) location '/testPara';
> {code}
> * Ran the following query (select action from paraArray limit 10); the 
> MapReduce jobs fail with the following exception:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row [Error getting row data with exception 
> java.lang.ClassCastException: 
> parquet.hive.writable.BinaryWritable$DicBinaryWritable cannot be cast to 
> org.apache.hadoop.io.ArrayWritable
> at parquet.hive.serde.ParquetHiveArrayInspector.getList(ParquetHiveArrayInspector.java:125)
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:315)
> at org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
> at org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> ]
> at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
> at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
> ... 8 more
> {code}
> This issue was posted on the Parquet issues list a long time ago. Since it is 
> related to the Parquet Hive SerDe, I have created a Hive issue here. The 
> details and history of the issue are at 
> https://github.com/Parquet/parquet-mr/issues/281.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
