[jira] [Commented] (HIVE-9863) Querying parquet tables fails with IllegalStateException [Spark Branch]

Ryan Blue (JIRA) Thu, 05 Mar 2015 15:42:56 -0800

    [ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349635#comment-14349635 ]


Ryan Blue commented on HIVE-9863:
---------------------------------

[~spena], there is a work-around, but it depends on which release candidate 
(RC) you're depending on: you can use one of the other ParquetInputSplit 
constructors.

In 1.6.0, Parquet will [accept|https://github.com/apache/incubator-parquet-mr/blob/master/parquet-hadoop/src/main/java/parquet/hadoop/ParquetRecordReader.java#L204] 
mapreduce.FileSplit, mapred.FileSplit, and ParquetInputSplit, so Hive will no 
longer need to depend on ParquetInputSplit at all. At that point we will 
probably deprecate it from the public API.

> Querying parquet tables fails with IllegalStateException [Spark Branch]
> -----------------------------------------------------------------------
>
>                 Key: HIVE-9863
>                 URL: https://issues.apache.org/jira/browse/HIVE-9863
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch. Queries such as 
> select count(*) from table_name fail with the error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException: 
> All the offsets listed in the split should be found in the file. expected: 
> [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid] 
> BINARY  [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY 
>  [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type] 
> BINARY  [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage] 
> INT64  [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP 
> [meta_timestamp] INT64  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673}, 
> ColumnMetaData{GZIP [doc_timestamp] INT64  [RLE, PLAIN_DICTIONARY, 
> BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32  [RLE, 
> PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size] 
> INT32  [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP 
> [source] BINARY  [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP 
> [delete_flag] BOOLEAN  [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP 
> [meta] BINARY  [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP 
> [content] BINARY  [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4, 
> 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive> 
> {code}
> I can reproduce the problem with either local or yarn-cluster. It seems to 
> happen with MR as well, so I suspect this is a Parquet problem.
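
The exception text points to a consistency check: the split lists row-group starting offsets ("expected: [4, 4]"), the reader keeps the row groups in the file whose start matches one of those offsets ("out of: [4, 129785482, 260224757]"), and it fails when the number of matches differs from the number of listed offsets. The sketch below is a simplified model of that check, not Parquet's actual code. The class and method names are hypothetical, it assumes row groups are matched by starting offset, and it ignores the split's byte range ("in range 0, 134217728"). It shows why a split that lists offset 4 twice can match only one row group and therefore throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the split/row-group offset check.
public class SplitOffsetCheck {

    // Keeps the row groups whose starting offset is listed in the split,
    // then requires one matched row group per listed offset.
    static List<Long> selectRowGroups(long[] splitOffsets, long[] fileRowGroupStarts) {
        Set<Long> wanted = new HashSet<>();
        for (long off : splitOffsets) {
            wanted.add(off); // duplicate offsets collapse into one set entry
        }
        List<Long> found = new ArrayList<>();
        for (long start : fileRowGroupStarts) {
            if (wanted.contains(start)) {
                found.add(start);
            }
        }
        if (found.size() != splitOffsets.length) {
            throw new IllegalStateException(
                "All the offsets listed in the split should be found in the file."
                    + " expected: " + Arrays.toString(splitOffsets)
                    + " found: " + found
                    + " out of: " + Arrays.toString(fileRowGroupStarts));
        }
        return found;
    }

    public static void main(String[] args) {
        long[] fileStarts = {4L, 129785482L, 260224757L};
        // Each offset listed once: the check passes.
        System.out.println(selectRowGroups(new long[] {4L}, fileStarts));
        // Offset 4 listed twice: only one row group matches, so the counts differ.
        try {
            selectRowGroups(new long[] {4L, 4L}, fileStarts);
        } catch (IllegalStateException e) {
            System.out.println("IllegalStateException: " + e.getMessage());
        }
    }
}
```

In this model the duplicate entry, not a missing row group, is what makes the counts disagree, which matches the shape of the reported message (two expected offsets, one found block).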



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
