[ https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514560#comment-14514560 ]
Ryan Blue commented on HIVE-9863:
---------------------------------

Good news: the [1.6.0 artifacts|https://search.maven.org/#artifactdetails%7Ccom.twitter%7Cparquet-hadoop-bundle%7C1.6.0%7Cjar] have hit Maven Central. We should be able to move off the RC release now!

And to give you guys a heads-up, 1.7.0 will be going out soon. That's just a rename for artifacts (com.twitter => org.apache.parquet) and packages (parquet => org.apache.parquet). There are no other changes, so updating to the new version should be clean, although it is an incompatible change.

> Querying parquet tables fails with IllegalStateException [Spark Branch]
> -----------------------------------------------------------------------
>
>                 Key: HIVE-9863
>                 URL: https://issues.apache.org/jira/browse/HIVE-9863
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch. Queries such as
> select count(*) from table_name fail with this error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException:
> All the offsets listed in the split should be found in the file. expected:
> [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid]
> BINARY [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY
> [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type]
> BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage]
> INT64 [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP
> [meta_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673},
> ColumnMetaData{GZIP [doc_timestamp] INT64 [RLE, PLAIN_DICTIONARY,
> BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32 [RLE,
> PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size]
> INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP
> [source] BINARY [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP
> [delete_flag] BOOLEAN [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP
> [meta] BINARY [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP
> [content] BINARY [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4,
> 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive>
> {code}
> I can reproduce the problem with either local or yarn-cluster. It also seems
> to happen with MR, so I suspect this is a Parquet problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
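
As a rough illustration of the package rename mentioned in the comment (parquet => org.apache.parquet), here is a minimal Java sketch of how a fully qualified class name would map from the 1.6.0 layout to the 1.7.0 layout. The class `ParquetPackageRename` and the method `toApacheName` are hypothetical names made up for this example; they are not part of Parquet or Hive. The sketch assumes the rename is a pure prefix change, as the comment describes.

```java
// Hypothetical helper, for illustration only: not part of Parquet or Hive.
final class ParquetPackageRename {

    private static final String OLD_PREFIX = "parquet.";
    private static final String NEW_PREFIX = "org.apache.parquet.";

    // Maps a pre-1.7.0 class name (package "parquet...") to the renamed
    // package ("org.apache.parquet..."). Names outside the old package,
    // including names that are already renamed, are returned unchanged.
    static String toApacheName(String className) {
        if (className.startsWith(OLD_PREFIX)) {
            return NEW_PREFIX + className.substring(OLD_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // An import that would need updating when moving from 1.6.0 to 1.7.0.
        System.out.println(toApacheName("parquet.hadoop.ParquetInputFormat"));
        // An unrelated class is left alone.
        System.out.println(toApacheName("org.apache.hadoop.fs.Path"));
    }
}
```

In practice this means that code importing `parquet.hadoop.ParquetInputFormat` would import `org.apache.parquet.hadoop.ParquetInputFormat` instead, and the dependency's group ID would change from `com.twitter` to `org.apache.parquet`. That is why the upgrade is incompatible even though nothing else changes.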