[ https://issues.apache.org/jira/browse/HIVE-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700724#comment-14700724 ]
Hive QA commented on HIVE-11592:
--------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12750949/HIVE-11592.1.patch

{color:green}SUCCESS:{color} +1 9370 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4991/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/4991/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-4991/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12750949 - PreCommit-HIVE-TRUNK-Build

> ORC metadata section can sometimes exceed protobuf message size limit
> ---------------------------------------------------------------------
>
>                 Key: HIVE-11592
>                 URL: https://issues.apache.org/jira/browse/HIVE-11592
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-11592.1.patch
>
> If a file has too many small stripes and many columns, the overhead of
> storing metadata (column stats) can exceed the default protobuf message
> size limit of 64MB. Reading such a file throws the following exception:
> {code}
> Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
>     at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
>     at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
>     at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:811)
>     at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1331)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1281)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1374)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1369)
>     at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4887)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:4803)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4990)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:4985)
>     at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12925)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:12872)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12961)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:12956)
>     at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13599)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:13546)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13635)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:13630)
>     at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
>     at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
>     at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.parseFrom(OrcProto.java:13746)
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:468)
>     at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:314)
>     at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
>     at org.apache.hadoop.hive.ql.io.orc.FileDump.main(FileDump.java:67)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> The only fix is to programmatically increase the CodedInputStream size
> limit. We should make the limit configurable via a Hive config property
> so that such files remain readable. Alternatively, we can keep increasing
> the limit until parsing succeeds.
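A minimal sketch of the retry approach, assuming the serialized metadata section is already in hand as a byte array; the class name, starting limit, and 1GB cap are illustrative assumptions, not the contents of HIVE-11592.1.patch:

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;

import com.google.protobuf.CodedInputStream;
import com.google.protobuf.InvalidProtocolBufferException;
import org.apache.hadoop.hive.ql.io.orc.OrcProto;

public class MetadataLimitExample {
  // protobuf's built-in default message size limit, 64MB
  private static final int DEFAULT_LIMIT = 64 << 20;
  // hypothetical cap (1GB) so a truly malformed file still fails
  private static final int MAX_LIMIT = 1 << 30;

  /**
   * Parses the serialized ORC metadata section, doubling the
   * CodedInputStream size limit and retrying whenever parsing fails.
   */
  public static OrcProto.Metadata parseMetadata(byte[] serialized)
      throws IOException {
    int limit = DEFAULT_LIMIT;
    while (true) {
      CodedInputStream in =
          CodedInputStream.newInstance(new ByteArrayInputStream(serialized));
      in.setSizeLimit(limit);
      try {
        return OrcProto.Metadata.parseFrom(in);
      } catch (InvalidProtocolBufferException e) {
        if (limit >= MAX_LIMIT) {
          throw e; // give up: likely corrupt, not merely large
        }
        limit <<= 1; // raise the limit and re-parse from the start
      }
    }
  }
}
{code}

Capping the growth keeps a genuinely corrupt file from retrying forever, while a legitimate file with oversized stripe statistics eventually parses.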