[ https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14598893#comment-14598893 ]
Gopal V commented on HIVE-11043:
--------------------------------

[~prasanth_j]: sure. It looks like the errors come from reading footers in the 1 file/1 split case.

The actual error is:

{code}
Caused by: java.lang.IndexOutOfBoundsException: Index: 0
    at java.util.Collections$EmptyList.get(Collections.java:3212)
    at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:938)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:847)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:713)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{code}

> ORC split strategies should adapt based on number of files
> ----------------------------------------------------------
>
>                 Key: HIVE-11043
>                 URL: https://issues.apache.org/jira/browse/HIVE-11043
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Gopal V
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11043.1.patch, HIVE-11043.2.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average
> file size. It would be beneficial to choose a different strategy based on
> number of files as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
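The trace shows `OrcProto$Type.getSubtypes(0)` being called on a type whose subtype list is empty (`Collections$EmptyList.get` throws `Index: 0`). Assuming that type is the top-level struct from the footer (for example, a file whose footer records no columns), the sketch below models the name-to-column lookup with a hypothetical `TypeNode` class and a hypothetical `columnIndicesFromNames` helper. Neither is `OrcProto.Type` or the code in `ReaderImpl` or the attached patches; the sketch only shows one defensive shape: skip names that are not found instead of indexing subtype 0, so a footer with no columns yields an empty result rather than an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ColumnIndexLookup {

    /** Hypothetical stand-in for one footer type entry (not OrcProto.Type). */
    static final class TypeNode {
        final List<String> fieldNames; // field names of a struct type
        final List<Integer> subtypes;  // type id of each field, same order as fieldNames

        TypeNode(List<String> fieldNames, List<Integer> subtypes) {
            this.fieldNames = fieldNames;
            this.subtypes = subtypes;
        }
    }

    /**
     * Returns the type ids of the requested top-level columns. Names that are
     * not present are skipped, so a root struct with no fields yields an empty
     * array instead of an IndexOutOfBoundsException.
     */
    static int[] columnIndicesFromNames(List<TypeNode> types, List<String> names) {
        if (types.isEmpty()) {
            return new int[0]; // no schema at all in this hypothetical footer
        }
        TypeNode root = types.get(0); // assumption: id 0 is the top-level struct
        List<Integer> ids = new ArrayList<>();
        for (String name : names) {
            int pos = root.fieldNames.indexOf(name);
            if (pos < 0) {
                continue; // do not fall back to subtype 0, which may not exist
            }
            ids.add(root.subtypes.get(pos));
        }
        return ids.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Root struct with no fields: the shape the trace points at
        TypeNode emptyRoot = new TypeNode(Collections.emptyList(), Collections.emptyList());
        try {
            emptyRoot.subtypes.get(0); // unguarded access, same exception type as the trace
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unguarded: " + e.getClass().getSimpleName());
        }
        int[] none = columnIndicesFromNames(Collections.singletonList(emptyRoot), Arrays.asList("a"));
        System.out.println("guarded: " + none.length + " columns");

        // Root struct with two fields whose type ids are 1 and 2
        TypeNode root = new TypeNode(Arrays.asList("a", "b"), Arrays.asList(1, 2));
        int[] ids = columnIndicesFromNames(Collections.singletonList(root), Arrays.asList("b", "zz"));
        System.out.println(Arrays.toString(ids));
    }
}
```

Skipping unknown names keeps the caller's size estimate well defined (it simply covers fewer columns) instead of failing split generation for the whole query.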
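The issue description proposes choosing a split strategy from the number of files as well as the average file size. The sketch below assumes two hypothetical strategies: one that reads each file's footer (the kind of work visible in the trace through `populateAndCacheStripeDetails`) and one that works from file sizes alone. The enum, the `choose` method, and the `fewFilesThreshold` parameter are illustrative names only, not Hive's configuration or the logic in the attached patches. Under this assumed policy, a small number of files always takes the footer-reading path, which would be consistent with the failure surfacing in the 1 file/1 split case.

```java
import java.util.Arrays;

final class SplitStrategyChooser {

    /** Hypothetical labels for two split-generation approaches. */
    enum Strategy {
        READ_FOOTERS, // open each file's footer and align splits to stripes
        FILE_LEVEL    // derive splits from file sizes only, no footer reads
    }

    /**
     * Chooses a strategy from both the file count and the average file size
     * (hypothetical policy). Either condition alone selects footer reading.
     */
    static Strategy choose(long[] fileSizes, long maxSplitSize, int fewFilesThreshold) {
        if (fileSizes.length == 0) {
            return Strategy.FILE_LEVEL; // nothing to read
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        long avgFileSize = total / fileSizes.length;
        boolean fewFiles = fileSizes.length <= fewFilesThreshold;
        boolean largeFiles = avgFileSize > maxSplitSize;
        return (fewFiles || largeFiles) ? Strategy.READ_FOOTERS : Strategy.FILE_LEVEL;
    }

    public static void main(String[] args) {
        final long mb = 1L << 20;
        long[] one = {10 * mb};            // a single small file
        long[] manySmall = new long[1000]; // many small files
        Arrays.fill(manySmall, 10 * mb);
        long[] manyLarge = new long[1000]; // many large files
        Arrays.fill(manyLarge, 1024 * mb);

        System.out.println(choose(one, 256 * mb, 10));       // READ_FOOTERS (few files)
        System.out.println(choose(manySmall, 256 * mb, 10)); // FILE_LEVEL
        System.out.println(choose(manyLarge, 256 * mb, 10)); // READ_FOOTERS (large files)
    }
}
```

The reasoning behind the assumed count rule: with few files, opening every footer costs little in total and gives stripe-aligned splits, whereas with many small files the per-file footer reads can dominate split generation time.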