[ https://issues.apache.org/jira/browse/HIVE-5546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated HIVE-5546: --------------------------- Status: Patch Available (was: Open) [~sershe] [~ashutoshc] Can you take a look? > A change in ORCInputFormat made by HIVE-4113 was reverted by HIVE-5391 > ---------------------------------------------------------------------- > > Key: HIVE-5546 > URL: https://issues.apache.org/jira/browse/HIVE-5546 > Project: Hive > Issue Type: Bug > Reporter: Yin Huai > Assignee: Yin Huai > Attachments: HIVE-5546.1.patch > > > {code} > 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: > included column ids = > 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: > included columns names = > 2013-10-15 10:49:49,386 INFO org.apache.hadoop.hive.ql.io.orc.OrcInputFormat: > No ORC pushdown predicate > 2013-10-15 10:49:49,834 INFO > org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file > hdfs://localhost:54310/user/hive/warehouse/web_sales_orc/000000_0 > 2013-10-15 10:49:49,834 INFO org.apache.hadoop.mapred.MapTask: > numReduceTasks: 1 > 2013-10-15 10:49:49,840 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = > 100 > 2013-10-15 10:49:49,968 INFO org.apache.hadoop.mapred.TaskLogsTruncater: > Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 > 2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: > Initialized cache for UID to User mapping with a cache timeout of 14400 > seconds. > 2013-10-15 10:49:49,994 INFO org.apache.hadoop.io.nativeio.NativeIO: Got > UserName yhuai for UID 1000 from the native implementation > 2013-10-15 10:49:49,996 FATAL org.apache.hadoop.mapred.Child: Error running > child : java.lang.OutOfMemoryError: Java heap space > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > {code} > If includedColumnIds is an empty list, we do not need to read any column. > But, right now, in OrcInputFormat.findIncludedColumns, we have ... > {code} > if (ColumnProjectionUtils.isReadAllColumns(conf) || > includedStr == null || includedStr.trim().length() == 0) { > return null; > } > {code} > If includedStr is an empty string, the code assumes that we need all columns, > which is not correct. -- This message was sent by Atlassian JIRA (v6.1#6144)