[ https://issues.apache.org/jira/browse/HIVE-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159223#comment-15159223 ]
Ashutosh Chauhan commented on HIVE-13040:
-----------------------------------------

This patch addresses two distinct issues:
* Don't create empty buckets for Tez. We know for sure that Tez can handle missing bucket files while doing BMJ & SMBJ. However, MR explicitly checks the number of files before attempting BMJ & SMBJ, so if we don't create empty files for MR, we risk disabling BMJ & SMBJ later on for MR.
* The above means we do end up creating logically empty files for MR (which covers the majority of test cases). For such cases, ORC currently writes a header & footer. This patch includes a change to not write anything at all (i.e. create a 0-length file) in such cases. While reading, there are two ways to handle such 0-length files: either make the ORC reader resilient to them, or exclude such files altogether during split generation. I chose the second approach, as that's more efficient since we avoid wasteful processing of those files. So there are changes related to that as well.

> Handle empty bucket creations more efficiently
> -----------------------------------------------
>
>                 Key: HIVE-13040
>                 URL: https://issues.apache.org/jira/browse/HIVE-13040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-13040.2.patch, HIVE-13040.3.patch,
> HIVE-13040.4.patch, HIVE-13040.5.patch, HIVE-13040.6.patch,
> HIVE-13040.7.patch, HIVE-13040.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
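
For readers following the second approach in the comment above (excluding 0-length files during split generation), here is a minimal sketch of the idea. It is not the code from the patch: `EmptyBucketFilter`, `BucketFile`, and `excludeEmpty` are hypothetical names introduced only for illustration, and `BucketFile` is an assumed stand-in for whatever file metadata the real split-generation code gets from a directory listing.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: none of these names come from Hive or ORC.
public class EmptyBucketFilter {

    // Hypothetical stand-in for the metadata a directory listing returns.
    static final class BucketFile {
        final String path;
        final long length; // file size in bytes

        BucketFile(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    // Keep only files that contain at least one byte. A 0-length file is
    // dropped before any split is planned, so no reader is opened for it.
    static List<BucketFile> excludeEmpty(List<BucketFile> files) {
        List<BucketFile> kept = new ArrayList<>();
        for (BucketFile f : files) {
            if (f.length > 0) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

The filter relies only on the file length already known from the listing, which is why this route can be cheaper than making the reader tolerate empty files: the empty files are never opened at all.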