[ https://issues.apache.org/jira/browse/HIVE-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159223#comment-15159223 ]

Ashutosh Chauhan commented on HIVE-13040:
-----------------------------------------

This patch addresses two distinct issues:
* Don't create empty buckets for Tez. We know for sure that Tez can handle
missing bucket files while doing bucket map join (BMJ) and sort-merge bucket
join (SMBJ). MR, however, explicitly checks the number of files before
attempting BMJ & SMBJ, so if we don't create empty files for MR, we risk BMJ &
SMBJ being disabled later on for MR.
* This means we do end up creating logically empty files for MR (which covers
the majority of test cases). For such files, ORC currently writes a header and
footer. This patch changes that to write nothing at all (i.e., create a
0-length file). On the read side there are two ways to handle such 0-length
files: either make the ORC reader resilient to them, or exclude them
altogether during split generation. I chose the second approach because it is
more efficient: it avoids wasteful processing of those files. So the patch
includes changes to split generation as well.
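The first point above (create empty bucket files only for MR) can be illustrated with a small sketch. This is not Hive's actual code: the class, the `Engine` enum and `emptyBucketsToCreate` are hypothetical, and it assumes the MR-side check compares the number of bucket files with the table's declared bucket count. In a real deployment the engine is selected by the `hive.execution.engine` setting (e.g. `mr` or `tez`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class EmptyBucketPlanner {
    // Hypothetical engine enum standing in for hive.execution.engine.
    enum Engine { MR, TEZ }

    /**
     * Returns the bucket ids that need an (empty) placeholder file.
     * bucketsWithRows holds the buckets that actually received rows.
     */
    static List<Integer> emptyBucketsToCreate(Engine engine, int numBuckets,
                                              Set<Integer> bucketsWithRows) {
        List<Integer> toCreate = new ArrayList<>();
        if (engine == Engine.TEZ) {
            // Tez tolerates missing bucket files for BMJ/SMBJ, so no placeholders are written.
            return toCreate;
        }
        // Assumption: MR compares the file count with the bucket count before
        // choosing BMJ/SMBJ, so every bucket without rows still gets a file.
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            if (!bucketsWithRows.contains(bucket)) {
                toCreate.add(bucket);
            }
        }
        return toCreate;
    }

    public static void main(String[] args) {
        Set<Integer> written = Set.of(0, 2); // only buckets 0 and 2 received rows
        System.out.println(emptyBucketsToCreate(Engine.MR, 4, written));
        System.out.println(emptyBucketsToCreate(Engine.TEZ, 4, written));
    }
}
```

With four buckets of which only 0 and 2 received rows, the MR plan asks for placeholder files for buckets 1 and 3, while the Tez plan asks for none.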
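The second point pairs a write-side change (emit a 0-length file when no rows were written) with a read-side change (skip such files during split generation). The toy sketch below shows both halves. It is not the ORC writer or Hive's split generator: `LazyWriter`, `BucketFile` and `filesToSplit` are hypothetical, only the `"ORC"` magic prefix reflects the real format, and the footer bytes are a placeholder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class EmptyBucketOrcSketch {
    // ORC files start with the magic bytes "ORC"; FOOTER is a placeholder, not the real layout.
    static final byte[] HEADER = "ORC".getBytes(StandardCharsets.US_ASCII);
    static final byte[] FOOTER = "<footer>".getBytes(StandardCharsets.US_ASCII);

    /** Toy writer: header and footer are written only if at least one row arrives. */
    static final class LazyWriter {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();
        private boolean headerWritten = false;

        void addRow(String row) {
            if (!headerWritten) { // defer the header until the first row
                out.writeBytes(HEADER);
                headerWritten = true;
            }
            out.writeBytes(row.getBytes(StandardCharsets.UTF_8));
        }

        /** Returns the file contents: 0 bytes when no row was ever added. */
        byte[] close() {
            if (headerWritten) {
                out.writeBytes(FOOTER);
            }
            return out.toByteArray();
        }
    }

    /** Hypothetical listing entry standing in for a real file status object. */
    static final class BucketFile {
        final String path;
        final long length;

        BucketFile(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Split generation sketch: 0-length files are dropped instead of being opened. */
    static List<BucketFile> filesToSplit(List<BucketFile> listing) {
        List<BucketFile> result = new ArrayList<>();
        for (BucketFile f : listing) {
            if (f.length > 0) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LazyWriter full = new LazyWriter();
        full.addRow("row1");
        LazyWriter empty = new LazyWriter(); // a bucket that received no rows

        List<BucketFile> listing = List.of(
                new BucketFile("bucket_0", full.close().length),
                new BucketFile("bucket_1", empty.close().length));

        for (BucketFile f : filesToSplit(listing)) {
            System.out.println(f.path + " " + f.length);
        }
    }
}
```

Filtering on length before creating splits means the reader never has to open the empty file at all, which is the efficiency argument made above for excluding such files rather than hardening the reader.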

> Handle empty bucket creations more efficiently 
> -----------------------------------------------
>
>                 Key: HIVE-13040
>                 URL: https://issues.apache.org/jira/browse/HIVE-13040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 1.0.0, 1.2.0, 1.1.0, 2.0.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-13040.2.patch, HIVE-13040.3.patch, 
> HIVE-13040.4.patch, HIVE-13040.5.patch, HIVE-13040.6.patch, 
> HIVE-13040.7.patch, HIVE-13040.patch
>
>




