[ https://issues.apache.org/jira/browse/HIVE-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095747#comment-13095747 ]

Vaibhav Aggarwal commented on HIVE-2266:
----------------------------------------

This patch attempts to fix a bug in the existing functionality in two ways:

1. In HiveFileFormatUtils.java, the wrong JobConf is being passed, which is 
clear from the context.

2. In the other code paths, the compression parameters are never set at all.
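To make the second point concrete, here is a minimal sketch of the compression settings involved. A `java.util.Properties` object stands in for the Hadoop `JobConf` so the snippet stays self-contained; the key names (`hive.exec.compress.output`, `mapred.output.compress`, `mapred.output.compression.codec`, `mapred.output.compression.type`) are the Hive/Hadoop-era settings that govern output compression, and the class name is only illustrative:

```java
import java.util.Properties;

// Illustrative sketch only: Properties stands in for the Hadoop JobConf so
// this compiles without Hadoop on the classpath. The real fix wires these
// settings through HiveFileFormatUtils onto the JobConf handed to the
// record writer.
public class CompressionConfSketch {
    public static void main(String[] args) {
        Properties conf = new Properties();

        // Ask Hive to compress query output files.
        conf.setProperty("hive.exec.compress.output", "true");

        // Tell the underlying MapReduce output format to compress, and how.
        conf.setProperty("mapred.output.compress", "true");
        conf.setProperty("mapred.output.compression.codec",
                "org.apache.hadoop.io.compress.GzipCodec");
        conf.setProperty("mapred.output.compression.type", "BLOCK");

        // If these never reach the JobConf actually used by the writer
        // (the bug described above), the files come out uncompressed even
        // though compression was requested.
        System.out.println(conf.getProperty("mapred.output.compression.codec"));
    }
}
```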

The only observable difference this patch makes relative to the current 
behavior is smaller file sizes on the file system. I am not sure how to write 
a Hive query that can verify the difference in file sizes. Do you have any 
ideas that could help me add some quick tests for this? The current test 
executes through the code path, checking only that it does not result in an 
Exception or Error; it does not compare file sizes.
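As a sketch of the kind of size comparison such a test would need (at the JVM level, not as a Hive query), one can write the same compressible payload raw and gzipped and compare byte counts; a real Hive test would instead compare the sizes of the files FileSinkOperator emits. This uses only `java.util.zip` from the standard library:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Sketch: compress a repetitive payload and check it shrank. Repetitive
// row-oriented data, like typical query output, compresses well, so a
// compressed file being no smaller than the raw one signals that the
// compression parameters were never applied.
public class CompressedSizeCheck {
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("key\tvalue\n");
        }
        byte[] raw = sb.toString().getBytes("UTF-8");
        byte[] packed = gzip(raw);

        System.out.println("raw=" + raw.length + " gzip=" + packed.length);
        // With compression actually applied, the output must be smaller.
        assert packed.length < raw.length;
    }
}
```

A test along these lines avoids asserting an exact compressed size, which sidesteps the platform-dependence concern discussed below.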


> Really? Which platforms are you talking about? Can you tell me how to 
> reproduce this interesting behavior?

Hadoop loads native compression libraries. I believe they are platform 
dependent, hence I do not assume they always achieve the same compression 
ratio. Please correct me if I am wrong here.

In any case, I think this is broken existing functionality in Hive which we 
should fix.

> Fix compression parameters
> --------------------------
>
>                 Key: HIVE-2266
>                 URL: https://issues.apache.org/jira/browse/HIVE-2266
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-2266-2.patch, HIVE-2266.patch
>
>
> There are a number of places where compression values are not set correctly 
> in FileSinkOperator. This results in uncompressed files.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira