[ https://issues.apache.org/jira/browse/HIVE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897099#comment-13897099 ]

Sushanth Sowmyan commented on HIVE-5504:
----------------------------------------

Testing-related note: this bug is interesting because both Hive and Pig can
read the data regardless of which compression format was actually used. The
bug is this: when Pig, via HCat, writes to an ORC table whose metadata
specifies a compression of, say, SNAPPY, the file is written with ORC's
default compression, ZLIB, no matter what the metadata indicates. For Hive
this is not a visible problem. ORC records the compression kind in each file,
so the data is still readable through Hive and HCatalog/Pig and no read error
occurs. A read error occurs only when an external tool that expects the file
to be Snappy-compressed finds that it is actually ZLIB-compressed. It can also
be a performance/size issue: Snappy may be preferred over ZLIB, yet the data
stays ZLIB-compressed. Checking whether the data is readable, or whether any
errors occur, therefore cannot detect this bug.

End-to-end tests are the right way to test this instead, and I've done the
following:

a) Create the table using hive -e, specifying orc.compress=SNAPPY.
b) Use pig -useHCatalog to write to that table.
c) Run hive --service orcfiledump on the file inside the table. It shows the
compression format it sees. Without this patch it reports ZLIB; with the
patch it reports SNAPPY.
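
Step c) can be scripted so that the check is repeatable. The following is a
minimal Python sketch. It assumes that `hive` is on the PATH and that
orcfiledump prints a line of the form `Compression: ZLIB`. The function names
are hypothetical, and if your Hive version formats the dump differently you
will need to adjust the pattern.

```python
import re
import subprocess

# Assumption: orcfiledump emits a line like "Compression: SNAPPY".
# "Compression size: ..." lines do not match because of the colon placement.
_COMPRESSION_LINE = re.compile(r"^Compression:\s*(\w+)\s*$", re.MULTILINE)


def parse_compression(dump_text):
    """Return the compression kind reported in orcfiledump output, or None."""
    match = _COMPRESSION_LINE.search(dump_text)
    return match.group(1).upper() if match else None


def orc_file_compression(path):
    """Run step (c) on one ORC file and return the reported compression."""
    result = subprocess.run(
        ["hive", "--service", "orcfiledump", path],
        capture_output=True, text=True, check=True,
    )
    return parse_compression(result.stdout)
```

For a data file written in step b), `orc_file_compression(path)` should
return "SNAPPY" with the patch applied and "ZLIB" without it. Here `path` is
the table's data file in your warehouse.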

In addition, no existing tests fail (there are no regressions).

> OrcOutputFormat honors compression properties only from within hive
> ---------------------------------------------------------------------
>
>                 Key: HIVE-5504
>                 URL: https://issues.apache.org/jira/browse/HIVE-5504
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.11.0, 0.12.0
>            Reporter: Venkat Ranganathan
>         Attachments: HIVE-5504.patch
>
>
> When we import data into an HCatalog table created with the following
> storage description:
> .. stored as orc tblproperties ("orc.compress"="SNAPPY")
> the resulting ORC file still uses the default ZLIB compression.
> It looks like HCatOutputFormat is ignoring the specified tblproperties.
> show tblproperties shows that the table does have the properties saved
> correctly.
> An insert/select into the table produces an ORC file that honors the table
> property.
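
The report comes down to where the writer looks for orc.compress. The
insert/select path evidently gets the table property to the writer, and the
HCatOutputFormat path does not. The following is a minimal Python model of
that difference, with hypothetical names. It is not taken from
HIVE-5504.patch. Letting explicit job settings win over table properties is
an assumption made only for illustration.

```python
DEFAULT_COMPRESSION = "ZLIB"  # ORC default when orc.compress is absent


def orc_writer_compression(conf):
    """Hypothetical writer: consults only the config dict it is handed."""
    return conf.get("orc.compress", DEFAULT_COMPRESSION).upper()


def conf_for_write(job_conf, table_props, copy_table_props):
    """Build the writer config; copy_table_props=False mimics the bug."""
    conf = dict(job_conf)
    if copy_table_props:
        for key, value in table_props.items():
            # Assumption: an explicitly set job value takes precedence.
            conf.setdefault(key, value)
    return conf


table = {"orc.compress": "SNAPPY"}
print(orc_writer_compression(conf_for_write({}, table, False)))  # ZLIB
print(orc_writer_compression(conf_for_write({}, table, True)))   # SNAPPY
```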



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
