Johndee Burks created HIVE-4788:
-----------------------------------

             Summary: RCFile and bzip2 compression not working
                 Key: HIVE-4788
                 URL: https://issues.apache.org/jira/browse/HIVE-4788
             Project: Hive
          Issue Type: Bug
          Components: Compression
    Affects Versions: 0.10.0
         Environment: CDH4.2


            Reporter: Johndee Burks
            Priority: Minor


The issue is that bzip2-compressed RCFile data produces an error when 
queried, even with the simplest query, "select *". The issue is easily 
reproducible using the following steps. 

Create a table and load the sample data below. 

DDL: create table source_txt (a string, b string) row format delimited fields 
terminated by ',';

Sample data: 
apple,sauce 

Test: 

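Run the following statements. The select on the RCFile table written with 
bzip2 compression (rc_bz2) fails with the error shown below, while the same 
select on the uncompressed RCFile table (rc_nobz2) returns the row. 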

create table rc_nobz2 (a string, b string) stored as rcfile; 
insert into table rc_nobz2 select * from source_txt; 

SET io.seqfile.compression.type=BLOCK; 
SET hive.exec.compress.output=true; 
SET mapred.compress.map.output=true; 
SET mapred.output.compress=true; 
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec; 

create table rc_bz2 (a string, b string) stored as rcfile; 
insert into table rc_bz2 select * from source_txt; 

hive> select * from rc_bz2; 
Failed with exception java.io.IOException:java.io.IOException: Stream is not 
BZip2 formatted: expected 'h' as first byte but got '�' 
hive> select * from rc_nobz2; 
apple   sauce
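
As background for the error message above, here is a minimal Python sketch 
(not part of the original report) of what the header of a standalone bzip2 
stream looks like: the bytes 'B' and 'Z', then 'h', then a block-size digit 
from 1 to 9. The helper name looks_like_bzip2 is hypothetical, and the sketch 
makes no claim about how Hive's RCFile reader invokes the codec; it only 
illustrates the header that the "expected 'h'" check refers to. 

```python
import bz2


def looks_like_bzip2(data: bytes) -> bool:
    """Return True if data starts with a bzip2 stream header.

    The header is the magic bytes b"BZ", the letter b"h", and one
    ASCII digit 1-9 giving the block size in units of 100 kB.
    """
    if len(data) < 4:
        return False
    has_magic = data[:3] == b"BZh"
    has_level = ord("1") <= data[3] <= ord("9")
    return has_magic and has_level


# The sample row from this report, as raw text and as a bzip2 stream.
sample = b"apple,sauce\n"
compressed = bz2.compress(sample)

print(looks_like_bzip2(compressed))  # True
print(looks_like_bzip2(sample))      # False
```

A check like this could be run against the raw bytes of a file copied out of 
the table's warehouse directory to see whether it is a plain bzip2 stream. 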
