Johndee Burks created HIVE-4788:
-----------------------------------

             Summary: RCFile and bzip2 compression not working
                 Key: HIVE-4788
                 URL: https://issues.apache.org/jira/browse/HIVE-4788
             Project: Hive
          Issue Type: Bug
          Components: Compression
    Affects Versions: 0.10.0
         Environment: CDH4.2
            Reporter: Johndee Burks
            Priority: Minor


Bzip2-compressed RCFile data fails with an error when queried, even with the simplest query, "select *". The issue is easily reproducible using the steps below.

Create a table and load the sample data below.

DDL:

create table source_txt (a string, b string) row format delimited fields terminated by ',';

Sample data:

apple,sauce

Test:

Do the following and you should receive the error listed below for the rcfile table with bz2 compression.

create table rc_nobz2 (a string, b string) stored as rcfile;
insert into table rc_nobz2 select * from source_txt;

SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.compress.map.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;

create table rc_bz2 (a string, b string) stored as rcfile;
insert into table rc_bz2 select * from source_txt;

hive> select * from rc_bz2;
Failed with exception java.io.IOException:java.io.IOException: Stream is not BZip2 formatted: expected 'h' as first byte but got '�'

hive> select * from rc_nobz2;
apple	sauce
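For context on the error text: a standard bzip2 stream starts with the ASCII bytes "BZh" followed by a block-size digit from 1 to 9, and the exception indicates that the reader did not find the expected header byte at the position where it looked. The sketch below is only an illustration of that header check, written in Python rather than taken from Hive or Hadoop; the helper name looks_like_bzip2 is hypothetical, and it makes no claim about where in the RCFile read path the mismatch actually occurs.

```python
import bz2

# A standard bzip2 stream begins with "BZh" followed by a digit 1-9
# giving the block size in units of 100 kB.
BZIP2_MAGIC = b"BZh"


def looks_like_bzip2(data: bytes) -> bool:
    """Return True if data starts with a plausible bzip2 stream header.

    Hypothetical helper for illustration only; it is not part of Hive
    or Hadoop and does not validate the rest of the stream.
    """
    return (
        len(data) >= 4
        and data[:3] == BZIP2_MAGIC
        and data[3:4] in b"123456789"
    )


if __name__ == "__main__":
    # The sample row from the report, compressed with Python's bz2 module.
    compressed = bz2.compress(b"apple,sauce\n")
    print(looks_like_bzip2(compressed))
    # The same row uncompressed does not carry the header.
    print(looks_like_bzip2(b"apple,sauce\n"))
```

A check like this could be run against the first bytes of a data block to see whether it really is a raw bzip2 stream; whether Hive hands the codec such a block in this scenario is the open question in this report.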