Guo Ruijing created HADOOP-10196:
------------------------------------

             Summary: Bzip2Codec Uncompress cannot work
                 Key: HADOOP-10196
                 URL: https://issues.apache.org/jira/browse/HADOOP-10196
             Project: Hadoop Common
          Issue Type: Bug
          Components: io
    Affects Versions: 2.2.0
            Reporter: Guo Ruijing


Bzip2Codec Uncompress cannot work.

1. Compress Sample file:

[hadoop@localhost ~]$ cat StreamCompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {

  public static void main(String[] args) throws Exception {
    String codecClassname = args[0];
    Class<?> codecClass = Class.forName(codecClassname);
    Configuration conf = new Configuration();
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
    CompressionOutputStream out = codec.createOutputStream(System.out);
    IOUtils.copyBytes(System.in, out, 4096, false);
    out.finish();
  }

}
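
The compressor sample above calls out.finish() and then returns without flushing or closing the stream wrapped around System.out. This report does not establish whether that contributes to the truncated output, so the following is only a sketch of general Java stream behavior, using JDK classes alone. FlushDemo and bytesBeforeAndAfterFlush are hypothetical names, and the PrintStream built here (auto-flush over a 128-byte buffer) is an assumption about how System.out is typically configured, not something taken from the report.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class FlushDemo {

  // Returns how many bytes had reached the sink before and after flush().
  static int[] bytesBeforeAndAfterFlush() {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Assumption: stands in for System.out (auto-flushing PrintStream over a small buffer).
    PrintStream out = new PrintStream(new BufferedOutputStream(sink, 128), true);

    byte[] magic = {'B', 'Z'};
    out.write(magic, 0, magic.length); // array write: auto-flush pushes it to the sink
    out.write('h');                    // single-byte writes only flush on '\n',
    out.write('9');                    // so these two stay in the buffer

    int before = sink.size();
    out.flush();
    int after = sink.size();
    return new int[] {before, after};
  }

  public static void main(String[] args) {
    int[] sizes = bytesBeforeAndAfterFlush();
    System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
  }
}
```

Running it prints "before flush: 2, after flush: 4". If buffering is involved in the failure, calling out.flush() or out.close() after out.finish() in the sample would be an inexpensive thing to try.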

2. Uncompress Sample file:

[hadoop@localhost ~]$ cat StreamUncompressor.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamUncompressor {

  public static void main(String[] args) throws Exception {
    String codecClassname = args[0];
    Class<?> codecClass = Class.forName(codecClassname);
    Configuration conf = new Configuration();
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
    CompressionInputStream in = codec.createInputStream(System.in);
    IOUtils.copyBytes(in, System.out, 4096, false);
    in.close();
  }

}

3. How to compile/run

1) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamCompressor.java

2) javac -classpath /usr/lib/gphd/hadoop/hadoop-common-2.0.5-alpha-gphd-2.1.1.0.jar StreamUncompressor.java

3) jar -cvf Stream.jar StreamCompressor.class StreamUncompressor.class

4) rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec

5) echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat

4. Test Result
From the tests, Hadoop does not work correctly with either native bzip2 or Java bzip2: the native library fails to load, and the pure-Java fallback decompresses correctly but does not produce valid compressed output.

1) Hadoop bzip2 decompression works:

rm -rf /tmp/my.txt.bz2 && echo abc > /tmp/my.txt && bzip2 /tmp/my.txt && cat /tmp/my.txt.bz2 | hadoop jar ./Stream.jar StreamUncompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 03:58:20 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
abc <<< expected

2) bzip2 compression does not work, as follows:

a) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec
13/12/17 04:00:59 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
BZ <<<<< not expected

b) [hadoop@localhost hadoop]$ echo "text" | hadoop jar ./Stream.jar StreamCompressor org.apache.hadoop.io.compress.BZip2Codec | bzcat
13/12/17 04:01:31 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version

bzcat: Compressed file ends unexpectedly;
perhaps it is corrupted? Possible reason follows.
bzcat: Invalid argument
Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
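
To see how far the compressor output gets before it stops, the first bytes can be checked against the bzip2 stream layout: the signature "BZh", a block-size digit '1' to '9', then a 48-bit magic number for the first block (or the end-of-stream magic for an empty stream). The sketch below uses only JDK classes; Bzip2HeaderCheck and describe are hypothetical names, and it inspects only the first ten bytes rather than validating the whole stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class Bzip2HeaderCheck {

  // Magic numbers from the bzip2 format: start of a block, and end of stream.
  private static final long BLOCK_MAGIC = 0x314159265359L;
  private static final long EOS_MAGIC = 0x177245385090L;

  // Classifies the leading bytes of a supposed bzip2 stream.
  static String describe(byte[] data) {
    if (data.length < 4) {
      return "truncated: stream header needs 4 bytes, got " + data.length;
    }
    if (data[0] != 'B' || data[1] != 'Z' || data[2] != 'h') {
      return "not bzip2: bad signature";
    }
    if (data[3] < '1' || data[3] > '9') {
      return "not bzip2: bad block-size digit";
    }
    if (data.length < 10) {
      return "truncated: no block magic after the stream header";
    }
    long magic = 0;
    for (int i = 4; i < 10; i++) {
      magic = (magic << 8) | (data[i] & 0xFF);
    }
    if (magic == BLOCK_MAGIC) {
      return "ok: stream header and first block magic present";
    }
    if (magic == EOS_MAGIC) {
      return "ok: empty stream";
    }
    return "corrupt: unexpected bytes after the stream header";
  }

  // Reads all of stdin and prints the classification.
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    byte[] chunk = new byte[4096];
    int n;
    while ((n = System.in.read(chunk)) != -1) {
      buf.write(chunk, 0, n);
    }
    System.out.println(describe(buf.toByteArray()));
  }
}
```

Piping the StreamCompressor output into "java Bzip2HeaderCheck" instead of bzcat would report a truncated header if the output really consists of just the two bytes "BZ" shown in case a).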





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
