Hi,
Maybe this is a newbie question: how do I read a Snappy-compressed text file? The OS is Windows 7. So far I have done the following:

1. Built Hadoop 2.4.0 with the snappy option. The 'hadoop checknative' command displays the following line:

     snappy: true D:\hadoop-2.4.0\bin\snappy.dll

   so I assume Hadoop can do Snappy compression. BTW, snappy.dll was copied from the snappy64.dll file in snappy-windows-1.1.1.8.

2. Added the following configuration to both core-site.xml and yarn-site.xml:

     <property>
       <name>io.compression.codecs</name>
       <value>org.apache.hadoop.io.compress.SnappyCodec</value>
     </property>

3. Added the following environment variable:

     SPARK_LIBRARY_PATH=D:\hadoop-2.4.0\bin

   Since I use IntelliJ, this line went into the "Environment variables" section of the Run Configuration.

4. Compressed the input text file with snzip.exe, which is included in snappy-windows-1.1.1.8.

5. Wrote the code:

     sc.textFile(compressed_file_name)  // no other argument
       .map(...)

Now when I run my Spark application, the results are as follows:

1. The string 'snappy' cannot be found anywhere in the DEBUG log. The most relevant lines are:

     14/06/12 18:57:55 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
     14/06/12 18:57:55 DEBUG NativeCodeLoader: Loaded the native-hadoop library

2. The application fails. The log is as follows:

     14/06/12 18:57:57 WARN: int from string failed for: [(some binary characters)]

So apparently sc.textFile() does not recognize the file format and reads it as-is, so the map function receives garbage. How can I fix this?

Thanks.
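P.S. In case it helps, here is a minimal, self-contained version of what I'm running. It is only a sketch: the input path, the .snz suffix, and the Int parsing inside map() are stand-ins for my actual file and logic.

    import org.apache.spark.{SparkConf, SparkContext}

    object SnappyReadTest {
      def main(args: Array[String]): Unit = {
        // Local mode, matching the IntelliJ run configuration above.
        val conf = new SparkConf()
          .setAppName("SnappyReadTest")
          .setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Hypothetical input path; the real file was written by snzip.exe.
        // No codec is passed to textFile(), so any decompression has to be
        // chosen automatically by Spark/Hadoop.
        val lines = sc.textFile("D:/data/input.txt.snz")

        // Stand-in for my real map(): parse the first tab-separated field
        // as an Int. In the real job this stage receives binary garbage
        // instead of decompressed text lines.
        val firstFields = lines.map(line => line.split("\t")(0).toInt)

        println(firstFields.take(10).mkString(", "))
        sc.stop()
      }
    }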