Hi,

While using LZO compression to try to improve the performance of my cluster, I found that compression didn't work. The job I run is "org.apache.hadoop.examples.Sort", with the input data generated by "org.apache.hadoop.examples.RandomWriter".

I've made sure that the LZO native library and jar files are configured correctly, and I have set all the compression-related parameters (such as "mapred.compress.map.output", "mapred.output.compression.type", "mapred.output.compression.codec", "mapred.output.compress" and "map.output.compression.codec"). According to the job logs, the tasktracker did compress the map and job output. But the output file is not compressed at all!

I then searched the internet and found from http://wiki.apache.org/hadoop/SequenceFile that the *SequenceFile Common Header* contains two bytes that determine whether compression and block compression are turned on for the file. I checked the sequence file generated by RandomWriter, and the result is as follows:

[hdpad...@shihc008 rand-10mb]$ od -c part-00000 | head -n 15
0000000 S E Q 006 " o r g . a p a c h e .
0000020 h a d o o p . i o . B y t e s W
0000040 r i t a b l e " o r g . a p a c
0000060 h e . h a d o o p . i o . B y t
0000100 e s W r i t a b l e *\0 \0* \0 \0 \0 \0
0000120 244 n ! 177 L 316 030 q g 035 351 L ; 024 216 031
0000140 \0 \0 \t 234 \0 \0 001 305 \0 \0 001 301 207 v 5 255
0000160 220 ] 236 < \b 367 & 9 241 \b v 303 m 314 203 220
0000200 335 \0 241 325 232 035 037 267 303 360 \n 025 u P 003 220
0000220 ^ 235 247 036 S 265 271 035 S 247 O 5 337 + 020 q
0000240 277 - 003 212 . 230 221 G 241 5 K K 031 273 036 206
0000260 ( 317 303 367 351 214 364 262 340 S 211 230 \r 362 % 335
0000300 } H w & 234 S F 324 321 274 F 377 [ 344 [ h
0000320 204 001 265 ] 037 _ r , 020 370 246 327 231 017 205 252
0000340 273 016 310 w 361 326 032 332 200 Y \a X 342 \r 016 364

I found that the two marked bytes (between the asterisks) are set to zero, which means that both compression and block compression are turned off. Since both bytes are '\0', I suspect this may be a defect: perhaps we neglected to set these two bytes, so that the sequence file generated by RandomWriter cannot be compressed. I don't know whether the same thing happens anywhere else.

Is my understanding right? If not, does anybody know what is preventing the compression from working?

Looking forward to your reply!

Thanks and Best Regards,
Carp
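
P.S. In case it helps anyone repeat the check without reading od output by hand, here is a minimal Java sketch of how the two flag bytes can be located. It is not Hadoop's own reader. The class name SeqHeaderPeek and its methods are made up for illustration, and it assumes the version 6 layout seen in the dump above, with class names shorter than 128 bytes so that each length fits in a single byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: locates the two compression
// flag bytes in a SequenceFile header held in memory.
public class SeqHeaderPeek {

    /** Returns {compressed, blockCompressed} as stored in the header. */
    public static boolean[] compressionFlags(byte[] header) {
        if (header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
            throw new IllegalArgumentException("bad magic, not a SequenceFile");
        }
        // Assumption: only the version 6 layout from the dump is handled.
        if (header[3] != 6) {
            throw new IllegalArgumentException("unhandled version " + header[3]);
        }
        int pos = 4;                      // skip "SEQ" and the version byte
        pos = skipClassName(header, pos); // key class name
        pos = skipClassName(header, pos); // value class name
        return new boolean[] { header[pos] != 0, header[pos + 1] != 0 };
    }

    // Assumption: the length prefix is a single non-negative byte, which
    // holds for names shorter than 128 bytes (0x22 = 34 in the dump).
    static int skipClassName(byte[] buf, int pos) {
        int len = buf[pos];
        if (len < 0) {
            throw new IllegalArgumentException("multi-byte length not handled");
        }
        return pos + 1 + len;
    }

    /**
     * Builds the start of a header like the one in the dump, up to and
     * including the two flag bytes. Anything that follows the flags in a
     * real file is left out.
     */
    public static byte[] sampleHeader(boolean compressed, boolean blockCompressed) {
        byte[] name = "org.apache.hadoop.io.BytesWritable"
                .getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6);
        for (int i = 0; i < 2; i++) {     // key class, then value class
            out.write(name.length);
            out.write(name, 0, name.length);
        }
        out.write(compressed ? 1 : 0);
        out.write(blockCompressed ? 1 : 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        boolean[] flags = compressionFlags(sampleHeader(false, false));
        System.out.println("compressed=" + flags[0]
                + " blockCompressed=" + flags[1]);
    }
}
```

With the header bytes from my dump, both flags come out false, which matches what I read from od. To check a real file, the first few hundred bytes of part-00000 could be read into a byte array and passed to compressionFlags.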