Hi,

Where did you get the LZO libraries? The ones on Google Code are broken; please use the ones on GitHub:
http://github.com/toddlipcon/hadoop-lzo

Thanks
-Todd

On Wed, Jun 9, 2010 at 2:59 AM, 李钰 <car...@gmail.com> wrote:
> Hi,
>
> While using LZO compression to try to improve the performance of my
> cluster, I found that compression didn't work. The job I ran is
> "org.apache.hadoop.examples.Sort", with the input data generated by
> "org.apache.hadoop.examples.RandomWriter".
> I've made sure that I configured the LZO native library/jar files
> correctly and set all compression-related parameters (such as
> "mapred.compress.map.output", "mapred.output.compression.type",
> "mapred.output.compression.codec", "mapred.output.compress" and
> "map.output.compression.codec"), and the tasktracker did compress the
> map/job output, according to information from the job logs. But the
> output file is not compressed at all!
> Then I searched the internet and found from
> http://wiki.apache.org/hadoop/SequenceFile that in the *SequenceFile
> Common Header*, there are two bytes that decide whether compression and
> block compression are turned on for the file. I checked the sequence
> file generated by RandomWriter, and the result is as follows:
>
> [hdpad...@shihc008 rand-10mb]$ od -c part-00000 | head -n 15
> 0000000   S   E   Q 006   "   o   r   g   .   a   p   a   c   h   e   .
> 0000020   h   a   d   o   o   p   .   i   o   .   B   y   t   e   s   W
> 0000040   r   i   t   a   b   l   e   "   o   r   g   .   a   p   a   c
> 0000060   h   e   .   h   a   d   o   o   p   .   i   o   .   B   y   t
> 0000100   e   s   W   r   i   t   a   b   l   e  *\0  \0*  \0  \0  \0  \0
> 0000120 244   n   ! 177   L 316 030   q   g 035 351   L   ; 024 216 031
> 0000140  \0  \0  \t 234  \0  \0 001 305  \0  \0 001 301 207   v   5 255
> 0000160 220   ] 236   <  \b 367   &   9 241  \b   v 303   m 314 203 220
> 0000200 335  \0 241 325 232 035 037 267 303 360  \n 025   u   P 003 220
> 0000220   ^ 235 247 036   S 265 271 035   S 247   O   5 337   + 020   q
> 0000240 277   - 003 212   . 230 221   G 241   5   K   K 031 273 036 206
> 0000260   ( 317 303 367 351 214 364 262 340   S 211 230  \r 362   % 335
> 0000300   }   H   w   & 234   S   F 324 321 274   F 377   [ 344   [   h
> 0000320 204 001 265   ] 037   _   r   , 020 370 246 327 231 017 205 252
> 0000340 273 016 310   w 361 326 032 332 200   Y  \a   X 342  \r 016 364
>
> I found that the marked two bytes are set to zero, which means
> compression is turned off for the file. Since the value of both bytes is
> '\0', I guess this may be a defect: these two bytes were never set, and
> that makes the sequence file generated by RandomWriter uncompressible. I
> don't know whether this appears anywhere else.
>
> Is my opinion right? If not, does anybody know what causes the
> compression not to work? Looking forward to your reply!
>
> Thanks and Best Regards,
> Carp

--
Todd Lipcon
Software Engineer, Cloudera
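The two flag bytes Carp points at sit immediately after the key and value class names in the SequenceFile header: one boolean byte for record compression, one for block compression. A minimal sketch of a parser for just that header prefix, assuming a version ≥ 6 header and class names short enough that Hadoop's VInt length encoding fits in a single byte (`parse_seq_header` is a hypothetical helper for illustration, not part of any Hadoop API):

```python
import io

def read_vint_byte(f):
    # Simplified: Hadoop's WritableUtils.writeVInt emits a single byte for
    # values in -112..127, which covers typical class-name lengths. The full
    # multi-byte VInt encoding is not handled here.
    return f.read(1)[0]

def parse_seq_header(data):
    """Parse the leading fields of a SequenceFile header (version >= 6 assumed)."""
    f = io.BytesIO(data)
    magic = f.read(3)
    if magic != b'SEQ':
        raise ValueError('not a SequenceFile')
    version = f.read(1)[0]
    key_cls = f.read(read_vint_byte(f)).decode('utf-8')
    val_cls = f.read(read_vint_byte(f)).decode('utf-8')
    compressed = f.read(1)[0] != 0         # first marked byte
    block_compressed = f.read(1)[0] != 0   # second marked byte
    return version, key_cls, val_cls, compressed, block_compressed
```

Running this over the first bytes of the `part-00000` dump above would report both flags as False, matching the `*\0 \0*` bytes in the `od` output.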