Hi Todd,

Thanks for your reply. I got the LZO libraries from exactly the same link on github, and built them successfully. So I don't think this is the cause.
Hi Guys,

Any other comments? Thanks.

Best Regards,
Carp

2010/6/9 Todd Lipcon <t...@cloudera.com>

> Hi,
>
> Where did you get the LZO libraries? The ones on Google Code are broken,
> please use the ones on github:
>
> http://github.com/toddlipcon/hadoop-lzo
>
> Thanks
> -Todd
>
> On Wed, Jun 9, 2010 at 2:59 AM, 李钰 <car...@gmail.com> wrote:
>
> > Hi,
> >
> > While using LZO compression to try to improve the performance of my
> > cluster, I found that compression didn't work. The job I ran is
> > "org.apache.hadoop.examples.Sort", with the input data generated by
> > "org.apache.hadoop.examples.RandomWriter".
> > I've made sure that I configured the LZO native library and jar files
> > correctly and set all compression-related parameters (such as
> > "mapred.compress.map.output", "mapred.output.compression.type",
> > "mapred.output.compression.codec", "mapred.output.compress" and
> > "map.output.compression.codec"), and according to the job logs the
> > tasktracker did compress the map/job output. But the output file is
> > not compressed at all!
> > Then I searched the internet and found at
> > http://wiki.apache.org/hadoop/SequenceFile that in the *SequenceFile
> > Common Header* there are two bytes that decide whether compression and
> > block compression are turned on for the file. I checked the sequence
> > file generated by RandomWriter, and the result is as follows:
> >
> > [hdpad...@shihc008 rand-10mb]$ od -c part-00000 | head -n 15
> > 0000000 S E Q 006 " o r g . a p a c h e .
> > 0000020 h a d o o p . i o . B y t e s W
> > 0000040 r i t a b l e " o r g . a p a c
> > 0000060 h e . h a d o o p . i o . B y t
> > 0000100 e s W r i t a b l e *\0 \0* \0 \0 \0 \0
> > 0000120 244 n ! 177 L 316 030 q g 035 351 L ; 024 216 031
> > 0000140 \0 \0 \t 234 \0 \0 001 305 \0 \0 001 301 207 v 5 255
> > 0000160 220 ] 236 < \b 367 & 9 241 \b v 303 m 314 203 220
> > 0000200 335 \0 241 325 232 035 037 267 303 360 \n 025 u P 003 220
> > 0000220 ^ 235 247 036 S 265 271 035 S 247 O 5 337 + 020 q
> > 0000240 277 - 003 212 . 230 221 G 241 5 K K 031 273 036 206
> > 0000260 ( 317 303 367 351 214 364 262 340 S 211 230 \r 362 % 335
> > 0000300 } H w & 234 S F 324 321 274 F 377 [ 344 [ h
> > 0000320 204 001 265 ] 037 _ r , 020 370 246 327 231 017 205 252
> > 0000340 273 016 310 w 361 326 032 332 200 Y \a X 342 \r 016 364
> >
> > I found that the marked two bytes are set to zero, which means
> > compression is turned off for the file. Since the value of these two
> > bytes is '\0', I guess this may be a defect: we forgot to set these
> > two bytes, which makes the sequence file generated by RandomWriter
> > uncompressed. I don't know whether this appears anywhere else.
> >
> > Is my opinion right? If not, does anybody know what causes the
> > compression not to work? Looking forward to your reply!
> >
> > Thanks and Best Regards,
> > Carp
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera