On Sat, Sep 4, 2010 at 9:55 PM, Edward Capriolo <[email protected]> wrote:
> cd /home/edward/hadoop/hadoop-0.20.2/src/
> [edw...@ec src]$ find . | wc -l
> 2683
>
> [edw...@ec apache-cassandra-0.6.3-src]$ find . | wc -l
> 609

Hadoop is horribly horribly bloated, but your comparison is completely bogus.

hadoop-0.20.2 $ find . | sed -n 's,.*\(\.[^./][^./]*\)$,\1,p' | sort |
uniq -c | sort -rn | head
2067 .html
1313 .java
 167 .xml
 126 .png
 119 .py
  81 .gif
  62 .pdf
  50 .sh
  47 .jar
  34 .c

cassandra-0.6.3-src $ find . | sed -n 's,.*\(\.[^./][^./]*\)$,\1,p' |
sort | uniq -c | sort -rn | head
 354 .java
  27 .txt
  19 .jar
  14 .xml
   7 .properties
   5 .py
   5 .bat
   4 .png
   3 .sh
   2 .json
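
For anyone who wants to rerun this on another tree, here is a minimal
sketch of the extension histogram as a reusable function. The name
ext_histogram is made up for illustration, and unlike the commands above
it passes -type f to find so that directories are skipped, which may
change the counts slightly.

```shell
# Hypothetical helper (not from the original commands): histogram of
# file extensions under a directory, most frequent first.
#
# The sed expression prints only paths that end in ".ext", where ext is
# one or more characters that are neither "." nor "/", and replaces the
# whole path with that final extension. Paths without an extension
# produce no output because of -n together with the p flag.
ext_histogram() {
    # Assumption: count regular files only (-type f).
    find "${1:-.}" -type f |
        sed -n 's,.*\(\.[^./][^./]*\)$,\1,p' |
        sort | uniq -c | sort -rn
}
```

Usage would be something like `ext_histogram hadoop-0.20.2 | head`.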

hadoop-0.20.2 $ find . -type f -iname '*test*' | wc -l
     382
cassandra-0.6.3-src $ find . -type f -iname '*test*' | wc -l
      63

hadoop-0.20.2 $ find src/core -name '*java' | fgrep -vi test | wc -l
     332
cassandra-0.6.3-src $ find src -name '*java' | fgrep -vi test | wc -l
     252
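
Note that the fgrep filter above works on the whole path, so a file is
dropped if "test" appears anywhere in it, including in a directory name.
A rough sketch of the same count as a function follows;
count_non_test_java is a hypothetical name, and it uses grep -F (the
equivalent of fgrep) and the stricter '*.java' pattern as assumptions.

```shell
# Hypothetical helper: count .java files whose path does not contain
# "test" (case-insensitive), mirroring the find | fgrep -vi pipeline.
count_non_test_java() {
    # grep -F treats "test" as a fixed string; -v inverts; -i ignores case.
    # tr strips the padding that some wc implementations add.
    find "${1:-.}" -type f -name '*.java' |
        grep -Fvi test |
        wc -l | tr -d ' '
}
```

Because the match is on the path, src/test/Foo.java is excluded even
though the file name itself does not mention tests.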

So Hadoop core has only about 30% more source files than Cassandra
core (332 vs. 252), but Hadoop also comes with more than 6x as many
test files (382 vs. 63; 10x by line count, actually).
According to sloccount, hadoop-0.20.2 core has 39 989 SLOC vs. 26 226
for cassandra-0.6.3 core (about 50% more).
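
The percentages are just ratios of the counts shown earlier. A quick
sketch of the arithmetic, where ratio is a hypothetical helper name and
the inputs are the numbers quoted in this message:

```shell
# Hypothetical helper: print a/b rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 332 252        # non-test .java files, Hadoop core vs. Cassandra: 1.32
ratio 382 63         # files matching *test*: 6.06
ratio 39989 26226    # SLOC per sloccount: 1.52
```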

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
