On Sat, Sep 4, 2010 at 9:55 PM, Edward Capriolo <[email protected]> wrote:
> cd /home/edward/hadoop/hadoop-0.20.2/src/
> [edw...@ec src]$ find . | wc -l
> 2683
>
> [edw...@ec apache-cassandra-0.6.3-src]$ find . | wc -l
> 609
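Hadoop is horribly, horribly bloated, but your comparison is completely bogus:
a raw file count from find includes directories, HTML docs, images and jars,
not just source code.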
hadoop-0.20.2 $ find . | sed -n 's,.*\(\.[^./][^./]*\)$,\1,p' | sort |
uniq -c | sort -rn | head
2067 .html
1313 .java
167 .xml
126 .png
119 .py
81 .gif
62 .pdf
50 .sh
47 .jar
34 .c
cassandra-0.6.3-src $ find . | sed -n 's,.*\(\.[^./][^./]*\)$,\1,p' |
sort | uniq -c | sort -rn | head
354 .java
27 .txt
19 .jar
14 .xml
7 .properties
5 .py
5 .bat
4 .png
3 .sh
2 .json
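
For reference, the sed expression used above keeps only the final extension of each path (a dot followed by one or more characters that are neither "." nor "/") and drops paths that have none; the rest of the pipeline counts how often each extension occurs. A minimal sketch of the same idea as a reusable function follows. The name ext_histogram is made up for illustration, and -type f is an addition that was not in the commands above, so its counts can differ slightly from the ones shown.

```shell
# ext_histogram DIR: hypothetical helper, not part of Hadoop or Cassandra.
# Lists regular files under DIR (default: current directory), keeps only the
# trailing ".ext" of each path, and prints the ten most common extensions.
# Note: -type f is an assumption added here; the original commands also
# fed directory names into sed.
ext_histogram() {
    find "${1:-.}" -type f |
        sed -n 's,.*\(\.[^./][^./]*\)$,\1,p' |
        sort | uniq -c | sort -rn | head
}
```

Running ext_histogram with no argument from the top of either source tree gives the same kind of listing as above.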
hadoop-0.20.2 $ find . -type f -iname '*test*' | wc -l
382
cassandra-0.6.3-src $ find . -type f -iname '*test*' | wc -l
63
hadoop-0.20.2 $ find src/core -name '*java' | fgrep -vi test | wc -l
332
cassandra-0.6.3-src $ find src -name '*java' | fgrep -vi test | wc -l
252
So Hadoop core has only about 30% more files than Cassandra core, but it
also comes with >6x more test files (10x by line count, actually).
According to sloccount, hadoop-0.20.2 core has 39,989 SLOC vs. 26,226
for cassandra-0.6.3 core (about 50% more).
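
The ratios can be rechecked from the counts quoted in this message. A quick sketch, using awk only for the floating-point division (the ratio helper name is just for illustration):

```shell
# ratio A B: print A/B rounded to two decimals (hypothetical helper).
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

ratio 332 252       # core .java files, Hadoop vs Cassandra
ratio 382 63        # files with "test" in the name
ratio 39989 26226   # SLOC as reported by sloccount
```

These work out to roughly 1.32, 6.06 and 1.52 respectively.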
--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com