Xah Lee wrote:
> I had a idea today.
>
> I wanted to know what are the top most frequently used functions in the
> emacs lisp language. I thought i can write a quick script that go thru
> all the elisp library locations and get a word-frequency report i want.
>
> I started with a simple program:
> http://xahlee.org/p/titus/count_word_frequency.py
>
> and applied it to a Shakespeare text. Here's a sample result:
> http://xahlee.org/p/titus/word_frequency.html
>
> Then, i wrote a more elaborate one that recurse thru directories to
> work on elisp code treasury.
>
> The code is here:
> http://xahlee.org/x/count_word_frequency.py
>
> and i got a strange result. The word “the” appeared on the top,
> along with many other English words. I quickly realized that these are
> due to lisp function's doc strings. (not comments)

It would be interesting to see whether the type-checking "the" in Lisp
is still frequent. I doubt it.

> At this point, it dawned on me that there's no easy way to work around
> this, Unless, i write this script in elisp which has functions that
> read lisp code and can easily filter out doc strings.
>
> Originally, i planned to use the word-frequency script on Perl, Python,
> as well as Java, as well as Elisp. However, now it seems to me this
> task is nigh impossible. Each of these lang has their own doc string
> syntax. It's gonna be a heavy undertaking if the word-frequency script
> is to work with all these langs, since that amounts to writing a parser
> for each lang.
>
> Alternatively, one can write multiple word-frequency scripts using each
> lang in question, since most lang has facilities to deal with its own
> syntax. However, this is still not trivial, and amounts to several
> programing efforts.

Editor code (probably best scintilla/sc1; check also emacs itself, ...)
has libraries for colorizing comments in all kinds of programming
languages ...

> Anyone would be interested in this problem?

I have a theory that "bad source code" has more
if/else/elif/case/switch dispatching statements per number of code words
(or lines) than "good code", independent of the language. If you can
count this ratio and correlate it with, say, an sf-ranking and with
languages, that would be highly interesting to me... (in that case,
please drop a pointer in this thread or under the same subject)

-robert
-- 
http://mail.python.org/mailman/listinfo/python-list
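
P.S. For the Python case only, here is a rough sketch of how both ideas above might look: a word count that never sees doc strings or comments, and the share of branching keywords among the counted code words. It leans on the standard `tokenize` module, which reports string literals and comments as their own token types. The names `name_frequencies`, `branch_ratio`, `BRANCH_KEYWORDS` and `SAMPLE` are made up for illustration, and treating `if`/`elif`/`else` as the "dispatching statements" is just my assumption; it is not what the scripts linked above do.

```python
import io
import tokenize
from collections import Counter

# Assumption: in Python, these keywords stand in for "dispatching statements".
BRANCH_KEYWORDS = {"if", "elif", "else"}


def name_frequencies(source):
    """Count NAME tokens (identifiers and keywords) in Python source text.

    String literals (doc strings included) and comments come back from the
    tokenizer as STRING and COMMENT tokens, so they are never counted.
    """
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            counts[tok.string] += 1
    return counts


def branch_ratio(counts):
    """Return the fraction of counted code words that are branching keywords."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    branches = sum(counts[kw] for kw in BRANCH_KEYWORDS)
    return branches / total


# Hypothetical sample input: the doc string and the comment both contain
# the word "the", which should not show up in the counts.
SAMPLE = '''
def sign(x):
    """Return the sign of the number."""
    # the comment is skipped as well
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0
'''

if __name__ == "__main__":
    counts = name_frequencies(SAMPLE)
    print(counts.most_common(5))
    print("branch ratio:", branch_ratio(counts))
```

Note that this also counts the `if`/`else` of conditional expressions, and it says nothing about other languages; for those you would still need each language's own lexer, as discussed above.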