In article <[EMAIL PROTECTED]>,
 "Xah Lee" <[EMAIL PROTECTED]> wrote:

> I had an idea today.
> 
> I wanted to know which functions are used most frequently in the
> emacs lisp language. I thought I could write a quick script that goes
> through all the elisp library locations and gives me the word-frequency
> report I want.
> 
> I started with a simple program:
> http://xahlee.org/p/titus/count_word_frequency.py
> 
> and applied it to a Shakespeare text. Here's a sample result:
> http://xahlee.org/p/titus/word_frequency.html
> 
> Then I wrote a more elaborate one that recurses through directories to
> work on the elisp code treasury.
> 
> The code is here:
> http://xahlee.org/x/count_word_frequency.py
> 
> and I got a strange result. The word “the” appeared at the top,
> along with many other English words. I quickly realized that these come
> from the doc strings of lisp functions (not from comments).
> 
> At this point, it dawned on me that there's no easy way to work around
> this unless I write the script in elisp, which has functions that
> read lisp code and can easily filter out doc strings.

For Lisp, just look for symbols that are immediately preceded by ( or 
#'.  The tokens after ( are not always functions, since this is also 
used for constructing literal lists and for subforms of special 
operators (e.g. the variable names in LET bindings) but I think the ones 
that aren't functions will have low enough frequency that they won't 
impact the results.
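
As a rough illustration of that heuristic, here is a minimal Python sketch. The names HEAD_SYMBOL and count_head_symbols are made up for this example, and the character class is only an approximation of what elisp accepts in a symbol. It does not skip comments, and a doc string that itself contains parenthesized text would still be counted, so treat it as a starting point rather than a finished tool.

```python
import re
from collections import Counter

# A symbol directly after "(" or "#'". The character class is an
# approximation of elisp symbol syntax, not the reader's real rules.
HEAD_SYMBOL = re.compile(r"(?:\(|#')([A-Za-z0-9+\-*/<>=!?$%_&~^:.]+)")

def count_head_symbols(source):
    """Count symbols that immediately follow '(' or #' in Lisp source text."""
    return Counter(HEAD_SYMBOL.findall(source))

sample = '''
(defun add-all (xs)
  "Return the sum of the numbers in XS."
  (apply #'+ (mapcar #'identity xs)))
'''
counts = count_head_symbols(sample)
```

On this sample the words of the doc string are not counted, while the parameter name xs is: it sits right after a ( without being a function, which is the kind of low-frequency noise described above.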

Perl would be harder, I think.  For ordinary function calls you can look 
for a word followed by (, but built-in functions allow use without 
parentheses around the parameters.
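
To make the Perl problem concrete, here is a similar Python sketch of the "word followed by (" rule. CALL_WITH_PARENS and count_paren_calls are hypothetical names, and the Perl fragment is invented for the example. The pattern would also pick up keywords such as if and while when they are followed by a parenthesis.

```python
import re
from collections import Counter

# A word immediately followed by "(", allowing whitespace in between.
CALL_WITH_PARENS = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

def count_paren_calls(source):
    """Count words that are directly followed by an opening parenthesis."""
    return Counter(CALL_WITH_PARENS.findall(source))

perl_sample = '''
my @words = split(/ /, $line);
push @seen, lc($words[0]);
print join(",", @seen);
'''
perl_counts = count_paren_calls(perl_sample)
```

Here split, lc and join are counted, but push and print are missed because they are called without parentheses around their arguments.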

-- 
Barry Margolin, [EMAIL PROTECTED]
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***