Qertoip wrote:
On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:
> ...
    for word in line.split():
        try:
            corpus[word] += 1
        except KeyError:
            corpus[word] = 1
The above is (probably) inefficient when the exception is thrown, which is
most of the time (for any new word). However, I've just read about the following:
    corpus[word] = corpus.setdefault(word, 0) + 1
That is better for things like:
    corpus.setdefault(word, []).append(...)
You might prefer:
    corpus[word] = corpus.get(word, 0) + 1
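
Side by side, the three counting idioms look roughly like this. This is only a sketch: the function names and the `sample` lines are made up for illustration and do not come from the original posts.

```python
def count_try(lines):
    """Count words with try/except: cheap when the word is already present."""
    corpus = {}
    for line in lines:
        for word in line.split():
            try:
                corpus[word] += 1
            except KeyError:
                # First time this word is seen.
                corpus[word] = 1
    return corpus


def count_setdefault(lines):
    """Count words with dict.setdefault, which inserts 0 for a new word."""
    corpus = {}
    for line in lines:
        for word in line.split():
            corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


def count_get(lines):
    """Count words with dict.get, which returns 0 for a new word without inserting."""
    corpus = {}
    for line in lines:
        for word in line.split():
            corpus[word] = corpus.get(word, 0) + 1
    return corpus


# Illustrative input only; any iterable of text lines works.
sample = ["the cat sat on the mat", "the dog sat"]
```

All three build the same dictionary; they differ only in how a missing key is handled.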
The trade-off depends on the size of your test material. You need
to time it with your mix of words. I was thinking of cranking
through a huge body of text (so words of frequency 1 are by far
the minority case). If you run through Shakespeare's first folio,
and just do the counting part, the try-except and .get cases are
indistinguishable (2.0 sec for each), and the .setdefault version
drags in at a slower 2.2 sec. Going through just Anna Karenina, the
times are 0.83, 0.83, and 0.91 sec. So the .setdefault form is about 10% slower.
For great test cases (and for your own personal edification),
visit Project Gutenberg.
Beware when you do timing: whether the file is "warm" or not can
make a huge difference. Read through it once before timing either.
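
One way to set up such a timing run is sketched below, assuming a plain-text file on disk. The helper name `timed_count` and the file name in the usage comment are hypothetical; the untimed first pass is there only to warm the file before the clock starts.

```python
import time


def count_get(lines):
    """Word count using dict.get; swap in another variant to compare."""
    corpus = {}
    for line in lines:
        for word in line.split():
            corpus[word] = corpus.get(word, 0) + 1
    return corpus


def timed_count(path, counter):
    """Return (elapsed_seconds, corpus) for one timed pass over the file."""
    # Warm-up pass: read the whole file once, untimed, so that the
    # timed pass below is not dominated by a cold read from disk.
    with open(path, encoding="utf-8") as f:
        for _ in f:
            pass

    start = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        corpus = counter(f)
    elapsed = time.perf_counter() - start
    return elapsed, corpus


# Usage (hypothetical file name):
#     elapsed, corpus = timed_count("anna_karenina.txt", count_get)
```

Running each variant several times and comparing the best times would smooth out noise from other processes.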
--Scott David Daniels