Qertoip wrote:
On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:
> ...
        for word in line.split():
            try:
                corpus[word] += 1
            except KeyError:
                corpus[word] = 1

The above is (probably) not efficient when the exception is raised, which happens for every new word, that is, most of the time. However, I've just read about the following:

    corpus[word] = corpus.setdefault(word, 0) + 1

That is better for things like:

    corpus.setdefault(word, []).append(...)

You might prefer:

    corpus[word] = corpus.get(word, 0) + 1
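For comparison, here is a minimal sketch of the three counting idioms discussed in this thread, written as functions so they can be checked against each other. The function names, the sample sentence, and the `index_lines` helper (which illustrates the `setdefault(word, []).append(...)` use) are assumptions made for illustration, and the code is written for current Python rather than the 2005 interpreter.

```python
def count_try(words):
    """Count with try/except: cheap on repeats, costly on each new word."""
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            corpus[word] = 1
    return corpus


def count_get(words):
    """Count with dict.get and a default of 0."""
    corpus = {}
    for word in words:
        corpus[word] = corpus.get(word, 0) + 1
    return corpus


def count_setdefault(words):
    """Count with dict.setdefault, which also stores the default 0."""
    corpus = {}
    for word in words:
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


def index_lines(lines):
    """Hypothetical helper: map each word to the line numbers it occurs on.

    This is the kind of job setdefault suits well, because the default
    value is a mutable list that gets modified in place.
    """
    index = {}
    for lineno, line in enumerate(lines, 1):
        for word in line.split():
            index.setdefault(word, []).append(lineno)
    return index


sample = "the cat and the hat and the bat".split()
print(count_get(sample))
print(index_lines(["a b", "b c"]))
```

All three counting functions should return the same dictionary; they differ only in how a missing key is handled.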

The trade-off depends on the size of your test material, so you
need to time it with your own mix of words.  I was thinking of
cranking through a huge body of text, where new words (the only
case that raises KeyError) are by far the minority.  If you run
through Shakespeare's First Folio and time just the counting
part, the try-except and .get versions are indistinguishable
(2.0 sec each), and the .setdefault version drags in at a slower
2.2 sec.  Going through just Anna Karenina, the times are again
.83, .83 and .91 sec.  So the .setdefault form is about 10%
slower.  For great test cases (and for your own personal
edification), visit Project Gutenberg.
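A rough way to repeat this kind of measurement is sketched below with the standard `timeit` module. The synthetic word list is an assumption standing in for a real text (it has very few distinct words, so the exception path is almost never taken); substitute the words of a Project Gutenberg file to get numbers for your own mix. The timings quoted above came from a different machine and interpreter, so expect different figures.

```python
import timeit


def count_try(words):
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            corpus[word] = 1
    return corpus


def count_get(words):
    corpus = {}
    for word in words:
        corpus[word] = corpus.get(word, 0) + 1
    return corpus


def count_setdefault(words):
    corpus = {}
    for word in words:
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


# Assumption: a synthetic corpus with many repeats of a few words.
words = ("the quick brown fox jumps over the lazy dog " * 20000).split()


def best_time(func, repeat=3):
    """Return the fastest of several single runs, to reduce noise."""
    return min(timeit.repeat(lambda: func(words), number=1, repeat=repeat))


results = {f.__name__: best_time(f)
           for f in (count_try, count_get, count_setdefault)}

for name, seconds in results.items():
    print(f"{name:18s} {seconds:.3f} s")
```

Taking the minimum of several runs is a common choice because slower runs usually reflect interference from other processes rather than the code being measured.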

Beware when you do timing: whether the file is "warm" (already
in the operating system's cache) or not can make a huge
difference.  Read through it once before timing any of the
versions.
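One way to follow that advice is sketched here: read the file once and throw the data away, then start the clock. The `warm` and `timed_count` names and the temporary demo file are assumptions for illustration. A freshly written temporary file is most likely cached already, so the demo shows only the procedure, not the size of the effect.

```python
import os
import tempfile
import time


def warm(path, chunk_size=1 << 20):
    """Read the whole file once, discarding the data, and return its size."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def timed_count(path):
    """Count words in the file and return (counts, elapsed seconds)."""
    start = time.perf_counter()
    corpus = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                corpus[word] = corpus.get(word, 0) + 1
    return corpus, time.perf_counter() - start


# Assumption: a small temporary file stands in for a real text.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("to be or not to be\n" * 1000)
    path = tmp.name

bytes_read = warm(path)              # first pass: not timed
corpus, elapsed = timed_count(path)  # second pass: timed
os.remove(path)

print(f"warmed {bytes_read} bytes, counted in {elapsed:.4f} s")
```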


--Scott David Daniels
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list
