On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:
Thanks for your reply! It was really enlightening.

> How about:
>     for line in inFile:
>         for word in line.split():
>             try:
>                 corpus[word] += 1
>             except KeyError:
>                 corpus[word] = 1

The above is (probably) not efficient whenever the exception is thrown, that is, the first time each new word is seen. However, I've just read about the following:

    corpus[word] = corpus.setdefault( word, 0 ) + 1

>> wordsLst = wordsDic.items()
>> wordsLst.sort( moreCommonWord )
> OK, here I'm going to get version specific.
> For Python 2.4 and later:
>     words = sorted((-freq, word) for word, freq in corpus.iteritems())

This is my favorite! :) You managed to avoid moreCommonWord() through the clever use of a generator expression and the sequence comparison rules.

> After python 2.2:
>     for negfrequency, word in words:
>         print >>outFile, '%7d : %s' % (-negfrequency, word)

This is also cool, I didn't know about this kind of 'print' usage.

> So, with all my prejudices in place and python 2.4 on my box, I'd
> lift a few things to functions:

While I like your functionality and reusability improvements, I will stick to my as-simple-as-possible solution for the given requirements (which I didn't mention, and which assume correct command line arguments, for example).

Therefore, the current code is:

-------------------------------------------------------------------------
import sys

# Count how many times each whitespace-separated word occurs.
corpus = {}
inFile = open( sys.argv[1] )
for line in inFile:
    for word in line.split():
        corpus[word] = corpus.setdefault( word, 0 ) + 1
inFile.close()

# Negated frequencies sort the most common words first;
# ties are ordered alphabetically by word.
words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

# Write one "count : word" line per distinct word.
outFile = open( sys.argv[2], 'w' )
for negFreq, word in words:
    print >>outFile, '%7d : %s' % ( -negFreq, word )
outFile.close()
-------------------------------------------------------------------------

Any ideas how to make it even better? :>

--
Regards,
Piotrek
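
For comparison, here is a minimal sketch that puts the counting idioms discussed above side by side and checks that they agree. It is written for Python 3, which is an assumption (the code in this thread targets Python 2.4, hence items() instead of iteritems() and print() as a function). The function names count_try, count_setdefault, count_get and by_frequency, and the sample sentence, are made up for illustration. The dict.get variant is not mentioned in the thread; it is included only as a third option.

```python
def count_try(words):
    """Count with try/except, as in the quoted suggestion."""
    corpus = {}
    for word in words:
        try:
            corpus[word] += 1
        except KeyError:
            # First occurrence of this word.
            corpus[word] = 1
    return corpus


def count_setdefault(words):
    """Count with dict.setdefault, as in the current code."""
    corpus = {}
    for word in words:
        corpus[word] = corpus.setdefault(word, 0) + 1
    return corpus


def count_get(words):
    """Count with dict.get (a variant not discussed in the thread)."""
    corpus = {}
    for word in words:
        corpus[word] = corpus.get(word, 0) + 1
    return corpus


def by_frequency(corpus):
    """Return (-freq, word) pairs: most frequent first, ties alphabetical."""
    return sorted((-freq, word) for word, freq in corpus.items())


# Hypothetical sample input standing in for the file contents.
sample = "the cat and the dog and the bird".split()
counts = count_setdefault(sample)
ranked = by_frequency(counts)

for neg_freq, word in ranked:
    print('%7d : %s' % (-neg_freq, word))
```

Which variant is faster probably depends on how often words repeat in the input, so timing them on real data (for example with the timeit module) would be the way to settle it.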