Suggest more finesse, please. I/O and sequences.
Would you like to suggest any improvements for the following code? I want to make my implementation as simple, as Python-native, and as fine as possible.

I've written simple code which reads an input text file and creates a ranking of words by number of appearances.

Code:
---
import sys

def moreCommonWord( x, y ):
    if x[1] != y[1]:
        return cmp( x[1], y[1] ) * -1
    return cmp( x[0], y[0] )

wordsDic = {}
inFile = open( sys.argv[1] )
for word in inFile.read().split():
    if wordsDic.has_key( word ):
        wordsDic[word] = wordsDic[word] + 1
    else:
        wordsDic[word] = 1
inFile.close()

wordsLst = wordsDic.items()
wordsLst.sort( moreCommonWord )

outFile = open( sys.argv[2], 'w')
for pair in wordsLst:
    outFile.write( str( pair[1] ).rjust( 7 ) + " : " + str( pair[0] ) + "\n" )
outFile.close()
---

In particular, I don't like reading the whole file just to split it. It is easy to read by lines; can I read by words with the same ease?

PS I've only been learning Python since this morning, so be understanding :>

--
Greets,
Piotrek
--
http://mail.python.org/mailman/listinfo/python-list
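Reading word by word is about as easy as reading by lines: a small generator can wrap the line loop and hand out one word at a time, so the whole file is never read and split in one step. A minimal sketch, written for Python 3; the helper name `iter_words` is hypothetical.

```python
import io


def iter_words(lines):
    """Yield whitespace-separated words one at a time from an iterable of lines.

    Only one line is held in memory at a time.
    """
    for line in lines:
        for word in line.split():
            yield word


# io.StringIO stands in for a text file here; a file object returned by
# open() iterates over its lines in the same way.
sample = io.StringIO("the cat\nsat on the mat\n")
print(list(iter_words(sample)))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

With a real file, `for word in iter_words(inFile):` replaces `for word in inFile.read().split():` without changing the rest of the loop.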
Re: Suggest more finesse, please. I/O and sequences.
On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:

Thanks for your reply! It was really enlightening.

> How about:
>     for line in inFile:
>         for word in line.split():
>             try:
>                 corpus[word] += 1
>             except KeyError:
>                 corpus[word] = 1

The above is (probably) not efficient when the exception is thrown, which happens for every new word. However, I've just read about the following:

    corpus[word] = corpus.setdefault( word, 0 ) + 1

>> wordsLst = wordsDic.items()
>> wordsLst.sort( moreCommonWord )
> OK, here I'm going to get version specific.
> For Python 2.4 and later:
>     words = sorted((-freq, word) for word, freq in corpus.iteritems())

This is my favorite! :) You managed to avoid moreCommonWord() through the clever use of a generator expression and the sequence comparison rules.

> After python 2.2:
>     for negfrequency, word in words:
>         print >>outFile, '%7d : %s' % (-negfrequency, word)

This is also cool; I didn't know about this kind of 'print' usage.

> So, with all my prejudices in place and python 2.4 on my box, I'd
> lift a few things to functions:

While I like your functionality and reusability improvements, I will stick to my as-simple-as-possible solution for the given requirements (which I didn't mention, and which, for example, assume correct command-line arguments). Therefore, the current code is:
-
import sys

corpus = {}
inFile = open( sys.argv[1] )
for line in inFile:
    for word in line.split():
        corpus[word] = corpus.setdefault( word, 0 ) + 1
inFile.close()

words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

outFile = open( sys.argv[2], 'w')
for negFreq, word in words:
    print >>outFile, '%7d : %s' % ( -negFreq, word )
outFile.close()
-
Any ideas how to make it even better? :>

--
Regards,
Piotrek
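The counting idioms discussed in this message all build the same dictionary; they differ only in how a missing key is handled. A small sketch, written for Python 3 (so `.items()` replaces Python 2's `.iteritems()`), checks that they agree and shows the `(-freq, word)` ordering. `collections.defaultdict` is included for comparison, although it only arrived in Python 2.5, after this thread.

```python
from collections import defaultdict

words = "b a b c a b".split()

# 1. try/except: attempt the increment, handle a missing key.
counts_try = {}
for word in words:
    try:
        counts_try[word] += 1
    except KeyError:
        counts_try[word] = 1

# 2. setdefault: stores 0 for a new key, then returns the stored value.
counts_setdefault = {}
for word in words:
    counts_setdefault[word] = counts_setdefault.setdefault(word, 0) + 1

# 3. get: returns 0 for a missing key without storing anything.
counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1

# 4. defaultdict: a missing key is created as int(), i.e. 0.
counts_default = defaultdict(int)
for word in words:
    counts_default[word] += 1

assert counts_try == counts_setdefault == counts_get == dict(counts_default)

# Negating the count lets a plain ascending sort put the most frequent
# words first; equal counts are then ordered by the word itself.
ranking = sorted((-freq, word) for word, freq in counts_get.items())
print(ranking)  # [(-3, 'b'), (-2, 'a'), (-1, 'c')]
```

Which idiom is fastest depends on how many distinct words the input has, so timing it on representative data is the only reliable way to choose on performance grounds.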
Re: Suggest more finesse, please. I/O and sequences.
On Fri, 25 Mar 2005 19:17:30 +0100, Qertoip wrote:

> Would you like to suggest any improvements for the following code?
> I want to make my implementation as simple, as Python-native, and as fine as
> possible.
> I've written simple code which reads an input text file and creates a ranking
> of words by number of appearances.

A good friend of mine heard about my attempt to create a compact, simple Python script and wrote the following PHP script:
--
$data=join(' ', file($argv[1]));
foreach(explode(' ', $data) as $slowo)
    $stat[chop($slowo)]++;
array_multisort($stat, SORT_DESC, array_keys($stat));
foreach($stat as $sl=>$il)
    $odata.="$il : $sl\n";
file_put_contents($argv[2], $odata);
--
...which has the same functionality in fewer actual lines of code [7]. I'm a little bit confused, since I considered Python more expressive than PHP. That makes me all the more interested in improving my implementation :) It currently looks like this [11 actual lines of code]:
--
import sys

corpus = {}
inFile = open( sys.argv[1] )
for word in inFile.read().split():
    corpus[word] = corpus.get( word, 0 ) + 1
inFile.close()

words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

outFile = open( sys.argv[2], 'w')
for negFreq, word in words:
    outFile.write( '%7d : %s\n' % ( -negFreq, word ) )
outFile.close()
--
PS Thanks to Scott David Daniels and Larry Bates

--
Regards,
Piotrek
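For comparison with the PHP version, later Pythons can shorten the counting step: `collections.Counter` (added in Python 2.7, so not available to this 2005 thread) counts an iterable in one call. A minimal Python 3 sketch; the function name `rank_lines` is hypothetical. `Counter.most_common()` orders equal counts by first appearance rather than alphabetically, so the sketch sorts explicitly to keep the ordering used above.

```python
from collections import Counter


def rank_lines(text):
    """Return output lines ranked by frequency (descending), ties alphabetical."""
    counts = Counter(text.split())
    # Sort on (negated count, word) to get the same order as sorted((-freq, word)).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ['%7d : %s\n' % (freq, word) for word, freq in ranked]


print(rank_lines("b a b c a b"))
# ['      3 : b\n', '      2 : a\n', '      1 : c\n']
```

The returned list can be handed directly to an output file's `writelines()` method.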
Re: Suggest more finesse, please. I/O and sequences.
On Fri, 25 Mar 2005 21:09:41 -0500, Peter Hansen wrote:

Thanks for the comments! :)

> Qertoip wrote:
>> A good friend of mine heard about my attempt to create a compact, simple
>> Python script and wrote the following PHP script:
> [snip 7-line PHP script]
>> ...which has the same functionality in fewer actual lines of code [7].
>> I'm a little bit confused, since I considered Python more expressive than
>> PHP. That makes me all the more interested in improving my implementation :)
>> It currently looks like this [11 actual lines of code]:
> [snip 11-line Python]
> --
> import sys
>
> corpus = {}
> for word in open(sys.argv[1]).read().split():
>     corpus[word] = corpus.get( word, 0 ) + 1
>
> words = reversed(sorted(data[::-1] for data in corpus.iteritems()))
>
> open(sys.argv[2], 'w').writelines('%7d : %s\n' % data for data in words)
> --

Is the file automatically closed in both cases?

> I'm curious if either this or the PHP does what is really
> wanted, however. The above doesn't split on "words", but
> merely on whitespace, making the results fairly meaningless
> if you are concerned about punctuation etc.

You are perfectly right, but the requirements were intentionally simplified, guaranteeing error-free input and allowing words like 'hey,!1_go'. I aim to make my implementation as concise as possible while *still being natural, readable and clear*.

--
Regards,
Piotrek
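On the question of whether the files get closed: in CPython a file object is closed when its last reference disappears, which for the one-liners above is right after each statement, but the language does not promise that timing, and other implementations may close later. The `with` statement (Python 2.5 via `from __future__ import with_statement`, standard from 2.6) closes files deterministically. A minimal Python 3 sketch; `write_ranking` is a hypothetical name. It also shows one behavioral difference: reversing an ascending sort of `(freq, word)` pairs puts equal counts in reverse alphabetical order, while the negated-count sort keeps them alphabetical.

```python
import os
import tempfile


def write_ranking(in_path, out_path):
    """Count whitespace-separated words in in_path and write a ranking to out_path."""
    corpus = {}
    # The input file is closed when this block ends, even if an error occurs.
    with open(in_path) as in_file:
        for line in in_file:
            for word in line.split():
                corpus[word] = corpus.get(word, 0) + 1
    # Most frequent first; equal counts fall back to alphabetical order.
    ranked = sorted((-freq, word) for word, freq in corpus.items())
    with open(out_path, 'w') as out_file:
        for neg_freq, word in ranked:
            out_file.write('%7d : %s\n' % (-neg_freq, word))


# Round trip through a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.txt')
    dst = os.path.join(tmp, 'out.txt')
    with open(src, 'w') as f:
        f.write('b a b\nc a\n')
    write_ranking(src, dst)
    with open(dst) as f:
        result = f.read()

print(result)

# Tie order: swap each pair, sort ascending, then reverse (the
# reversed(sorted(...)) approach) versus sorting on the negated count.
corpus = {'b': 2, 'a': 2, 'c': 1}
reversed_sort = list(reversed(sorted(data[::-1] for data in corpus.items())))
negated_sort = [(-neg, word)
                for neg, word in sorted((-freq, word) for word, freq in corpus.items())]
print(reversed_sort)  # [(2, 'b'), (2, 'a'), (1, 'c')]
print(negated_sort)   # [(2, 'a'), (2, 'b'), (1, 'c')]
```

Whether the tie order matters depends on the requirements; the counts themselves are identical in both versions.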
xml.dom.minidom.parseString segmentation fault on mod_python
Python 2.4.4
mod_python 3.2.10 + Apache 2.0

def index( req, **params ):
    from xml.dom.minidom import parseString
    doc = parseString( "whatever" )

=> blank screen, _no exception at all_; Apache error_log:

[Fri Jan 26 10:18:48 2007] [notice] child pid 17596 exit signal Segmentation fault (11)

Outside mod_python, the code works well. Any ideas? I would be grateful.
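To confirm that minidom itself behaves outside Apache, a standalone run with the same interpreter gives a baseline to compare against. A minimal sketch, assuming a well-formed test document; `<whatever/>` is a hypothetical stand-in. It avoids newer syntax, so it should also run on Python 2.4. One thing worth checking: the literal `"whatever"` shown above contains no markup, so in plain Python `parseString` rejects it with an `ExpatError` rather than returning a document.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical well-formed stand-in document.
doc = parseString("<whatever/>")
print(doc.documentElement.tagName)

# Plain text without any markup is not well-formed XML, so minidom raises
# ExpatError here instead of returning a document.
rejected = False
try:
    parseString("whatever")
except ExpatError:
    rejected = True
print(rejected)
```

If this runs cleanly from the command line but the same call crashes inside Apache, the difference lies in the mod_python environment rather than in the parsing code.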