Suggest more finesse, please. I/O and sequences.

2005-03-25 Thread Qertoip
Could you suggest any improvements to the following code?
I want to make my implementation as simple, as Python-native, and as fine as
possible.

I've written some simple code that reads an input text file and ranks its
words by number of appearances.

Code:
---
import sys

def moreCommonWord( x, y ):
    if x[1] != y[1]:
        return cmp( x[1], y[1] ) * -1
    return cmp( x[0], y[0] )

wordsDic = {}
inFile = open( sys.argv[1] )
for word in inFile.read().split():
    if wordsDic.has_key( word ):
        wordsDic[word] = wordsDic[word] + 1
    else:
        wordsDic[word] = 1
inFile.close()

wordsLst = wordsDic.items()
wordsLst.sort( moreCommonWord )

outFile = open( sys.argv[2], 'w')
for pair in wordsLst:
    outFile.write( str( pair[1] ).rjust( 7 ) + " : " + str( pair[0] ) + "\n" )
outFile.close()
---

In particular, I don't like reading the whole file just to split it.
It is easy to read line by line; can I read word by word just as easily?
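
One possible way to read word by word without loading the whole file is a small generator that walks the file line by line and yields each whitespace-separated token. This is only a sketch, written for Python 3; the name `iter_words` is made up for illustration and is not a standard-library function, and the in-memory `io.StringIO` object merely stands in for a real file.

```python
import io

def iter_words(lines):
    """Yield whitespace-separated words from an iterable of lines.

    `iter_words` is a hypothetical helper, not part of the standard library.
    Any iterable of strings works, including an open text file.
    """
    for line in lines:
        for word in line.split():
            yield word

# In-memory stand-in for a file returned by open(path)
sample = io.StringIO("the quick fox\njumps over the\nlazy dog\n")
words = list(iter_words(sample))
print(words)
```

Because the generator only holds one line at a time, memory use depends on the longest line rather than on the size of the file.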

PS I've been learning Python since this morning, so please be understanding :>

-- 
Greets,
Piotrek
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Suggest more finesse, please. I/O and sequences.

2005-03-25 Thread Qertoip
On Fri, 25 Mar 2005 12:51:59 -0800, Scott David Daniels wrote:

Thanks for your reply! It was really enlightening.

 
> How about:
>     for line in inFile:
>         for word in line.split():
>             try:
>                 corpus[word] += 1
>             except KeyError:
>                 corpus[word] = 1

The above is (probably) not efficient when an exception is raised, which
happens for every new word. However, I've just read about the following:
    corpus[word] = corpus.setdefault( word, 0 ) + 1
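
A small comparison sketch, written for Python 3: the try/except form and the `setdefault` form side by side, plus two alternatives, `dict.get` and `collections.Counter` (the latter only exists from Python 2.7 on, so it was not available when this thread was written). The word list is made up for illustration. All four should produce the same counts; which one is fastest depends on the data and is not measured here.

```python
from collections import Counter

words = "a b a c b a".split()  # made-up sample input

# 1. try/except: the except branch runs only the first time a word is seen
by_except = {}
for word in words:
    try:
        by_except[word] += 1
    except KeyError:
        by_except[word] = 1

# 2. setdefault: inserts 0 for a new word, then the assignment stores the sum
by_setdefault = {}
for word in words:
    by_setdefault[word] = by_setdefault.setdefault(word, 0) + 1

# 3. get: returns 0 for a missing key without inserting anything
by_get = {}
for word in words:
    by_get[word] = by_get.get(word, 0) + 1

# 4. Counter: a dict subclass that counts hashable items
by_counter = Counter(words)
```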


>> wordsLst = wordsDic.items()
>> wordsLst.sort( moreCommonWord )
> OK, here I'm going to get version specific.
> For Python 2.4 and later:
>     words = sorted((-freq, word) for word, freq in corpus.iteritems())

This is my favorite! :) You managed to avoid moreCommonWord() through the
clever use of a generator expression and the sequence comparison rules.
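
Why the negated frequency works: tuples compare element by element, so `(-freq, word)` sorts by descending frequency first and by ascending word on ties. A minimal sketch with made-up counts, written for Python 3 (hence `items()` rather than `iteritems()`); the second form with a `key` function is an alternative that was not suggested in the thread.

```python
corpus = {"pear": 2, "apple": 5, "fig": 2}  # made-up counts

# Tuples compare left to right: negated frequency first, then the word.
words = sorted((-freq, word) for word, freq in corpus.items())
print(words)  # [(-5, 'apple'), (-2, 'fig'), (-2, 'pear')]

# Same ordering via a key function, keeping the original (word, freq) pairs.
ranked = sorted(corpus.items(), key=lambda item: (-item[1], item[0]))
print(ranked)  # [('apple', 5), ('fig', 2), ('pear', 2)]
```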


> After python 2.2:
>     for negfrequency, word in words:
>         print >>outFile, '%7d : %s' % (-negfrequency, word)

This is also cool, I didn't know about this kind of 'print' usage.
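
For readers on Python 3, where `print` is a function, the `print >>outFile, ...` form is spelled with the `file=` keyword argument. A minimal sketch; the sample data is made up and `io.StringIO` stands in for a file opened for writing.

```python
import io

words = [(-3, "spam"), (-1, "eggs")]  # made-up (negated frequency, word) pairs

out_file = io.StringIO()  # stand-in for open(path, 'w')
for neg_freq, word in words:
    # Python 3 equivalent of: print >>outFile, '%7d : %s' % (-negFreq, word)
    print('%7d : %s' % (-neg_freq, word), file=out_file)

result = out_file.getvalue()
```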


> So, with all my prejudices in place and python 2.4 on my box, I'd
> lift a few things to functions:

While I like your functionality and reusability improvements, I will stick
to my as-simple-as-possible solution for the given requirements (which I
didn't mention, and which assume correct command line arguments, for example).

Therefore, the current code is:
-
import sys

corpus = {}
inFile = open( sys.argv[1] )
for line in inFile:
    for word in line.split():
        corpus[word] = corpus.setdefault( word, 0 ) + 1
inFile.close()

words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

outFile = open( sys.argv[2], 'w')
for negFreq, word in words:
    print >>outFile, '%7d : %s' % ( -negFreq, word )
outFile.close()
-

Any ideas how to make it even better? :>


-- 
Regards,
Piotrek


Re: Suggest more finesse, please. I/O and sequences.

2005-03-25 Thread Qertoip
On Fri, 25 Mar 2005 19:17:30 +0100, Qertoip wrote:

> Could you suggest any improvements to the following code?
> I want to make my implementation as simple, as Python-native, and as fine as
> possible.

> I've written some simple code that reads an input text file and ranks its
> words by number of appearances.

A good friend of mine heard about my attempt to create a compact, simple
Python script and wrote the following PHP script:

--
$data=join(' ', file($argv[1]));
foreach(explode(' ', $data) as $slowo)
    $stat[chop($slowo)]++;

array_multisort($stat, SORT_DESC, array_keys($stat));

foreach($stat as $sl=>$il)
    $odata.="$il : $sl\n";

file_put_contents($argv[2], $odata);
--

...which has the same functionality in fewer actual lines of code [7].
I'm a little bit confused, since I considered Python more expressive than
PHP. That makes me all the more interested in improving my implementation :)
It looks like this [11 actual code lines]:

--
import sys

corpus = {}
inFile = open( sys.argv[1] )
for word in inFile.read().split():
    corpus[word] = corpus.get( word, 0 ) + 1
inFile.close()

words = sorted( ( -freq, word ) for word, freq in corpus.iteritems() )

outFile = open( sys.argv[2], 'w')
for negFreq, word in words:
    outFile.write( '%7d : %s\n' % ( -negFreq, word ) )
outFile.close()
--

PS Thanks to Scott David Daniels and Larry Bates


-- 
Regards,
Piotrek


Re: Suggest more finesse, please. I/O and sequences.

2005-03-26 Thread Qertoip
On Fri, 25 Mar 2005 21:09:41 -0500, Peter Hansen wrote:

Thanks for comments! :)

> Qertoip wrote:
>> A good friend of mine heard about my attempt to create a compact, simple
>> Python script and wrote the following PHP script:
> [snip 7-line PHP script]
>> ...which has the same functionality in fewer actual lines of code [7].
>> I'm a little bit confused, since I considered Python more expressive than
>> PHP. That makes me all the more interested in improving my implementation :)
>> It looks like this [11 actual code lines]:
> [snip 11-line Python]
 
> --
> import sys
> 
> corpus = {}
> for word in open(sys.argv[1]).read().split():
>     corpus[word] = corpus.get( word, 0 ) + 1
> 
> words = reversed(sorted(data[::-1] for data in corpus.iteritems()))
> 
> open(sys.argv[2], 'w').writelines('%7d : %s\n' % data for data in words)
> --

Is the file automatically closed in both cases?
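
On that question: in CPython a file object with no remaining references is closed when it is reclaimed, but that is an implementation detail rather than a language guarantee. The `with` statement (available from Python 2.5 on, so newer than this thread) closes the file deterministically. Below is a sketch for Python 3 under those assumptions; the function name `rank_words` and the temporary files are made up for illustration.

```python
import os
import tempfile

def rank_words(in_path, out_path):
    """Hypothetical helper: write a word-frequency ranking of in_path to out_path."""
    corpus = {}
    # The file is closed when the with block ends, even if an exception occurs.
    with open(in_path) as in_file:
        for line in in_file:
            for word in line.split():
                corpus[word] = corpus.get(word, 0) + 1

    words = sorted((-freq, word) for word, freq in corpus.items())

    with open(out_path, 'w') as out_file:
        out_file.writelines('%7d : %s\n' % (-neg, word) for neg, word in words)

# Usage with throwaway files in a temporary directory
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, 'in.txt')
out_path = os.path.join(tmp_dir, 'out.txt')
with open(in_path, 'w') as f:
    f.write('b a b\nc b a\n')

rank_words(in_path, out_path)

with open(out_path) as f:
    output = f.read()
```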


> I'm curious if either this or the PHP does what is really
> wanted, however.  The above doesn't split on "words", but
> merely on whitespace, making the results fairly meaningless
> if you are concerned about punctuation etc.

You are perfectly right, but the requirements were intentionally
simplified, assuming error-free input and allowing words like 'hey,!1_go'.
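
To illustrate the difference Peter describes, here is a small sketch (Python 3, made-up sample text) comparing whitespace splitting with a regular expression that keeps only runs of word characters. The pattern `\w+` is just one possible definition of a "word", chosen for illustration; it treats letters, digits and underscores as word characters.

```python
import re

text = "Hey, you! hey,!1_go now."  # made-up sample

# Whitespace splitting keeps punctuation attached to the tokens.
by_whitespace = text.split()
print(by_whitespace)  # ['Hey,', 'you!', 'hey,!1_go', 'now.']

# \w+ matches runs of letters, digits and underscores only.
by_regex = re.findall(r"\w+", text)
print(by_regex)  # ['Hey', 'you', 'hey', '1_go', 'now']
```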

I aim to make my implementation as concise as possible while
*still being natural, readable and clear*.


-- 
Regards,
Piotrek


xml.dom.minidom.parseString segmentation fault on mod_python

2007-01-26 Thread qertoip
Python 2.4.4
mod_python 3.2.10  +  Apache 2.0

def index( req, **params ):
    from xml.dom.minidom import parseString
    doc = parseString( "whatever" )

=> blank screen, no exception at all; Apache error_log:
[Fri Jan 26 10:18:48 2007] [notice] child pid 17596 exit signal Segmentation fault (11)


Outside mod_python the code works fine. Any ideas? I would be grateful.
