On 05/11/2016 04:11, DFS wrote:

It reads in a text file of the Bible, and counts the Top 20 most common
words.

http://www.truth.info/download/bible.htm

------------------------------------------------
import time; start=time.clock()
import sys, string
from collections import Counter

#read file
with open(sys.argv[1],'r') as f:
        chars=f.read().lower()

#remove numbers and punctuation
chars=chars.translate(None,string.digits)
chars=chars.translate(None,string.punctuation)

#count words
counts=Counter(chars.split()).most_common(20)           

#print
i=1
for word,count in counts:
        print str(i)+'.',count,word
        i+=1

print "%.3gs"%(time.clock()-start)
------------------------------------------------


1.17s isn't too bad, but it could be faster.

Is it easy to cythonize?  Can someone show me how?

I installed Cython and made some attempts but got lost.

The trouble is that there isn't really any Python code here to Cythonise.

All the real work is done inside the collections module. If that were written in Python, you'd have to Cythonise it as well, and there might be quite a lot of it!

But I think 'collections' is a built-in module, which means it's already written in something like C. So it might already be as fast as it gets (0.7 to 0.8 seconds on my machine), unless perhaps a different algorithm is used.
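
One rough way to check where the time goes is to time each stage separately. The sketch below is a Python 3 adaptation of the script above, not the original code: time.perf_counter stands in for time.clock, a single str.maketrans table replaces the two translate() calls, and the names STRIP and timed_top_words are made up for illustration.

```python
import string
import time
from collections import Counter

# Translation table that deletes digits and punctuation in one pass
# (Python 3 equivalent of the two translate(None, ...) calls).
STRIP = str.maketrans('', '', string.digits + string.punctuation)

def timed_top_words(text, n=20):
    """Return (top_n, timings); timings maps a stage name to seconds."""
    timings = {}

    # Stage 1: lowercase and strip digits/punctuation.
    t0 = time.perf_counter()
    cleaned = text.lower().translate(STRIP)
    timings['clean'] = time.perf_counter() - t0

    # Stage 2: split on whitespace.
    t0 = time.perf_counter()
    words = cleaned.split()
    timings['split'] = time.perf_counter() - t0

    # Stage 3: count the words and pick the n most common.
    t0 = time.perf_counter()
    top = Counter(words).most_common(n)
    timings['count'] = time.perf_counter() - t0

    return top, timings

# Small demo on an inline string; pass the contents of the text file instead.
sample_top, sample_timings = timed_top_words("The cat, the dog; the 3 birds.", n=3)
for i, (word, count) in enumerate(sample_top, 1):
    print('%d. %d %s' % (i, count, word))
for stage, secs in sample_timings.items():
    print('%-6s %.3gs' % (stage, secs))
```

Every stage is a single call into a built-in string method or Counter, so there is no hand-written Python loop left for Cython to compile. The per-stage figures at least show which library call would be worth replacing with a different approach.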

--
Bartc


--
https://mail.python.org/mailman/listinfo/python-list
