This can be faster, since it avoids doing the same work more than once:

from string import maketrans, ascii_lowercase, ascii_uppercase
def create_words(afile):
    # Punctuation, digits and whitespace are all mapped to spaces;
    # uppercase letters are mapped to lowercase in the same table.
    stripper = """'[",;<>{}_&?!():[]\.=+-*\t\n\r^%0123456789/"""
    mapper = maketrans(stripper + ascii_uppercase,
                       " " * len(stripper) + ascii_lowercase)
    countDict = {}
    for line in afile:
        for w in line.translate(mapper).split():
            countDict[w] = countDict.get(w, 0) + 1
    for word, freq in sorted(countDict.items()):
        print word, freq

create_words(open("test.txt"))

If you can load the whole file into memory, then it can be made a little faster...

Bear hugs,
bearophile

--
http://mail.python.org/mailman/listinfo/python-list
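For what it's worth, a sketch of the whole-file variant in Python 3, where str.maketrans replaces string.maketrans and collections.Counter replaces the hand-rolled count dict (the names count_words and create_words below are my own, not from the post above):

```python
# Sketch only: assumes the whole file fits in memory.
from collections import Counter
from string import ascii_lowercase, ascii_uppercase

# Same character set as the original post; everything in STRIPPER
# becomes a space, uppercase letters become lowercase.
STRIPPER = """'[",;<>{}_&?!():[]\\.=+-*\t\n\r^%0123456789/"""
MAPPER = str.maketrans(STRIPPER + ascii_uppercase,
                       " " * len(STRIPPER) + ascii_lowercase)

def count_words(text):
    # Translate once over the whole text, then split and count.
    return Counter(text.translate(MAPPER).split())

def create_words(path):
    with open(path) as f:
        text = f.read()  # one read instead of a per-line loop
    for word, freq in sorted(count_words(text).items()):
        print(word, freq)
```

Translating the text in one call avoids the per-line Python-level loop, which is where the "little faster" comes from.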