Ron Adam wrote:

> The \w does make a small difference, but not as much as I expected.

that's probably because your benchmark has a lot of dubious overhead:

> word_finder = re.compile('[EMAIL PROTECTED]', re.I)

no need to force case-insensitive search here; \w looks for both lower-
and uppercase characters.

>     for match in word_finder.finditer(string.lower()):

since you're using a case-insensitive RE, that lower() call is not
necessary.

>         word = match.group(0)

and findall() is of course faster than finditer() + m.group().

> t = time.clock()
> for line in lines.splitlines():
>     countDict = foo(line)
> tt = time.clock()-t

and if you want performance, why are you creating a new dictionary for
each line in the sample?

here's a more optimized RE word finder:

    word_finder_2 = re.compile('[EMAIL PROTECTED]').findall

    def count_words_2(string, word_finder=word_finder_2): # avoid global lookups
        countDict = {}
        for word in word_finder(string):
            countDict[word] = countDict.get(word,0) + 1
        return countDict

with your original test on a slow machine, I get

    count_words: 0.29868684 (best of 3)
    count_words_2: 0.17244873 (best of 3)

if I call the function once, on the entire sample string, I get

    count_words: 0.23096036 (best of 3)
    count_words_2: 0.11690620 (best of 3)

</F>
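
for anyone rerunning the comparison on a current Python, here's a minimal sketch of the "best of 3" timing, once per line and once on the whole sample. it rests on several assumptions: the pattern in this post was redacted by the archive, so r"\w+" is only a stand-in; the sample text and the names count_words, count_per_line and best_of are made up for illustration; and timeit is used because time.clock() was removed in Python 3.8. the numbers it prints depend on your machine and won't match the ones above.

```python
import re
import timeit

# ASSUMPTION: the real pattern was redacted in the archived post;
# r"\w+" is only a stand-in for illustration.
WORD_PATTERN = r"\w+"

# Bind the compiled pattern's findall method once, as in the post.
word_finder = re.compile(WORD_PATTERN).findall


def count_words(text, word_finder=word_finder):
    """Count words using findall(), passed as a default argument
    so the loop does a local lookup instead of a global one."""
    counts = {}
    for word in word_finder(text):
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_per_line(text):
    """Mimic the quoted benchmark: call the counter once per line,
    creating a new dictionary each time and keeping only the last."""
    counts = {}
    for line in text.splitlines():
        counts = count_words(line)
    return counts


# Hypothetical sample data, not the text used in the original thread.
sample = "the quick brown fox\njumps over the lazy dog\n" * 1000


def best_of(func, repeat=3, number=10):
    """Return the fastest of `repeat` timings, each running func `number` times."""
    return min(timeit.repeat(lambda: func(sample), repeat=repeat, number=number))


if __name__ == "__main__":
    print("per line:     %.8f (best of 3)" % best_of(count_per_line))
    print("whole sample: %.8f (best of 3)" % best_of(count_words))
```

note that the per-line variant throws away every dictionary but the last, so it's only useful for measuring call and allocation overhead, not for getting a word count.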