Here's the basic idea. I have a dictionary of substrings (the substrings stored as keys). I have a list of strings. I want to find out, for each word in the dictionary, how many times the substring occurs non-overlapping occurrences there are in the list of strings. This is the best I could come up with:
for j in self.dict.keys(): c = 0 for i in tc.strings: c += i.count(j) self.dict[j]['count'] += c I tried Boyer-Moore-Horspool as someone else suggested in the group, but it was even slower (probably because of the time spent setting up the pattern). Even when I split the algorithm in two, doing the set-up before the inner loop and only testing the pattern in the inner loop, it was still slow. So, am I doing this the optimal way or am I overlooking something _really_ simple to do? Statistics on the data: The dictionary has 3000+ entries in it. There are several thousand strings. The strings average 10-26 characters in length. -- http://mail.python.org/mailman/listinfo/python-list