Counting occurences of words in a list of strings

Travers Naran Tue, 24 May 2005 19:40:06 -0700

Here's the basic idea.  I have a dictionary of substrings (the substrings 
stored as keys).  I have a list of strings.  I want to find out, for each 
word in the dictionary, how many times the substring occurs non-overlapping 
occurrences there are in the list of strings.  This is the best I could 
come up with:


         for j in self.dict.keys():
             c = 0
             for i in tc.strings:
                 c += i.count(j)
             self.dict[j]['count'] += c

I tried Boyer-Moore-Horspool as someone else suggested in the group, but it 
was even slower (probably because of the time spent setting up the 
pattern).  Even when I split the algorithm in two, doing the set-up before 
the inner loop and only testing the pattern in the inner loop, it was still 
slow.

So, am I doing this the optimal way or am I overlooking something _really_ 
simple to do?

Statistics on the data:
The dictionary has 3000+ entries in it.
There are several thousand strings.
The strings average 10-26 characters in length.
-- 
http://mail.python.org/mailman/listinfo/python-list

Counting occurences of words in a list of strings

Reply via email to