On Wed, 28 May 2008, Don Saklad wrote:

> a.
> What are a dozen or so of the most frequently used
> strings of characters in spam messages?... like rolex, maxgain, ...?

Define "string."  If you mean "word," then here are the 12 most common
words in the TREC 2005 corpus, with the number of times they appear:

enron 94799
message 38187
subject 34751
please 31261
company 31257
original 29529
energy 28476
would 28449
power 23643
about 20734
which 19533
there 16392

The data's a little old, but it's sufficient to make the point of why
SpamAssassin doesn't just do naive word matching (and why you
shouldn't, either).
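
For anyone who wants to run a similar count on their own mail, here is a
rough Python sketch of a naive word-frequency tally. It is not necessarily
how the numbers above were produced: the function name top_words, the
lowercase/letters-only tokenizing, the five-letter minimum (a guess based
on the list above having no shorter words), and the toy sample messages
are all assumptions made for illustration.

```python
import re
from collections import Counter


def top_words(messages, n=12, min_len=5):
    """Return the n most common words of at least min_len letters.

    `messages` is any iterable of message bodies as strings.
    The min_len cutoff is an assumption, not a documented setting.
    """
    counts = Counter()
    for text in messages:
        # Lowercase the text and keep runs of ASCII letters only.
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= min_len:
                counts[word] += 1
    return counts.most_common(n)


# Hypothetical toy input; a real run would read the corpus files instead.
sample = [
    "Subject: energy company update. Please forward this message.",
    "Original message: the company would like power figures, please.",
]

for word, count in top_words(sample, n=3):
    print(word, count)
```

Even on toy input the top of the list is ordinary business vocabulary,
which is the same problem the corpus numbers show.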

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University
