On Monday, October 13, 2003 at 9:02 PM, Larry wrote:

> Does anyone know of a list of either:
> 1) existing/allowed consonant/vowel pairs or sequences
> 2) non-existing/not-allowed consonant/vowel pairs or sequences
> For the English language preferably.

I don't know about pairs of consonants/vowels, but I was looking at strings of consonants, excluding "y", since I seem to be getting so much nonsense garbage in the spam. About a week ago someone mentioned the possibility of doing a lookup in the dictionary. That would seem to be prohibitive, but it started me thinking about strings of consonants.

FWIW, here's what I found. I started off looking for five consonants in a row, excluding y. I used this site, which lets you search a variety of sources using regular expressions:

http://www.dict.org/bin/Dict

I also used a dictionary file I found on the web that had over 200,000 unduplicated entries. (I did verify there were no duplicates, but I didn't verify that the words were valid.) Of those, there seemed to be only about 700 unique patterns of 5 consonants in a row excluding y (an example is rstwh in erstwhile). Of course, as you increase the number of consonants in a row, the number of patterns goes down. This certainly makes a lookup more feasible, but it doesn't take into account all the acronyms, etc., that are out there.

One thing I did note is that there seem to be at least two or three of these strings per email, and often the consonant string is longer than 5.

Anyway, I don't know if this info is useful to you at all. I am very much a newbie at using SA, learning about tests, Bayes, etc. It was a fun exercise, though. :)

-- Terry

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
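
For anyone who wants to try the consonant-run idea from the message above, here is a minimal Python sketch. It is not the method actually used for the counts quoted in the message. The function names (`consonant_runs`, `known_patterns`, `suspicious_runs`) are made up for illustration, and treating a run longer than five as a series of overlapping 5-letter windows is my own assumption.

```python
import re

# Runs of five or more consecutive consonants, with "y" left out of
# the consonant set as described in the message.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")


def consonant_runs(text):
    """Return every run of 5+ consonants (excluding y) found in text."""
    return CONSONANT_RUN.findall(text.lower())


def known_patterns(words):
    """Collect the 5-consonant windows that occur in a word list.

    Assumption: a longer run is split into overlapping 5-letter
    windows, so the lookup table only ever holds 5-letter patterns.
    """
    patterns = set()
    for word in words:
        for run in consonant_runs(word):
            for i in range(len(run) - 4):
                patterns.add(run[i:i + 5])
    return patterns


def suspicious_runs(text, patterns):
    """Return runs that contain a 5-consonant window not in patterns."""
    return [
        run
        for run in consonant_runs(text)
        if any(run[i:i + 5] not in patterns for i in range(len(run) - 4))
    ]
```

With a real dictionary file you would pass its lines to `known_patterns` once and reuse the resulting set for every message. As noted above, acronyms and other legitimate non-dictionary strings would still show up as suspicious, so this is at best one signal among several.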