On Monday, October 13, 2003 at 9:02 PM, Larry wrote:
> Does anyone know of a list of either:
> 1) existing/allowed consonant/vowel pairs or sequences
> 2) non-existing/not-allowed consonant/vowel pairs or sequences

> For the English language preferably.

I don't know about consonant/vowel pairs, but I have been looking at
strings of consonants (excluding "y"), since I seem to be getting so
much nonsense garbage in my spam. About a week ago someone mentioned
the possibility of doing a dictionary lookup. That seems prohibitive,
but it started me thinking about strings of consonants.

FWIW, here's what I found. I started off looking for five consonants
in a row, excluding y.
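
For anyone who wants to try the same search locally, here is a minimal
sketch in Python of "five consonants in a row, excluding y". The names
CONSONANT_RUN and consonant_runs are just ones I made up for
illustration, and I'm assuming plain ASCII letters only.

```python
import re

# Five or more consecutive consonants, with "y" left out of the consonant set.
# Assumption: only unaccented ASCII letters matter here.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}", re.IGNORECASE)

def consonant_runs(text):
    """Return every run of 5+ consonants (excluding y) found in text, lowercased."""
    return [m.group(0).lower() for m in CONSONANT_RUN.finditer(text)]

if __name__ == "__main__":
    for sample in ("erstwhile", "rhythm", "Xkqvzpt free offer"):
        print(sample, consonant_runs(sample))
```

Since "y" is not in the character class, it breaks a run the same way
a vowel does, so a word like "rhythm" produces no match.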

http://www.dict.org/bin/Dict

I used this site, which lets you search a variety of sources using
regular expressions. I also used a dictionary file I found on the web
that had over 200,000 unduplicated entries. (I did verify no
duplicates, but I didn't verify that the words were valid.) Of those,
there seemed to be only about 700 unique patterns of 5 consonants in a
row excluding y (example is rstwh in erstwhile). Of course, as you
increase the number of consonants in a row, the number of patterns
goes down. This certainly makes it more feasible to do a lookup, but
this doesn't take into account all the acronyms, etc., that are out
there. One thing I did note is that there seem to be at least two or
three of these strings per spam email, and often the consonant string
is longer than 5.
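
In case it helps, the lookup idea could be sketched roughly like this
in Python. This is only my guess at one way to do it, not anything SA
does: it collects every 5-letter consonant window (excluding y) seen in
a word list, then reports windows in a message that never occur in the
list. The function names are hypothetical, the word list is whatever
you supply, and acronyms would still show up as false hits.

```python
import re

# Runs of five or more consonants, "y" excluded (input is lowercased first).
RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")

def five_grams(text):
    """Return the set of 5-letter consonant windows found in text."""
    out = set()
    for m in RUN.finditer(text.lower()):
        run = m.group(0)
        # A run longer than five yields several overlapping windows.
        for i in range(len(run) - 4):
            out.add(run[i:i + 5])
    return out

def known_patterns(words):
    """Collect every 5-consonant window that occurs in a list of words."""
    patterns = set()
    for word in words:
        patterns |= five_grams(word)
    return patterns

def unknown_runs(text, patterns):
    """Return, sorted, the 5-consonant windows in text not seen in the word list."""
    return sorted(five_grams(text) - patterns)

if __name__ == "__main__":
    known = known_patterns(["erstwhile", "nightschool"])
    print(unknown_runs("erstwhile xkqvzpt", known))
```

Because the set only holds 5-letter windows, it stays small even for a
large word list, and longer garbage strings just produce more unknown
windows.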

Anyway, I don't know if this info is useful to you at all. I am very
much a newbie at using SA, learning about tests, Bayes, etc.  It was a
fun exercise, though. :)

-- 
Terry



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk