Hi Terry,

This is useful feedback. It makes me think there may be too many valid consonant combinations for the test to be worth the effort. I will keep looking at the problem, though, as I would like to test for random strings more accurately.
I can see the possibility that a consonant pair may not exist on its own but could still be valid when it is part of a series of four adjacent consonants. I started testing for 4 adjacent consonants (forgetting about things like HTML) but was losing the battle. I increased it to 5 and still lost (MSHTML). There may be consonant pairs that just don't exist in any consonant series, and I think Fred may have compiled a list of them. I still need to look at it more closely.

I also started to look at how to bound the consonants. I am currently testing a (4 or more consonants)-(1-2 vowels)-(3 or more consonants) pattern, and it just does not seem to exist. Looking at the random strings I get, I found that I could also mix digits in with the consonants, and the test would pick up those strings without affecting the alpha-only sequences. Some aspects of randomly generated characters are random and others are not.

Thanks for your time, Terry!

--Larry

> -----Original Message-----
> From: Terry
> Sent: Monday, October 13, 2003 6:36 PM
> To: '[EMAIL PROTECTED]'
> Subject: Re: [SAtalk] Consonant and Vowel Pairs or Sequences
>
> On Monday, October 13, 2003 at 9:02 PM, Larry wrote:
> > Does anyone know of a list of either:
> > 1) existing/allowed consonant/vowel pairs or sequences
> > 2) non-existing/not-allowed consonant/vowel pairs or sequences
> >
> > For the English language preferably.
>
> I don't know about pairs of consonants/vowels, but I was
> looking at strings of consonants excluding "y", since I seem
> to be getting so much nonsense garbage in the spam. Someone
> had mentioned about a week ago the possibility of doing a
> lookup in the dictionary. That would seem to be
> prohibitive, but it started me thinking about strings of consonants.
>
> FWIW, here's what I found. I started off looking for five
> consonants in a row, excluding y.
>
> http://www.dict.org/bin/Dict
>
> I used this site, which will let you search a variety of
> sources using regular expressions.
>
> I also used a dictionary file I found on the web that had
> over 200,000 unduplicated entries. (I did verify there were no
> duplicates, but I didn't verify that the words were valid.)
> Of those, there seemed to be only about 700 unique patterns
> of 5 consonants in a row, excluding y (an example is rstwh in
> erstwhile). Of course, as you increase the number of
> consonants in a row, the number of patterns goes down. This
> certainly makes a lookup more feasible, but it doesn't take
> into account all the acronyms, etc., that are out there.
> One thing I did note is that there seem to be at least two
> or three of these strings per email, and often the consonant
> string is longer than 5.
>
> Anyway, I don't know if this info is useful to you at all.
> I am very much a newbie at using SA, learning about tests,
> Bayes, etc. It was a fun exercise, though. :)
>
> -- Terry

-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
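Below is a minimal Python sketch of the consonant-run checks discussed in this thread, for anyone who wants to try them against a word list or a spam corpus. It is not SpamAssassin rule code and not the exact method Terry or Larry used. The consonant set (excluding y), the vowel set (a, e, i, o, u), lower-casing the input, counting every 5-letter window of longer runs, and the helper names `five_consonant_patterns`, `unknown_runs`, and `bounded_hits` are all assumptions made for illustration. The expressions use only character classes and `{n,}` quantifiers, which Perl-style regex engines (SpamAssassin rules are Perl regexes) also support.

```python
import re

# Assumptions for this sketch: consonants exclude "y" (as in Terry's search),
# vowels are "aeiou", and all input is lower-cased before matching.
CONSONANTS = "bcdfghjklmnpqrstvwxz"
VOWELS = "aeiou"

# Runs of five or more consonants in a row.
RUN5 = re.compile(f"[{CONSONANTS}]{{5,}}")

# Larry's bounded shape: 4+ consonants, 1-2 vowels, 3+ consonants.
BOUNDED = re.compile(f"[{CONSONANTS}]{{4,}}[{VOWELS}]{{1,2}}[{CONSONANTS}]{{3,}}")

# The same shape with digits allowed inside the consonant runs. On alpha-only
# text it matches exactly what BOUNDED matches.
BOUNDED_DIGITS = re.compile(
    f"[{CONSONANTS}0-9]{{4,}}[{VOWELS}]{{1,2}}[{CONSONANTS}0-9]{{3,}}"
)


def windows(run):
    """Every 5-letter window of a consonant run (a run of 7 yields 3)."""
    return {run[i:i + 5] for i in range(len(run) - 4)}


def five_consonant_patterns(words):
    """Hypothetical helper: distinct 5-consonant sequences in a word list.

    Counting every window of longer runs is an assumption about how the
    patterns were tallied."""
    patterns = set()
    for word in words:
        for run in RUN5.findall(word.lower()):
            patterns |= windows(run)
    return patterns


def unknown_runs(text, known):
    """Hypothetical helper: 5+ consonant runs with a window not in `known`."""
    return [run for run in RUN5.findall(text.lower()) if not windows(run) <= known]


def bounded_hits(text, allow_digits=True):
    """Hypothetical helper: substrings matching the bounded shape."""
    pattern = BOUNDED_DIGITS if allow_digits else BOUNDED
    return pattern.findall(text.lower())


if __name__ == "__main__":
    known = five_consonant_patterns(["erstwhile", "angstrom", "strengths", "latchstring"])
    print(sorted(known))  # ['chstr', 'ngstr', 'ngths', 'rstwh', 'tchst']
    print(unknown_runs("Buy now xkqzvbt offer, latchstring", known))  # ['xkqzvbt']
    print(bounded_hits("ref qx7kzupr9tw"))  # ['qx7kzupr9tw']
    print(bounded_hits("ref qx7kzupr9tw", allow_digits=False))  # []
```

With this sketch, the digit-mixed string `qx7kzupr9tw` is caught only when digits are allowed in the consonant runs. The bounded shape is not immune to real words, though: `latchstrings` matches it (tchstr, then i, then ngs), so any rule built on it would need checking against legitimate mail.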