Hi Terry,

This is useful feedback.  It makes me think that there are so many
combinations of consonants that the test may not be worth the effort.  I
will keep looking at the problem, though, as I would like to test for
random strings more accurately.

I can see the possibility that a consonant pair may not exist on its own,
but if it were placed in a series of four adjacent consonants, the series
could be valid.  I started testing for 4 adjacent consonants (forgetting
about stuff like HTML) but was losing the battle.  I increased to 5 and
still lost (MSHTML).  There may be consonant pairs that just don't exist in
any consonant series, and I think that may be the list that Fred compiled.
I still need to look at it more closely.
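
For anyone who wants to try the adjacent-consonant test, here is a rough
sketch in Python rather than an actual SA rule.  The function name
consonant_runs is made up for illustration, and leaving "y" out of the
consonant class is an assumption borrowed from Terry's note below.

```python
import re

def consonant_runs(text, min_len=4):
    """Return every run of at least min_len adjacent consonants.

    Assumption: "y" is not counted as a consonant.
    """
    pattern = re.compile(r"[bcdfghjklmnpqrstvwxz]{%d,}" % min_len, re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)]

sample = "The MSHTML and HTML tokens trip a naive check"
for n in (4, 5):
    # Legitimate acronyms still match at both thresholds.
    print(n, consonant_runs(sample, n))
```

Raising the threshold from 4 to 5 drops HTML but still flags MSHTML, which
is the losing battle described above.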

I also started to look at how to bound the consonants.  I am currently
testing a (4 or more consonants)-(1 or 2 vowels)-(3 or more consonants)
pattern.  That pattern just does not seem to exist.  Looking at the random
strings I get, I found that I could also mix digits in with the consonants
and it would pick up those strings without affecting the alpha-only
sequences.  There are some aspects of randomly generated characters that
are random and other aspects that are not.
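
The bounded pattern above could be sketched as a pair of regular
expressions.  This is Python for illustration only, not the rule as
deployed.  The names ALPHA_ONLY, WITH_DIGITS and looks_random are
hypothetical, and keeping "y" out of both the consonant and vowel classes
is an assumption.

```python
import re

CONS = "bcdfghjklmnpqrstvwxz"   # assumption: "y" is in neither class
VOWELS = "aeiou"

# Alpha-only form: 4+ consonants, then 1-2 vowels, then 3+ consonants.
ALPHA_ONLY = re.compile(
    rf"[{CONS}]{{4,}}[{VOWELS}]{{1,2}}[{CONS}]{{3,}}", re.IGNORECASE)

# Variant that lets digits mix in with the consonants.
WITH_DIGITS = re.compile(
    rf"[{CONS}0-9]{{4,}}[{VOWELS}]{{1,2}}[{CONS}0-9]{{3,}}", re.IGNORECASE)

def looks_random(token, allow_digits=True):
    """True if the token contains the bounded consonant/vowel pattern."""
    pattern = WITH_DIGITS if allow_digits else ALPHA_ONLY
    return pattern.search(token) is not None

for token in ("xkcdfaqrt", "x7k9qab3z", "erstwhile", "strengths"):
    print(token, looks_random(token, allow_digits=False), looks_random(token))
```

Every alpha-only match is also a match for the digit variant, since the
second class only adds characters to the first.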

Thanks for your time Terry!

--Larry



> -----Original Message-----
> From: Terry
> Sent: Monday, October 13, 2003 6:36 PM
> To: '[EMAIL PROTECTED]'
> Subject: Re: [SAtalk] Consonant and Vowel Pairs or Sequences
> 
> 
> On Monday, October 13, 2003 at 9:02 PM, Larry wrote:
> > Does anyone know of a list of either:
> > 1) existing/allowed consonant/vowel pairs or sequences
> > 2) non-existing/not-allowed consonant/vowel pairs or sequences
> 
> > For the English language preferably.
> 
> I don't know about pairs of consonants/vowels but I was 
> looking at strings of consonants excluding "y" since I seem 
> to be getting so much nonsense garbage in the spam. Someone 
> had mentioned about a week ago the possibility of doing a 
> lookup in the dictionary but that would seem to be 
> prohibitive, but it started me thinking about strings of consonants.
> 
> FWIW, here's what I found. I started off looking for five 
> consonants in a row, excluding y.
> 
http://www.dict.org/bin/Dict

I used this site which will let you search a variety of sources using
regular expressions. I also used a dictionary file I found on the web that
had over 200,000 unduplicated entries. (I did verify no duplicates, but I
didn't verify that the words were valid.) Of those, there seemed to be only
about 700 unique patterns of 5 consonants in a row excluding y (example is
rstwh in erstwhile). Of course, as you increase the number of consonants in
a row, the number of patterns goes down. This certainly makes it more
feasible to do a lookup, but this doesn't take into account all the
acronyms, etc., that are out there. One thing I did note is that there seem
to be at least two or three of these strings per email, and often the
consonant string is longer than 5.
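
A count like the one above could be reproduced roughly as follows.  This is
a sketch only: the function name five_consonant_patterns is made up, the
word list is a stand-in for a real dictionary file, and counting every
5-letter window (so longer runs contribute overlapping patterns) is an
assumption about what "unique patterns" means here.

```python
import re
from collections import Counter

# Exactly five consonants, with "y" excluded as in the search described above.
FIVE_CONSONANTS = re.compile(r"[bcdfghjklmnpqrstvwxz]{5}", re.IGNORECASE)

def five_consonant_patterns(words):
    """Count each distinct 5-letter all-consonant window across the words."""
    counts = Counter()
    for word in words:
        word = word.strip().lower()
        # Slide a 5-character window so longer runs yield overlapping patterns.
        for i in range(len(word) - 4):
            window = word[i:i + 5]
            if FIVE_CONSONANTS.fullmatch(window):
                counts[window] += 1
    return counts

# Stand-in word list; a real run would read one word per line from a file.
patterns = five_consonant_patterns(["erstwhile", "worthwhile", "hello"])
print(len(patterns), sorted(patterns))
```

The resulting set of patterns could then serve as the lookup table, with
the same caveat about acronyms.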

Anyway, I don't know if this info is useful to you at all. I am very much a
newbie at using SA, learning about tests, Bayes, etc.  It was a fun
exercise, though. :)

-- 
Terry



-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program. SourceForge.net
hosts over 70,000 Open Source Projects. See the people who have HELPED US
provide better services: Click here: http://sourceforge.net/supporters.php
_______________________________________________
Spamassassin-talk mailing list [EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk


