Hi Mike, Thanks for the tip. I did not know about the dictionary. I have had a rule testing the following:

4c-1/2v-3c
/[0-9bcdfghjklmnpqrstvwxz]{4,}[aeiouy]{1,2}[0-9bcdfghjklmnpqrstvwxz]{3,}/i

This would yield 52 FPs. Varying the combination results in the following:

5c-1/2v-3c -> 2 FP
5c-1v-3c -> 2 FP
5c-1v-4c -> 0 FP
4c-1v-4c -> 1 FP
3c-1v-4c -> 22 FP
3c-1v-5c -> 1 FP
3c-1v-6c -> 0 FP
5c-2v-3c -> 0 FP
5c-2v-4c -> 0 FP
4c-2v-4c -> 0 FP
3c-2v-4c -> 6 FP
3c-2v-5c -> 0 FP
3c-2v-6c -> 0 FP

It would be interesting if someone could test a combination or two
against a corpus of ham and spam. I have not set up a mechanism yet to
extract messages from Exchange. --sigh--

--Larry

> -----Original Message-----
> From: Mike Kuentz (2) [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 06, 2003 3:28 PM
> To: [EMAIL PROTECTED]
> Subject: RE: [SAtalk] 'random' character sets
>
>
> angstrom
> Armstrong
> Bergstrom
> birthplace
> birthplaces
> bremsstrahlung
> corkscrew
> Dijkstra
> downstream
> hardscrabble
> jockstrap
> Knightsbridge
> lengthly
> lengths
> lengthwise
> Lindstrom
> Longstreet
> Nietzsche
> nightclub
> Nordstrom
> offspring
> postscript
> postscripts
> Rothschild
> sportswriter
> sportswriting
> strengths
> switchblade
> wavelengths
> witchcraft
> worthwhile
> worthwhileness
>
> That's what 5 hits.
>
> With 6 only
> Knightsbridge
>
>
> The dictionary I'm going against is just the standard one that comes
> with redhat.
> (grep -E '[bcdfghjklmnpqrstvwxz]{6}' /usr/share/dict/linux.words)
> A more complete dictionary would result in more hits.
> I definitely like the idea, though!
>
> Mike
>
>
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Chris Santerre
> Sent: Thursday, November 06, 2003 2:24 PM
> To: 'Greg Webster'; [EMAIL PROTECTED]
> Subject: RE: [SAtalk] 'random' character sets
>
>
> There is an excellent set of rules being tested now. Just more
> tweaking needed. Your set is different. I'll give them a go and see
> how it pans out!
>
> --Chris
>
> > -----Original Message-----
> > From: Greg Webster [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 06, 2003 1:59 PM
> > To: [EMAIL PROTECTED]
> > Subject: [SAtalk] 'random' character sets
> >
> >
> > A thought on spammers' oft-used sets of 'random' character lists in
> > emails...an example:
> >
> > --
> > gnqplleqhzblll
> > u
> > wfjmvfe upvxoi lwhm
> > xqs
> > flckwrtsmufx irwajksqsnw er wcfjgfmk jugxfq
> > --
> >
> > Seems to me that some tests can be made from these...
> > body GW_10_CONSONENTS /[bcdfghjklmnpqrstvwxz]{10}/
> > score GW_10_CONSONENTS 1.0
> > body GW_9_CONSONENTS /[bcdfghjklmnpqrstvwxz]{9}/
> > score GW_9_CONSONENTS 0.9
> > body GW_8_CONSONENTS /[bcdfghjklmnpqrstvwxz]{8}/
> > score GW_8_CONSONENTS 0.8
> > body GW_7_CONSONENTS /[bcdfghjklmnpqrstvwxz]{7}/
> > score GW_7_CONSONENTS 0.7
> > body GW_6_CONSONENTS /[bcdfghjklmnpqrstvwxz]{6}/
> > score GW_6_CONSONENTS 0.6
> > body GW_5_CONSONENTS /[bcdfghjklmnpqrstvwxz]{5}/
> > score GW_5_CONSONENTS 0.5
> >
> > These have not been tested yet...
> >
> > Some potential concerns:
> > - Encoded messages will likely set this off (uuencode, binhex, etc.)
> > - Are there many legitimate situations where 5+ consonants
> > will be seen?
> > - Will other languages (such as German and Welsh with long strings of
> > consonants) be penalized for using this?
> > - Can we determine any other sorts of patterns from spammers' use of
> > these?
> >
> > Any more thoughts?
> >
> > Greg
> >
> > --
> > Greg Webster - [EMAIL PROTECTED]
> > In-Touch Software Corporation
> > Ph: (604)278-0515 - Fax: (604)608-3112
> >
> >
> >
> > -------------------------------------------------------
> > This SF.net email is sponsored by: SF.net Giveback Program.
> > Does SourceForge.net help you be more productive? Does it
> > help you create better code? SHARE THE LOVE, and help us help
> > YOU!
> > Click Here: http://sourceforge.net/donate/
> > _______________________________________________
> > Spamassassin-talk mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
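
The consonant/vowel/consonant combinations discussed in this thread could be tried against a word list or a set of messages with something like the sketch below. It is only an illustration, not anyone's actual test harness: the helper names `build_pattern` and `count_hits` are made up here, and reading "1v" as exactly one vowel and "1/2v" as one or two vowels is an assumption based on the single regex Larry posted. The sample strings are taken from the thread itself.

```python
import re

# Character classes as used in the thread's rules ("y" is counted as a vowel,
# and digits are allowed in the consonant runs, as in Larry's regex).
CONSONANT_CLASS = "[0-9bcdfghjklmnpqrstvwxz]"
VOWEL_CLASS = "[aeiouy]"


def build_pattern(lead, vmin, vmax, trail):
    """Compile an 'Nc-Mv-Kc' style pattern (hypothetical helper).

    lead  : minimum length of the leading consonant run  (N)
    vmin  : minimum number of vowels in the middle       (M, lower bound)
    vmax  : maximum number of vowels in the middle       (M, upper bound)
    trail : minimum length of the trailing consonant run (K)
    """
    regex = "%s{%d,}%s{%d,%d}%s{%d,}" % (
        CONSONANT_CLASS, lead, VOWEL_CLASS, vmin, vmax, CONSONANT_CLASS, trail
    )
    return re.compile(regex, re.IGNORECASE)


def count_hits(pattern, texts):
    """Return how many of the given strings contain at least one match."""
    return sum(1 for text in texts if pattern.search(text))


if __name__ == "__main__":
    # A few dictionary words and one 'random' line quoted in the thread.
    samples = ["postscripts", "strengths", "Knightsbridge", "gnqplleqhzblll"]
    combos = {
        "4c-1/2v-3c": build_pattern(4, 1, 2, 3),
        "5c-1v-4c": build_pattern(5, 1, 1, 4),
        "3c-1v-5c": build_pattern(3, 1, 1, 5),
    }
    for name, pattern in combos.items():
        matched = [s for s in samples if pattern.search(s)]
        print(name, "->", matched)
```

To run it against a real dictionary or a ham/spam corpus, pass the lines of the word list (or the decoded message bodies) to `count_hits` in place of the sample list.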