I have a couple of ideas on this one. How about ignoring any words between []'s? That would prevent false positives for many group discussions; this group, for example, uses SAtalk, and I'm sure that word isn't in your dict. Also, how about ignoring numbers, or numbers with characters between them? I've seen lots of dates and tracking numbers in ham subjects, which could be skewing your mass-check.
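To make the two suggestions concrete, here is a rough sketch in Python (not the Perl/pspell code of the original hack). The names `checkable_words`, `BRACKET_TAG` and `is_numeric_token` are made up for illustration. It assumes that "numbers with characters between them" means tokens that contain at least one digit and no letters (dates, phone numbers, plain ticket numbers), and that words are split on whitespace.

```python
import re

# Hypothetical helper names; none of this is SpamAssassin code.
# Matches a simple bracketed list tag such as "[SAtalk]".
BRACKET_TAG = re.compile(r"\[[^\]]*\]")


def is_numeric_token(token):
    """True for tokens with at least one digit and no letters,
    e.g. "2003", "12/30/2003", "810-794-4400" (an assumed definition)."""
    has_digit = any(ch.isdigit() for ch in token)
    has_letter = any(ch.isalpha() for ch in token)
    return has_digit and not has_letter


def checkable_words(subject):
    """Return the words of a subject that should go to the spell checker."""
    # Idea 1: drop anything between []'s before splitting.
    without_tags = BRACKET_TAG.sub(" ", subject)
    # Idea 2: drop numbers and numbers with separators between them.
    return [w for w in without_tags.split() if not is_numeric_token(w)]


if __name__ == "__main__":
    print(checkable_words("[SAtalk] Invoice 12/30/2003 for order 810-794-4400 shipped"))
```

Tokens that mix letters and digits (say, an alphanumeric tracking code) are still passed to the spell checker under this definition; whether those should be skipped too is a separate judgment call.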
These two alone might help reduce the number of ham hits you are seeing!

P.S. Thanks for the effort in doing this, it's much appreciated!

Frederic Tarasevicius
Internet Information Services, Inc.
http://www.i-is.com/
810-794-4400
mailto:[EMAIL PROTECTED]

Dallas L. Engelken wrote:

>> -----Original Message-----
>> From: Chris Santerre [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, December 30, 2003 3:42 PM
>> To: Dallas L. Engelken; [EMAIL PROTECTED]
>> Cc: [EMAIL PROTECTED]
>> Subject: RE: [SAtalk] Spell Checking the Subject Header (RESULTS)
>>
>>
>> WOW!!! Nice work!!
>>
>
> thank you
>
>> How did it handle things not found in the dictionary? Like
>> LFHDJFHFJ$*? I didn't look at the code close enough :)
>>
>
> it basically takes the subject and splits it based on word
> boundaries...
>
> Subject: This is COOOL
>
> becomes
>
> @words = ('This','is','COOOL');
>
> like i said... its a quick hack just to get some decent information
> out of it, hopefully.
>
> then a foreach is ran on @words and pspell checks each $word against
> the dict. i only used en_US for the test... but you could easily
> take the language detection out of SA and plug in a variable for what
> language it was. of course you'd need the appropriate dicts
> (http://ftp.gnu.org/gnu/aspell/dict/) for all the languages that can
> be detected.
>
> so to answer your question, if the subject was
>
> Subject: Random characters LFHDJFHFJ$*? in subject
>
> it would have a $notfound_perc = 20.0000% (1 out of 5 words
> mispelled/unknown) and match the rule
>
> header SUBJ_SPELLING_20 eval:spell_check_subject('20','30')
> describe SUBJ_SPELLING_20 20-29% mis-spelled words in subject
>
> maybe there is ways to improve this... i dunno. i just blew a
> half-day on it cuz i had nothing better to do :)
>
> d
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: IBM Linux Tutorials.
> Become an expert in LINUX or just sharpen your skills. Sign up for
> IBM's Free Linux Tutorials. Learn everything from the bash shell to
> sys admin. Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
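For reference, the percentage calculation described in the quoted message can be sketched as follows. This is Python rather than the Perl/pspell of the original hack, and the dictionary lookup is replaced by a toy word set. `notfound_percent`, `in_bucket` and `TOY_DICT` are hypothetical names, and reading `spell_check_subject('20','30')` as "at least 20% and below 30%" is an assumption based on the rule's "20-29%" description.

```python
# Stand-in for a real dictionary lookup (the original used pspell with en_US).
TOY_DICT = {"random", "characters", "in", "subject", "this", "is"}


def is_known(word):
    """Toy spell check: case-insensitive membership in TOY_DICT."""
    return word.lower() in TOY_DICT


def notfound_percent(subject, known=is_known):
    """Percentage of whitespace-separated words not found in the dictionary."""
    words = subject.split()
    if not words:
        return 0.0  # an empty subject has nothing to misspell
    unknown = sum(1 for w in words if not known(w))
    return 100.0 * unknown / len(words)


def in_bucket(percent, low, high):
    """Assumed rule semantics: low <= percent < high."""
    return low <= percent < high


if __name__ == "__main__":
    pct = notfound_percent("Random characters LFHDJFHFJ$*? in subject")
    print(pct, in_bucket(pct, 20, 30))
```

With the example subject from the message, one of the five words is unknown, which gives 20% and falls in the 20-29% bucket.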