I have a couple of ideas on this one. How about ignoring any words between
[]'s? This would prevent false positives for many group discussions; for
example, this group uses SAtalk, and I'm sure that word isn't in your dict.
Also, how about ignoring numbers, or numbers with characters between them?
I've seen lots of dates and other tracking numbers in ham subjects, which
could be skewing your mass-check.

These two alone might help reduce the number of ham hits you are seeing!
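
To make the two suggestions concrete, here is a rough sketch in Python rather than the Perl of the original hack. The helper name checkable_words and both patterns are my own assumptions, not anything from the posted code, and "numbers with characters between them" is read here as digit groups joined by punctuation (dates, phone numbers, times).

```python
import re

# Hypothetical filter, not part of the original hack: decide which tokens
# of a Subject line are worth sending to the spell checker at all.
BRACKET_TAG = re.compile(r"\[[^\]]*\]")                  # e.g. "[SAtalk]"
NUMERIC = re.compile(r"^\d+(?:[^A-Za-z0-9\s]+\d+)*$")    # "2003", "12/30/2003"

def checkable_words(subject):
    """Return the subject's words, minus bracketed tags and numeric tokens."""
    without_tags = BRACKET_TAG.sub(" ", subject)
    words = []
    for token in without_tags.split():
        token = token.strip(".,:;!?()\"'")   # trim surrounding punctuation
        if token and not NUMERIC.match(token):
            words.append(token)
    return words
```

With this, "RE: [SAtalk] Spell Checking the Subject Header (RESULTS)" would lose the list tag before any dictionary lookup, and a subject containing 12/30/2003 or 810-794-4400 would not have those tokens counted as misspellings.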

P.S.  Thanks for the effort in doing this, it's much appreciated!


Frederic Tarasevicius
Internet Information Services, Inc.
http://www.i-is.com/
810-794-4400
mailto:[EMAIL PROTECTED]



Dallas L. Engelken wrote:
>> -----Original Message-----
>> From: Chris Santerre [mailto:[EMAIL PROTECTED]
>> Sent: Tuesday, December 30, 2003 3:42 PM
>> To: Dallas L. Engelken; [EMAIL PROTECTED]
>> Cc: [EMAIL PROTECTED]
>> Subject: RE: [SAtalk] Spell Checking the Subject Header (RESULTS)
>>
>>
>> WOW!!! Nice work!!
>>
>
> thank you
>
>> How did it handle things not found in the dictionary? Like
>> LFHDJFHFJ$*? I didn't look at the code close enough :)
>>
>
> it basically takes the subject and splits it based on word
> boundaries...
>
> Subject: This is COOOL
>
> becomes
>
> @words = ('This','is','COOOL');
>
> like i said... it's a quick hack just to get some decent information
> out of it, hopefully.
>
> then a foreach is run on @words and pspell checks each $word against
> the dict.  i only used en_US for the test... but you could easily
> take the language detection out of SA and plug in a variable for what
> language it was.  of course you'd need the appropriate dicts
> (http://ftp.gnu.org/gnu/aspell/dict/) for all the languages that can
> be detected.
>
> so to answer your question, if the subject was
>
> Subject: Random characters LFHDJFHFJ$*? in subject
>
> it would have a $notfound_perc = 20.0000%  (1 out of 5 words
> misspelled/unknown) and match the rule
>
> header SUBJ_SPELLING_20         eval:spell_check_subject('20','30')
> describe SUBJ_SPELLING_20       20-29% mis-spelled words in subject
>
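
For anyone following along, the arithmetic described above can be sketched like this. It is Python, not the actual Perl, and a plain set of known words stands in for the pspell lookup. The function names are hypothetical, and treating the two rule arguments as an inclusive lower bound and an exclusive upper bound is only my reading of the "20-29%" description.

```python
import re

def notfound_percent(subject, dictionary):
    """Percentage of subject words that are not in the given word set."""
    words = re.findall(r"\w+", subject)      # split on word boundaries
    if not words:
        return 0.0
    unknown = [w for w in words if w.lower() not in dictionary]
    return 100.0 * len(unknown) / len(words)

def in_bucket(percent, low, high):
    """Assumed rule semantics: low <= percent < high."""
    return low <= percent < high

# Toy dictionary standing in for the en_US word list.
known = {"random", "characters", "in", "subject"}
pct = notfound_percent("Random characters LFHDJFHFJ$*? in subject", known)
```

Here the subject splits into five words with one unknown, so pct comes out at 20.0 and in_bucket(pct, 20, 30) holds, which lines up with the SUBJ_SPELLING_20 example.
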
> maybe there are ways to improve this... i dunno.  i just blew a
> half-day on it cuz i had nothing better to do :)
>
> d
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: IBM Linux Tutorials.
> Become an expert in LINUX or just sharpen your skills.  Sign up for
> IBM's Free Linux Tutorials.  Learn everything from the bash shell to
> sys admin. Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
