> -----Original Message-----
> From: Chris Santerre [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, December 30, 2003 3:42 PM
> To: Dallas L. Engelken; [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: RE: [SAtalk] Spell Checking the Subject Header (RESULTS)
> 
> 
> WOW!!! Nice work!!
> 

thank you

> How did it handle things not found in the dictionary? Like 
> LFHDJFHFJ$*? I didn't look at the code close enough :)
> 

it basically takes the subject and splits it based on word boundaries...

Subject: This is COOOL

becomes

@words = ('This','is','COOOL');
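for anyone who wants to see the splitting step spelled out, here is a minimal sketch. The helper name subject_words and the exact regex are assumptions for illustration, not the code from the actual hack; it just pulls runs of letters out of the subject at word boundaries.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original hack): return the word-like
# runs of letters found in a subject line, dropping punctuation and symbols.
sub subject_words {
    my ($subject) = @_;
    return $subject =~ /\b([A-Za-z']+)\b/g;
}

my @words = subject_words('This is COOOL');
print join(',', @words), "\n";    # prints This,is,COOOL
```

digits and symbols such as $*? are dropped by this regex, which is why a junk token like LFHDJFHFJ$*? still counts as a single word.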

like i said... it's a quick hack just to get some decent information out
of it, hopefully.

then a foreach is run on @words and pspell checks each $word against the
dict.  i only used en_US for the test... but you could easily take the
language detection out of SA and plug in a variable for what language it
was.  of course you'd need the appropriate dicts
(http://ftp.gnu.org/gnu/aspell/dict/) for all the languages that can be
detected.
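the per-word check could be sketched roughly like this. To keep the example self-contained, a plain hash stands in for the pspell dictionary; that hash, the helper name unknown_words, and the lowercasing are all assumptions, and a real run would ask the spell checker instead.

```perl
use strict;
use warnings;

# Stand-in dictionary (assumption); the real hack asks pspell/en_US instead.
my %dict = map { $_ => 1 } qw(random characters in subject this is);

# Hypothetical helper: return the words the dictionary does not know.
# Lookups are case-insensitive here, which is also an assumption.
sub unknown_words {
    my ($dict, @words) = @_;
    my @notfound;
    foreach my $word (@words) {
        push @notfound, $word unless $dict->{ lc $word };
    }
    return @notfound;
}

my @notfound = unknown_words(\%dict, qw(This is COOOL));
print "not found: @notfound\n";    # prints not found: COOOL
```

swapping in another language would only mean loading a different dictionary; the loop itself stays the same.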

so to answer your question, if the subject was 

Subject: Random characters LFHDJFHFJ$*? in subject        

it would have a $notfound_perc = 20.0000%  (1 out of 5 words
misspelled/unknown) and match the rule

header SUBJ_SPELLING_20         eval:spell_check_subject('20','30')
describe SUBJ_SPELLING_20       20-29% mis-spelled words in subject
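the arithmetic behind that rule can be sketched as below. The helper names notfound_perc and in_range are hypothetical, and treating the two eval arguments as a half-open range (min <= perc < max) is an assumption based on the "20-29%" description, not a statement about how spell_check_subject is actually written.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of words not found in the dictionary.
sub notfound_perc {
    my ($notfound, $total) = @_;
    return 0 unless $total;    # empty subject: nothing to score
    return 100 * $notfound / $total;
}

# Hypothetical helper: true when $min <= $perc < $max (assumed semantics),
# so the arguments ('20','30') would cover the 20-29% band.
sub in_range {
    my ($perc, $min, $max) = @_;
    return ($perc >= $min && $perc < $max) ? 1 : 0;
}

my $perc = notfound_perc(1, 5);    # 1 unknown word out of 5
printf "%.4f%%\n", $perc;          # prints 20.0000%
print in_range($perc, 20, 30) ? "SUBJ_SPELLING_20 matches\n" : "no match\n";
```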

maybe there are ways to improve this... i dunno.  i just blew a half-day
on it cuz i had nothing better to do :)

d


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk