Hi

On Tue, 2003-09-09 at 03:41, Kenneth Porter wrote:
> > - The results of the AI alone are as good as SpamAssassin's results.
> > Combined, it is therefore better.
> 
> What would make the combined result better?

My experience using SpamAssassin together with Fitz in practice shows the
following:
- SpamAssassin catches the 90% or more of spam that is NOT optimized to
get past SpamAssassin.
- The rest is caught by Fitz. That remainder is optimized spam and mail
the user simply doesn't want.
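
A minimal sketch of how such a two-stage setup could be wired together,
assuming each filter is just a function that returns True for spam. The
names combined_is_spam, rule_filter and learned_filter are hypothetical
stand-ins for illustration, not the real SpamAssassin or Fitz interfaces:

```python
from typing import Callable

# A classifier here is any function that returns True when mail looks like spam.
Classifier = Callable[[str], bool]


def combined_is_spam(message: str, first: Classifier, second: Classifier) -> bool:
    """Flag the message if either stage flags it.

    The second stage only looks at mail the first stage let through.
    """
    if first(message):
        return True
    return second(message)


# Toy stand-ins for the two filters (assumptions, purely illustrative).
def rule_filter(message: str) -> bool:
    return "viagra" in message.lower()


def learned_filter(message: str) -> bool:
    return "newsletter" in message.lower()
```

The combination can only catch more than the first stage alone, but it
also inherits the false positives of both stages.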

One interesting experiment used mail from an account for a
role-playing-game weekend. Subscriptions and questions counted as ham.
Spam was the usual, PLUS a role-playing-game newsletter with a lot of
announcements. Normally such a newsletter is non-spam, and it looks like
ham: it talks about RPGs and even mentions the convention and where to
subscribe. But after learning two instances of it, it was classified as
spam.
 
> What does Fitz do different from SA?

The big new thing is a special tokenization. Many naive Bayes solutions
dissect the mail word by word, even the header.
Fitz dissects every header field in its own way. It doesn't
learn -0700 but:
Time-zone = -0700
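
A rough sketch of the general idea, assuming the simplest possible
scheme: every word in a header value is prefixed with the name of the
field it came from. The function name header_tokens and the "field=word"
token format are my assumptions for illustration, not Fitz's actual
format:

```python
from email import message_from_string
from typing import List


def header_tokens(raw_message: str) -> List[str]:
    """Turn each header word into a token that remembers its field."""
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        for word in str(value).split():
            # "subject=pills" and "x-mailer=pills" become different tokens,
            # so the classifier can weigh them separately.
            tokens.append(f"{name.lower()}={word}")
    return tokens
```

A word-by-word tokenizer would see the same token wherever a word
appears; here the same word in two different fields gives two tokens.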

That way it gets more information out of a mail. The Date header alone
gives us:
Mon, => When does the user normally get mail? Work accounts get less ham
on weekends
08 Sep 2003 => Not really relevant
18:41:33 => A lot of spam is written between midnight and about 5
o'clock
-0700 => Time zone, interesting for firms that only have local partners

And this is only the date header.
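
As a sketch of how the Date header could be split into such tokens, here
is one way to do it with Python's standard email.utils module. The token
names (date-weekday, date-hour, date-timezone) and the choice to keep
only the hour are assumptions made for the example:

```python
from email.utils import parsedate_to_datetime
from typing import List

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def date_tokens(date_header: str) -> List[str]:
    """Split a Date header into weekday, hour and time-zone tokens.

    The calendar date itself is dropped, since it carries little signal.
    """
    dt = parsedate_to_datetime(date_header)
    return [
        f"date-weekday={WEEKDAYS[dt.weekday()]}",
        f"date-hour={dt.hour:02d}",
        # %z yields the numeric UTC offset, e.g. "-0700"
        # (empty if the header carries no usable offset).
        f"date-timezone={dt.strftime('%z')}",
    ]


print(date_tokens("Mon, 08 Sep 2003 18:41:33 -0700"))
# ['date-weekday=Mon', 'date-hour=18', 'date-timezone=-0700']
```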

I also tried not to use Paul Graham's Naive Bayes, but to stay as close to
the textbook (AI book) Naive Bayes as possible. I had to alter it a bit
for my special tokenization, but not much.
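
For readers who want something concrete, here is a minimal sketch of a
textbook-style (multinomial) Naive Bayes with add-one smoothing that
works on field-aware tokens. It is a generic illustration, not the Fitz
code; the class name, the smoothing and the token strings are all
assumptions:

```python
import math
from collections import Counter

LABELS = ("spam", "ham")


class NaiveBayes:
    def __init__(self):
        self.token_counts = {label: Counter() for label in LABELS}
        self.message_counts = Counter()

    def learn(self, tokens, label):
        self.token_counts[label].update(tokens)
        self.message_counts[label] += 1

    def log_score(self, tokens, label):
        counts = self.token_counts[label]
        vocabulary = set()
        for other in LABELS:
            vocabulary.update(self.token_counts[other])
        total = sum(counts.values())
        # Smoothed prior, so an untrained class does not give log(0).
        seen = sum(self.message_counts.values())
        score = math.log((self.message_counts[label] + 1) / (seen + len(LABELS)))
        for token in tokens:
            # Add-one smoothing; the extra +1 leaves room for unseen tokens.
            score += math.log((counts[token] + 1) / (total + len(vocabulary) + 1))
        return score

    def classify(self, tokens):
        return max(LABELS, key=lambda label: self.log_score(tokens, label))
```

Log probabilities are summed instead of multiplying raw probabilities,
which avoids underflow when a mail yields many tokens.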


Thorsten Sick
-- 
Thorsten Sick
[EMAIL PROTECTED]
www.hort-des-wissens.de
Winter is coming

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
