> Now I really want to do this.  I'll see what I'm up to this weekend.  :-)

heh, it all looks good to me.  I think I'm just not quite sure what you're 
up to (that, and underscores in field names confuse me for some reason ;).

> What really can you track with this besides scoring and the correlation of 
> current email styles and how the tests react to them?  I was also thinking of 
> maybe adding some data from the headers which would track where the email 
> came from but then again I don't want to recreate the razor or another SA 
> clone.  :-)

well, by using this data to make spamassassin into a much more accurate
detector, you could start collecting data on the spammers who send out
those messages..  you could generate blacklists/filters based on the
information they put into their spams (phone numbers, web pages, IPs, etc)..
Not to mention making it easier for those of us who bother reporting them
to ISPs and (for WA residents) the state attorney general.
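
A rough sketch of the blacklist idea above, in Python.  The regexes, the
names extract_indicators and tally, and the US-style phone format are all
my own assumptions for illustration, not anything spamassassin ships with:

```python
import re
from collections import Counter

# Deliberately loose patterns (assumptions); real spam would need more care.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")  # US-style numbers only


def extract_indicators(body):
    """Pull out the contact details a spammer left in a message body."""
    return {
        "urls": set(URL_RE.findall(body)),
        "ips": set(IPV4_RE.findall(body)),
        "phones": set(PHONE_RE.findall(body)),
    }


def tally(bodies):
    """Count how many messages each indicator appears in."""
    counts = Counter()
    for body in bodies:
        for kind, values in extract_indicators(body).items():
            for value in values:
                counts[(kind, value)] += 1
    return counts
```

Anything that shows up in lots of messages already tagged as spam would be
a candidate for a blacklist entry (after a human looks at it, presumably).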

> Offhand, how does Razor get false positives?  I thought that it was MD5-based 
> and the email had to be exact?

it does.  but md5 doesn't generate a unique id...   there's no way that a 
fixed-size 128-bit number can identify an infinite number of possible 
email combinations..   so while md5 can be used to check the integrity of 
data (since the value will change when even one bit in the checked files 
changes), it can in principle be wrong when you're trying to compare 
DIFFERENT things, since two vastly different source files can end up 
with the same checksum.   although this is a bit off topic, a similar 
system could be designed that would work around this..  maybe by using the 
sa score and some kind of unique id generated from a part of the message 
headers that wouldn't change for each user...
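
To make that a bit more concrete, here's a small Python sketch.  It is not
how Razor works internally; md5_hex, stable_signature and the choice of
From/Subject as the "headers that don't change per user" are assumptions
I made up for the example:

```python
import hashlib
from email import message_from_string


def md5_hex(text):
    """Return the MD5 digest of text as 32 hex characters (128 bits)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Assumption: these headers are identical for every recipient of one spam run.
STABLE_HEADERS = ("From", "Subject")


def stable_signature(raw_message, headers=STABLE_HEADERS):
    """Hash only the chosen headers, ignoring per-recipient ones like To."""
    msg = message_from_string(raw_message)
    parts = [(msg.get(name) or "").strip().lower() for name in headers]
    return md5_hex("\n".join(parts))
```

md5_hex() on two bodies that differ by a single character gives two 
different digests, which is why exact-match checksums miss spam that has 
been personalized per recipient.  stable_signature() gives the same value 
for two copies that differ only in the To: header.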

> Yes, that is why I'm thinking of creating this database -- we can see what 
> tests are consistently bad and modify/eliminate them.  I have a terrible 
> problem with opt-in lists being tagged, as well as financial lists.

yeah, it's not easy...    another thing I've considered is (and I'm ashamed 
to admit that microsoft seems to have come up with the idea) to create 
whitelists based on your address book...   so messages from people who 
are already in your address book can be flagged as not-spam (or just given 
a hefty negative score)..   The problem with this is that this was a 
feature IN an email client, and it'd be a hassle to write importers for 
the various email clients used by *nix users (that, and evolution has a 
HORRID exporter for its address book)..
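
Assuming the address book has already been exported to a plain list of 
addresses (which is the hard part), the scoring side could look roughly 
like this in Python.  WHITELIST_BONUS and adjust_score are hypothetical 
names, and the -5.0 is just a number I picked:

```python
from email.utils import parseaddr

# Hypothetical adjustment for known senders; tune to taste.
WHITELIST_BONUS = -5.0


def normalize(address):
    """Reduce 'Name <User@Host>' to a bare lowercase 'user@host'."""
    return parseaddr(address)[1].strip().lower()


def adjust_score(score, from_header, address_book):
    """Lower the spam score when the sender is in the address book."""
    known = {normalize(entry) for entry in address_book}
    sender = normalize(from_header)
    if sender and sender in known:
        return score + WHITELIST_BONUS
    return score
```

Since From: lines are trivially forged, a negative score is probably safer 
than flagging the message as not-spam outright.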

-Chris


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
