On 24 Jul 2003, Raul Dias wrote:

> Something I found interesting was using Razor + Pyzor + Dcc.
> Then I create meta rules that match:
> Razor + Pyzor
> Razor + Dcc
> Pyzor + Dcc
> Razor + Pyzor + Dcc (this will also, of course, match all three before).

This is definitely an interesting idea! But is it a good one? Perhaps
(indeed probably), but one has to answer the following question
before one can be sure. Can anyone here answer it? (I've made the
question a bit wordy, but the idea is simple.)

The idea above basically says "make it much more likely to
kill the email if it's on more than one spam blacklist site".
We have already established in this thread that it is probably *not*
a good idea in general to say "if razor says it's spam with
probability > 90% then it's spam" because razor can make mistakes,
or perhaps be tricked into making mistakes. Indeed the _point_ of spamassassin
is that it's giving you a whole host of other tests on top of razor.
Similarly we should not say "if pyzor says it's spam then it's spam"
and so on. So we are aware that each of Razor, Pyzor and Dcc is
capable of making mistakes. [By "mistake" I mean here "saying it's
spam when it's not"; I'm not getting into the issue of saying it's
not spam when it is.]

So it seems to me that the _key_ issue here is: are razor, pyzor, dcc making
mistakes with the _same_ emails? Does anyone know enough about
the decision processes going on in more than one of these systems
to be able to state confidently that the chances that errors will
be made are essentially independent? If they are independent then
maybe the idea above is good. But if they are not then the idea
above might make things worse.
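
To put rough numbers on the two extremes, here is a small Python
sketch. The false-positive rates in it are made up purely for
illustration; they are not measured figures for Razor or Pyzor, and
the function names are my own.

```python
# Hypothetical false-positive rates on ham (illustrative, NOT measured).
P_RAZOR_FP = 0.01  # assumed chance that razor flags a ham message
P_PYZOR_FP = 0.02  # assumed chance that pyzor flags a ham message


def joint_fp_independent(p_a, p_b):
    """Chance that both systems flag the same ham message,
    assuming their mistakes are independent."""
    return p_a * p_b


def joint_fp_fully_dependent(p_a, p_b):
    """Worst case: every mistake made by the less error-prone system
    is also a mistake made by the other one."""
    return min(p_a, p_b)


if __name__ == "__main__":
    print("independent:    ", joint_fp_independent(P_RAZOR_FP, P_PYZOR_FP))
    print("fully dependent:", joint_fp_fully_dependent(P_RAZOR_FP, P_PYZOR_FP))
```

With these assumed rates, independence makes "both fire on ham" about
fifty times rarer than the fully dependent case, which is the whole
difference between a safe high score and a risky one.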

Here's a concrete example. Let's take an email that razor mistakenly
says is spam. *Given that this has happened*, what are the chances that
pyzor mistakenly says it's spam? If the chances are much much higher
than the usual chance that pyzor mistakenly calls a ham email spam,
then probably you do _not_ want to give "razor + pyzor" a high score
at all because it will lead to more false positives. But if the
chances are roughly the same that pyzor makes a mistake, independent
of whether razor makes a mistake, then giving razor+pyzor a high
score is a terrific idea.
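
If someone has a hand-checked ham corpus and a record of which tests
fired on each message, the comparison could be estimated along these
lines. This is only a sketch: the input format (a list of
razor_hit/pyzor_hit pairs) and the function name are assumptions of
mine, not anything these tools produce directly.

```python
def pyzor_fp_rates(ham_hits):
    """ham_hits: list of (razor_hit, pyzor_hit) booleans, one pair per
    message already known to be ham (hypothetical input format).

    Returns (P(pyzor FP), P(pyzor FP | razor FP)). The second value is
    None when razor never fired on ham, since it is then undefined.
    """
    if not ham_hits:
        raise ValueError("need at least one ham message")
    total = len(ham_hits)
    pyzor_fp = sum(1 for _, pyzor in ham_hits if pyzor)
    # Pyzor results restricted to messages that razor got wrong.
    pyzor_when_razor_fp = [pyzor for razor, pyzor in ham_hits if razor]
    if pyzor_when_razor_fp:
        conditional = sum(pyzor_when_razor_fp) / len(pyzor_when_razor_fp)
    else:
        conditional = None
    return pyzor_fp / total, conditional


if __name__ == "__main__":
    # Made-up corpus of 1000 ham messages, for illustration only.
    ham = ([(True, True)] * 5 + [(True, False)] * 5
           + [(False, True)] * 15 + [(False, False)] * 975)
    print(pyzor_fp_rates(ham))
```

If the second number comes out far larger than the first, the two
systems are making their mistakes on the same emails, and the combined
rule deserves a cautious score.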

I raise this question here because I have no idea what algorithms
razor, pyzor and dcc use. I'm basically asking "are they the same?"
E.g. are they all using spamassassin with razor, pyzor, dcc turned
off? :-) That would be catastrophic!

Kevin

PS Sorry to go on for so long. I'm just making a simple observation
about conditional probabilities, but I think the answer is important
for deciding whether the suggestion above is "valid".


