Duncan,

>One other problem is that the GA currently (IIRC) doesn't process the
>messages, just the tests hit.  Of course, now, the tests are different from
>those 2 versions ago, messing up the GA.

Replacing each message with the results of its tests would be pretty
simple, I believe.

  X-Spam-Status: No, hits=-2.0 required=5.0 tests=IN_REP_TO version=2.01

The X-Spam-Status header more or less gives those results already.
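To make that concrete, here is a minimal sketch of turning such a header
into a row the GA could consume. It assumes the header value looks exactly
like the example line (test names comma-separated with no spaces, no
folded continuation lines); `parse_spam_status`, `to_feature_row` and the
rule name `SOME_RULE` are hypothetical, not part of SpamAssassin. Keeping
the `version` field would also let you drop rows produced by older rule
sets, which is the mismatch problem mentioned above.

```python
import re


def parse_spam_status(value: str) -> dict:
    """Split an X-Spam-Status value (without the header name) into its parts.

    Assumes the format of the example header: a Yes/No verdict, a comma,
    then space-separated key=value pairs, with test names joined by commas.
    """
    verdict, _, rest = value.partition(",")
    # Collect key=value pairs; a bare "tests=" yields an empty string.
    fields = dict(re.findall(r"(\w+)=(\S*)", rest))
    tests = [t for t in fields.get("tests", "").split(",") if t]
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "hits": float(fields["hits"]),
        "required": float(fields["required"]),
        "tests": tests,
        "version": fields.get("version"),
    }


def to_feature_row(parsed: dict, rule_names: list) -> list:
    """Encode which rules fired as a 0/1 vector in a fixed rule order."""
    hit = set(parsed["tests"])
    return [1 if name in hit else 0 for name in rule_names]


p = parse_spam_status("No, hits=-2.0 required=5.0 tests=IN_REP_TO version=2.01")
print(p["tests"], to_feature_row(p, ["IN_REP_TO", "SOME_RULE"]))
```

The message body itself is never needed once it has been reduced to a row
like this, which is the point of replacing the message with its results.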

Checking carefully for false positives would be more of an issue, though.

Now, I may be wrong, but how can new tests be introduced if the GA does
not account for them and give them some weight?

>Furthermore, everyone has a different idea of what spam is.  Is commercial
>e-mail, that was sent by a company who legitimately has your e-mail address,
>spam?

That is exactly why I would like to have my own corpus.

>I imagine that the size of the corpus is not as important as the variety of
>messages, its currentness, and the accuracy of its filing.

As I don't know how to guarantee variety, currentness and accuracy, one
good way to get them is to monitor incoming email until I have
accumulated X different messages, with X big enough that I am sure I
cover all situations.
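A minimal sketch of that accumulation loop, assuming messages arrive as
raw bytes and that "different" means a distinct Message-ID (falling back
to a hash of the body when there is none). `collect_corpus`,
`message_key` and `target` (standing in for X) are hypothetical names;
filing each collected message as spam or not is still left to a human.

```python
import hashlib
from email import message_from_bytes
from email.message import Message


def message_key(msg: Message) -> str:
    """Identify a message by Message-ID, else by a SHA-256 of its payload."""
    mid = msg.get("Message-ID")
    if mid:
        return mid.strip()
    # Multipart messages return None here, so fall back to the whole message.
    payload = msg.get_payload(decode=True) or msg.as_bytes()
    return hashlib.sha256(payload).hexdigest()


def collect_corpus(raw_messages, target: int) -> list:
    """Keep the first copy of each distinct message until `target` are held."""
    seen = {}
    for raw in raw_messages:
        msg = message_from_bytes(raw)
        key = message_key(msg)
        if key not in seen:
            seen[key] = msg
        if len(seen) >= target:
            break
    return list(seen.values())
```

Deduplicating matters because mailing-list traffic often delivers the same
message more than once, which would otherwise inflate X without adding
any variety.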

Olivier

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
