[SAtalk] Re: Problem with Zero points spam messages.

Bryan Hoover Thu, 01 Jan 2004 00:02:45 -0800

Dragoncrest wrote:
> 
> >Though his friends and family getting scored sounds very possibly like
> >some Bayes corruption going on because of the false negative
> >autolearn(ing) -- not a good thing.
> >Granted though, as the scoring from friends and family was not posted,
> >Bayes may not have had anything to do with it.
> 
>          Hmm, would it be possible to set my filtering rules to default
> levels and let SA learn all over again?  I've got no problems resetting
> things back to zero, redoing configs etc rather than trying to retrain


By "setting things back to zero," I assume you mean the defaults.

Probably not a bad idea.  Setting scores too high, and so producing
false positives, could have caused ham to be auto-learned as spam.  But
I wouldn't think this would lead to the false negatives, with spam
auto-learned as ham -- that is, I don't think, in Bayes analysis, 'is
spam' or 'is not spam' implies anything about 'is ham' or 'is not ham'.
Though I suppose there could be some problems just because there would
be more erroneous tokens going into Bayes' statistical analysis, and
fewer non-erroneous tokens going into the analysis -- every false
negative means one less true positive going into the analysis.
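
To make the point about erroneous tokens concrete, here is a rough
sketch in Python.  It is not SpamAssassin's actual Bayes code; it
assumes a much simpler per-token estimate (the token's spam frequency
divided by the sum of its spam and ham frequencies), and the messages
and function names are made up for illustration.

```python
from collections import Counter

def train(messages):
    """Count, per token, how many spam and ham messages contain it.

    `messages` is a list of (label, text) pairs, label "spam" or "ham".
    """
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for label, text in messages:
        totals[label] += 1
        counts[label].update(set(text.lower().split()))
    return counts, totals

def spam_probability(token, counts, totals):
    """Simplified estimate: spam frequency over the sum of both frequencies."""
    spam_freq = counts["spam"][token] / max(totals["spam"], 1)
    ham_freq = counts["ham"][token] / max(totals["ham"], 1)
    if spam_freq + ham_freq == 0:
        return 0.5  # unseen token: no evidence either way
    return spam_freq / (spam_freq + ham_freq)

correct = [
    ("spam", "cheap pills online"),
    ("spam", "cheap pills today"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch today"),
]
# Same mail, but one spam was learned as ham (a false negative).
corrupted = [
    ("spam", "cheap pills online"),
    ("ham", "cheap pills today"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch today"),
]

p_correct = spam_probability("pills", *train(correct))
p_corrupted = spam_probability("pills", *train(corrupted))
print(p_correct, p_corrupted)
```

With this toy data the estimate for "pills" drops from 1.0 to roughly
0.75 after a single mislabelled message: the token gains a ham count
and loses a spam count at the same time.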

Again though, in short, if Bayes learned stuff incorrectly then that
needs to be undone.  If keeping it from happening again means going
back to defaults, then you should do it.  All my scores are the
defaults, and my installation is working quite well, Bayes included.
These days I only feed Bayes the spam that slips through with less
than BAYES_99.
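
If you are not sure which scores you have changed, something like the
following might help to list them before going back to defaults.  It
is only a sketch: it assumes your overrides are ordinary "score
RULE_NAME value" lines in a config file such as local.cf or
user_prefs, and the sample contents and the function name are
hypothetical.

```python
def find_score_overrides(config_text):
    """Return {rule_name: [score strings]} for every `score` line found."""
    overrides = {}
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "score":
            overrides[fields[1]] = fields[2:]
    return overrides

# Hypothetical config contents, for illustration only.
sample = """\
# hand-edited preferences
required_hits 5
score HTML_MESSAGE 3.5
score BAYES_99 6.0   # raised by hand
"""

overrides = find_score_overrides(sample)
for rule, values in overrides.items():
    print(rule, values)
```

Commenting out or deleting the lines it reports should put those rules
back on their shipped scores.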

I assume you are auto-learning, and that you don't have a corpus saved
up from which you can simply retrain.  But if you do have one, you'll
need to go through it and make sure what you think is ham is ham, and
likewise for spam.  Then unlearn whatever false positives/negatives
you find by manually feeding them to sa-learn with the correct ham/spam
parameter.  Otherwise -- if you don't have the mail to relearn -- I
suppose you'll have to delete your Bayes database, and get on with
auto-learning again, after resetting your scores.
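
The relearning step could be scripted roughly as below.  This is a
sketch that only builds the sa-learn command lines and prints them; it
does not run anything.  The --spam, --ham, --forget and --mbox options
are sa-learn's own, but the helper name and the corpus paths are
made up, so substitute wherever your sorted mail actually lives.

```python
import shlex

VALID_KINDS = {"ham", "spam", "forget"}

def sa_learn_command(kind, path, mbox=False):
    """Build the argument list for one sa-learn run (does not execute it)."""
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {sorted(VALID_KINDS)}")
    argv = ["sa-learn", f"--{kind}"]
    if mbox:
        argv.append("--mbox")  # treat `path` as an mbox file
    argv.append(path)
    return argv

# Hypothetical locations of mail that was learned under the wrong label.
corrections = [
    ("spam", "corpus/missed-spam", False),
    ("ham", "corpus/false-positives.mbox", True),
]

commands = [sa_learn_command(*c) for c in corrections]
for argv in commands:
    print(shlex.join(argv))
```

As I understand the sa-learn documentation, feeding a message with the
correct classification is enough to correct an earlier wrong one, so a
separate --forget pass should not be needed first.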

It's not that scores can't be tweaked, but you have to be careful.  Of
course, tweaked scores being the problem, or part of the problem, is
only an assumption.  There's also been a report about an auto-learn bug
-- just a couple of mails earlier -- and this could be part of the
problem, with tweaked scores making it worse.  I don't auto-learn, so
can't report on any experience with it, but perhaps there's an argument
for manually feeding Bayes there.  If you do manually train, don't use
someone else's corpus -- Bayes works best with the training population
coming from what it will be seeing under normal operation -- someone
else's ham may be what you would call spam, et cetera.

Bryan

> it.  I'd rather start with a clean slate and then debug it from there as it
> tends to be much easier to break a habit when it's new than when it's
> fairly well entrenched.
> 

-- 
What is a poet?  An unhappy man who hides deep anguish in his heart, but
whose lips are so formed that when the sigh and cry pass through them,
it sounds like lovely music. - (Soren Kierkegaard - Either/Or)

http://www.wecs.com/content.htm




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
