On December 17, 2003 11:20 am, stan wrote:
> On Wed, Dec 17, 2003 at 11:00:04AM -0500, Pedro Sam wrote:
> > On December 17, 2003 10:16 am, stan wrote:
> > > BTW, I've got a macro that runs sa-learn, and another that runs
> > > spamassassin -r. If I run the 2nd one first, I get a message about 0
> > > messages learned when I then run the first one. Whereas if I reverse
> > > the order, I get 1 message learned. So it looks to me that I can't
> > > reproduce your error here.
> >
> > ... sorry, I wasn't clear before ...
> >
> > Reporting and learning both work with "spamassassin -r".  BUT!!
> > Remember that SA markup must be stripped before reporting or learning.
> > Now,
> >
> > 1. The "sa-learn" command automatically strips SA markup before learning,
> > WORKS!
> > 2. The "spamassassin -r" command claims to strip SA markup before
> > reporting, WORKS! (i.e. it reports the spam without SA markup)
> > 3. The "spamassassin -r" command claims to strip SA markup before learning,
> > DOES NOT WORK!!  (i.e. it learns the spam WITH SA markup)
> >
> > Why did I suspect that 3 did not work?  Because I found many tokens in
> > the bayes database that could only have come from SA markup.  Tokens like
> > "BAYES_99" were considered VERY spammy.
> >
> > I'm begging you, can someone please either confirm this problem so we
> > can report it, or tell me that it's my problem only ...
>
> OK, if the problem exists, I should have it. But I'm a newbie here. Tell me
> how to check my tokens, and I'll report back.

Try this:

sa-learn --dump all | sort -n > SOME_FILE

You should get something like the following:

...
0.978          2          0 1067239234  UD:mygrantnow.org
0.985          3          0 1066771497  N:junkN.jpg
0.958          1          0 1067155182  N:NsN-NkwN-N-jNiN
0.958          1          0 1067040199  H*r:8LN3VP9W.vip.fi
0.985          3          0 1071089788  comp-01_05.gif
0.958          1          0 1067324476  HTo:U*sarajonsson
0.958          1          0 1066969001  H*M:7719
0.958          1          0 1067081011  H*m:h9PBOoog018734
...

The first column is the "spamminess", the second is the number of 
occurrences as spam, the third is the number of occurrences as ham, the 
fourth is the time (in Unix seconds), and the fifth is the token itself...
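
If you would rather pick the columns apart in a script than read them by eye, 
something like the following might do. It is only a rough sketch in Python: 
the names DumpEntry and parse_dump_line are made up for illustration, and it 
assumes every data line has exactly the five whitespace-separated columns 
described above. Anything that does not fit (such as the "..." lines) is 
skipped.

```python
from typing import NamedTuple, Optional


class DumpEntry(NamedTuple):
    """One line of the dump, using the five columns described above."""
    spamminess: float   # first column
    spam_count: int     # second column: occurrences as spam
    ham_count: int      # third column: occurrences as ham
    timestamp: int      # fourth column: time in Unix seconds
    token: str          # fifth column: the token itself


def parse_dump_line(line: str) -> Optional[DumpEntry]:
    """Parse one dump line; return None if it does not have the expected shape."""
    # Split on runs of whitespace, at most 4 times, so a token that itself
    # contains spaces stays in one piece.
    parts = line.split(None, 4)
    if len(parts) != 5:
        return None
    try:
        return DumpEntry(
            float(parts[0]),
            int(parts[1]),
            int(parts[2]),
            int(parts[3]),
            parts[4].strip(),
        )
    except ValueError:
        # A column that should be numeric was not; treat the line as noise.
        return None
```

For example, parse_dump_line("0.978  2  0 1067239234  UD:mygrantnow.org") 
gives an entry whose token is "UD:mygrantnow.org" and whose spam_count is 2.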

So if you find tokens that could only have come from SA markup (stuff like 
BAYES_99), then it probably means the mechanism used to invoke bayes learning 
did not strip the SA markup...
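
To save scrolling through the whole file, the check can be roughly automated. 
This is just a sketch in Python, not anything that ships with SA: the function 
name find_markup_tokens and the 0.9 threshold are my own inventions, and the 
default pattern only matches names shaped like BAYES_99. Other markup names 
may well show up too, so adjust the pattern to whatever your SA adds.

```python
import re

# Assumption: markup tokens look like the rule name BAYES_99 mentioned above.
DEFAULT_PATTERN = re.compile(r"BAYES_\d+", re.IGNORECASE)


def find_markup_tokens(lines, pattern=DEFAULT_PATTERN, min_spamminess=0.9):
    """Return (spamminess, token) pairs for spammy tokens matching the pattern.

    `lines` is any iterable of dump lines, e.g. an open file.
    """
    hits = []
    for line in lines:
        # Five columns: spamminess, spam count, ham count, time, token.
        parts = line.split(None, 4)
        if len(parts) != 5:
            continue  # skip blank lines and anything else that does not fit
        try:
            spamminess = float(parts[0])
        except ValueError:
            continue
        token = parts[4].strip()
        if spamminess >= min_spamminess and pattern.search(token):
            hits.append((spamminess, token))
    return hits
```

You would call it as find_markup_tokens(open("SOME_FILE")). An empty list 
only means nothing matched this particular pattern, not that the database is 
definitely clean.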

Pedro

-- 
Sauron is alive in Argentina!

