Re: bayes, numbers of tokens and performance

2010-03-19 Thread Kevin Parris
It doesn't really work that way. Bayes is just one part of the picture and in order to get good results you have to turn the full toolkit loose on the problem; I'm not sure Bayes by itself should be expected to achieve 95% recognition anyway. The main flaw in your current plan is that once you

Re: bayes, numbers of tokens and performance

2010-03-19 Thread Matus UHLAR - fantomas
> On 2010-03-18 16:36, tonjg wrote:
>> update: after doing some reading on google I found init.pre and added:
>> loadplugin Mail::SpamAssassin::Plugin::Razor2
>> and
>> loadplugin Mail::SpamAssassin::Plugin::Pyzor
>> and restarted spamassassin.
On 18.03.10 16:44, Yet Another Ninja wrote:
>

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Bowie Bailey
tonjg wrote:
> Yet Another Ninja wrote:
>> Did you also install the plugins?
>> These two are not delivered with SA.
> I thought they were. In my system I've got:
> /var/lib/spamassassin/3.002005/updates_spamassassin_org/25_razor2.cf
> /usr/share/spamassassin/25_razor2.cf
> and
> /va

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Yet Another Ninja wrote:
> Did you also install the plugins?
> These two are not delivered with SA.
I thought they were. In my system I've got:
/var/lib/spamassassin/3.002005/updates_spamassassin_org/25_razor2.cf
/usr/share/spamassassin/25_razor2.cf
and
/var/lib/spamassassin/3.002005/updat

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Mikael Syska
Hi
On Thu, Mar 18, 2010 at 1:20 PM, tonjg wrote:
> Mikael Syska wrote:
>> Does it help when you sa-learn the spams ? Does it change the BAYES_
>> score for that mail ?
> I'm going to do another sa-learn when I hit 100 more spams and I'll see then
> if it makes a difference. In the meantime

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Yet Another Ninja
On 2010-03-18 16:36, tonjg wrote:
> update: after doing some reading on google I found init.pre and added:
> loadplugin Mail::SpamAssassin::Plugin::Razor2
> and
> loadplugin Mail::SpamAssassin::Plugin::Pyzor
> and restarted spamassassin.
Did you also install the plugins? These two are not delive

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
update: after doing some reading on google I found init.pre and added:
loadplugin Mail::SpamAssassin::Plugin::Razor2
and
loadplugin Mail::SpamAssassin::Plugin::Pyzor
and restarted spamassassin.

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Jason Bertoch-2 wrote:
> You should really try to determine why your system isn't
> performing well first.
ok I've changed it back to 5
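Editor's illustration of the point made in this exchange, that rule scores are tuned against a default required_score of 5. This is a minimal Python sketch, not SpamAssassin code: it assumes only that a message's total is the sum of the scores of the rules it hits and that the message is flagged once the total reaches the threshold. The rule names and scores are hypothetical.

```python
# Hypothetical rule names and scores, for illustration only.
# Real scores ship with the SpamAssassin rule set.
RULE_SCORES = {
    "EXAMPLE_RULE_A": 2.5,
    "EXAMPLE_RULE_B": 1.5,
    "EXAMPLE_RULE_C": 0.5,
}


def total_score(hits):
    """Sum the scores of every rule that fired on a message."""
    return sum(RULE_SCORES[name] for name in hits)


def is_spam(hits, required_score=5.0):
    """Flag the message when its total reaches the threshold."""
    return total_score(hits) >= required_score


if __name__ == "__main__":
    hits = ["EXAMPLE_RULE_A", "EXAMPLE_RULE_B", "EXAMPLE_RULE_C"]
    print(total_score(hits))
    print(is_spam(hits))                      # default threshold of 5
    print(is_spam(hits, required_score=4.0))  # lowered threshold
```

A message that stays under 5 with the shipped scores can cross a lowered threshold, which is why changing required_score shifts both spam and ham verdicts at once.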

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Jason Bertoch
On 2010/03/18 10:56 AM, tonjg wrote:
> Kai Schaetzl wrote:
>> Don't do that.
> why not?
Rule scores are generated based on a default required_score of 5. Fiddling with the required_score should be the _last_ thing you do, if at all. You should really try to determine why your system isn't per

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Kai Schaetzl wrote:
> I've been running
> with 2 million with no problems.
based on this I see you're right that my db's are tiny and not enough for the success rate I'm aiming for.

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Matus UHLAR - fantomas wrote:
>> DNS available?
>> no
> well, why? DNS helps very much for catching spam. all blacklists use DNS
> (afaik)
sorry, when you said dns I didn't know you were referring to the dnsbl's. I know the black lists are excellent for filtering spam but I've got those sw

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Kai Schaetzl wrote:
> Don't do that.
why not?

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Matus UHLAR - fantomas
> Mikael Syska wrote:
> > Does it help when you sa-learn the spams ? Does it change the BAYES_
> > score for that mail ?
On 18.03.10 05:20, tonjg wrote:
> I'm going to do another sa-learn when I hit 100 more spams and I'll see then
> if it makes a difference.
learn whenever possible, mostly on mi

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Kai Schaetzl
Tonjg wrote on Thu, 18 Mar 2010 05:20:45 -0700 (PDT):
> I'm going to do another sa-learn when I hit 100 more spams and I'll see then
> if it makes a difference. In the meantime I've lowered my hit threshold to
> 4.
Don't do that.
Kai

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Kai Schaetzl
Tonjg wrote on Thu, 18 Mar 2010 05:17:21 -0700 (PDT):
> I hope this command gives the correct answer...
> # sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        514          0  non-token data: nspam
> 0.000          0        402

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Bowie Bailey
tonjg wrote:
> Kai Schaetzl wrote:
>> So, how many tokens do you have in your db now?
> I hope this command gives the correct answer...
> # sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0        514          0  non-t

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Mikael Syska wrote:
> Does it help when you sa-learn the spams ? Does it change the BAYES_
> score for that mail ?
I'm going to do another sa-learn when I hit 100 more spams and I'll see then if it makes a difference. In the meantime I've lowered my hit threshold to 4.
> DNS available?
no

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Kai Schaetzl wrote:
> So, how many tokens do you have in your db now?
I hope this command gives the correct answer...
# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        514          0  non-token data: nspam
0.000          0

Re: bayes, numbers of tokens and performance

2010-03-18 Thread tonjg
Matus UHLAR - fantomas wrote:
> do you have network checks enabled? Do you have network plugins (razor,
> pyzor, dcc, uribl) loaded? Do you have other plugins (like textcat)
> loaded?
no, I am unfamiliar with these plugins.
> which version of SA do you have installed?
version 3.2.5-1.el4.rf

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Kai Schaetzl
So, how many tokens do you have in your db now?
Kai
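Editor's illustration: the counts asked about here come from `sa-learn --dump magic`, the command quoted elsewhere in this thread. The sketch below is a minimal Python parser for that output, assuming (from the lines quoted in the thread) that each value sits in the third column, ahead of a "non-token data:" label. The function name `parse_magic` is hypothetical, and the sample holds only the two complete lines that survive in the archive, with whitespace normalized.

```python
import re

# Two complete lines from the dump quoted in this thread (whitespace normalized).
SAMPLE = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 514 0 non-token data: nspam
"""

# Four whitespace-separated columns, then the "non-token data:" label.
# Assumption: the value of interest is the third column.
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$"
)


def parse_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    values = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


if __name__ == "__main__":
    print(parse_magic(SAMPLE))
```

Fed the full output of the command, the same function would also pick up the other "non-token data" entries, so the size of the database can be read off without counting columns by eye.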

Re: bayes, numbers of tokens and performance

2010-03-18 Thread Matus UHLAR - fantomas
On 17.03.10 16:47, tonjg wrote:
> I've done a bayes learn on 500 spams and 400 hams, my spam hit threshold is 5
> and I'm getting a success rate of about 40% in identifying spam, and that's
> after doing an sa-update too. I was hoping to get better results than this.
> How many spam and ham tokens

Re: bayes, numbers of tokens and performance

2010-03-17 Thread Mikael Syska
Hi
On Thu, Mar 18, 2010 at 1:35 AM, tonjg wrote:
> it's not 'a' spam I'm referring to, it's lots of different spams getting
> through to my inbox with only 40% of them being identified as spam.
Yes, that's awful, I would be killed if that many spams got through our filters. What rules does th

Re: bayes, numbers of tokens and performance

2010-03-17 Thread tonjg
it's not 'a' spam I'm referring to, it's lots of different spams getting through to my inbox with only 40% of them being identified as spam. This strikes me as a poor success rate. What do you mean by 'crappy mail bayes'? The bayes learn was done on spam and ham that ended up in my inbox.

Re: bayes, numbers of tokens and performance

2010-03-17 Thread Mikael Syska
Hi,
What score does the spam have? Post it with full headers on nomorepasting.com or similar, then we can see what it hits on our setups ...
On Thu, Mar 18, 2010 at 12:47 AM, tonjg wrote:
> I've done a bayes learn on 500 spams and 400 hams, my spam hit threshold is 5
> and I'm getting a

bayes, numbers of tokens and performance

2010-03-17 Thread tonjg
I've done a bayes learn on 500 spams and 400 hams, my spam hit threshold is 5 and I'm getting a success rate of about 40% in identifying spam, and that's after doing an sa-update too. I was hoping to get better results than this. How many spam and ham tokens does SA normally need before it really