You all are keeping me sane and grounded as I deal with the Powers That Be here while trying to set this up. It's good to know that I'm not wrong (I agree with everything everyone has said, and I pointed out from the beginning that a default database would be awful).
And this: "If he insists on starting with a pre-populated Bayes database, he sure knows why. Other than "I'm the boss, I want."" ... is exactly right too.

We're implementing it locally with auto-learning enabled this weekend (oh, yeah, boss didn't want auto-learning enabled either...). So here goes!!

Thanks for all your help.

> -----Original Message-----
> From: Karsten Bräckelmann [mailto:guent...@rudersport.de]
> Sent: Wednesday, May 08, 2013 8:18 PM
> To: users@spamassassin.apache.org
> Subject: Re: Default Bayes Database
>
> On Wed, 2013-05-08 at 14:09 -0400, Andrew Talbot wrote:
> > Well, I certainly hope someone offers to help!
>
> Heh! I am really confident, Alex didn't mean to be rude, neither that he
> actually hopes no one will help you. Quite the contrary...
>
> He DID try to help you by explaining why a "default Bayes database" is a bad
> idea in the first place. And that was his way of telling you...
>
> > If only to say "there is no default database."
>
> That. :) There is none, and there never has been.
>
> > As we've spoken about off-list, my boss is being very particular about
> > the deployment of Bayes, and it sounds like one of his caveats is that
> > we don't start from a blank database.
>
> I can see how the idea of basing off of some "known to be classified"
> tokens sounds tempting. However, there is no such token. None. Just try to
> imagine working in an industry where e.g. Viagra and Cialis are totally legit
> phrases to use...
>
> Feel free to direct your boss here. If he insists on starting with a
> pre-populated Bayes database, he sure knows why. Other than "I'm the boss, I
> want."
>
> Anyway, Andrew, your idea of that whole "blank slate" is inaccurate. If you
> import someone else's data, before importing your database has been
> empty.
>
> If you collect some ham and spam for initial training, before training your
> database has been empty.
>
> You even do NOT have to deploy SA prior to that. I don't know the size of
> your user base, but it seems it shouldn't be hard to have a few of the users
> chip in. Get a few of them to collect hand-classified ham and spam for you.
> Train Bayes with that. After that, deploy SA to your mail processing chain.
>
> There you go! A pre-populated Bayes database, based on YOUR particular
> ham and spam tokens, before deploying SA in production.
>
> --
> char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
> main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
> (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}