Hello all,

I'd like to train my Bayes database correctly, especially since I don't get a lot of email. Thanks to Reindl's list I will ignore those headers from now on. But I don't want it to learn that the ******spam****** tag in the subject means a message is spam or ham. Is there a way I can remove the tag before throwing the message at the Bayesian filter? Perhaps an extra line in the config, or a bash script?
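To make the question concrete, here is a rough sketch of the kind of preprocessing I have in mind, written in Python rather than bash. It is untested against my real mail. The tag pattern is my assumption (a run of asterisks around the word "spam" at the start of the subject), and the name strip_subject_tag is just something I made up for illustration.

```python
import email
import re

# Assumption: the tag is a run of asterisks around "spam" at the very start
# of the Subject, e.g. "******spam****** Cheap pills". Adjust to the real tag.
TAG_RE = re.compile(r"^\s*\*+\s*spam\s*\*+\s*", re.IGNORECASE)


def strip_subject_tag(raw_message: bytes) -> bytes:
    """Return the message with the leading subject tag removed, if present."""
    msg = email.message_from_bytes(raw_message)
    subject = msg["Subject"]
    if subject is not None:
        # str() also covers subjects that the parser returns as Header objects.
        original = str(subject)
        cleaned = TAG_RE.sub("", original, count=1)
        if cleaned != original:
            # Only rewrite the header when a tag was actually found.
            msg.replace_header("Subject", cleaned)
    return msg.as_bytes()
```

The idea would be to run each sample through this function first and only then hand the result to the training step, so the tag never reaches the tokenizer.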

Kind regards,
Jeroen

2015-12-03 11:00 GMT+01:00 Reindl Harald <h.rei...@thelounge.net>:

> On 03.12.2015 at 10:47, Sebastian Arcus wrote:
>
>> On 03/12/15 01:40, Reindl Harald wrote:
>>
>>> On 03.12.2015 at 01:14, Alex wrote:
>>>
>>>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren <da...@hireahit.com> wrote:
>>>>
>>>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>>>
>>>>>> Perfect - that's exactly the sort of real-life based advice I was
>>>>>> looking for. Many thanks!
>>>>>
>>>>> I run a small shared hosting environment, with a global bayes for
>>>>> all users as not enough users are ready/willing/able to take the
>>>>> time to sort ham (although more will press "this is spam") and in
>>>>> general, the results work out well enough.
>>>>
>>>> A portion of the bayes database is the header information from the
>>>> email. What does it mean for those headers that contain info specific
>>>> to a particular domain or site when it's transferred to another domain
>>>> or site where those specifics will be different?
>>>
>>> see attached php/formail-script and list of ignored/stripped headers
>>>
>>> we strip a large portion of headers, including especially the Received
>>> headers, with "formail" and prepend a generic one on top of all
>>> samples before training on them
>>
>> Does that mean that transferring bayes databases between sites without
>> stripping the headers wouldn't work - or is it just more effective if
>> one strips the headers?
>
> it worked without stripping them for around 6 months
> but it works better now
>
> see the 77.72% BAYES_00, which would be more but some trained ham is in
> shortcircuit and so doesn't touch bayes at all
>
> "SPAMMY" means >= BAYES_60 in the stats
>
> BAYES_00        3914   77.72 %
> BAYES_05          87    1.72 %
> BAYES_20         134    2.66 %
> BAYES_40         108    2.14 %
> BAYES_50         288    5.71 %
> BAYES_60          61    1.21 %
> BAYES_80          45    0.89 %
> BAYES_95          34    0.67 %
> BAYES_99         365    7.24 %
> BAYES_999        319    6.33 %
>
> DELIVERED       6609   95.18 %
> DNSWL           6249   90.00 %
> SPF             4586   66.05 %
> SPF/DKIM WL     1880   27.07 %
> SHORTCIRCUIT    1900   27.36 %
>
> BLOCKED          515    7.41 %
> SPAMMY           505    7.27 %   98.05 % (OF TOTAL BLOCKED)