Hello all,

I'd like to train my Bayes filter correctly, especially since I don't get a
lot of email. Thanks to Reindl's list, I will ignore those headers from now on.
However, I don't want it to learn the ******spam****** tag in the subject as
a spam or ham token. Is there a way to remove it before passing the message
to the Bayesian filter? Perhaps an extra line in the config, or a bash
script?
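
To make the question concrete, here is a rough sketch of the kind of
pre-processing I have in mind, written in Python rather than bash. It assumes
the tag is a literal run of asterisks around the word "spam" at the start of
the Subject header; the function name and the pattern are my own guesses, not
anything taken from SpamAssassin itself, and it will not catch a tag hidden
inside a MIME encoded-word subject.

```python
import re
from email import message_from_bytes, policy

# Assumption: the tag looks like "******spam******" and sits at the very
# start of the Subject header. Adjust the pattern to the real tag.
TAG = re.compile(r"^\s*\*+\s*spam\s*\*+\s*", re.IGNORECASE)


def strip_subject_tag(raw: bytes) -> bytes:
    """Return the message with a leading spam tag removed from Subject.

    Hypothetical helper: all other headers and the body are left alone.
    """
    # compat32 keeps the headers as close to the original bytes as possible.
    msg = message_from_bytes(raw, policy=policy.compat32)
    subject = msg.get("Subject")
    if subject is not None:
        text = str(subject)
        cleaned = TAG.sub("", text)
        if cleaned != text:
            # replace_header keeps the header in its original position.
            msg.replace_header("Subject", cleaned)
    return msg.as_bytes()
```

The idea would be to run each sample through this function and hand the
result to sa-learn (with --spam or --ham as appropriate) instead of the
original file.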

Kind regards,

Jeroen

2015-12-03 11:00 GMT+01:00 Reindl Harald <h.rei...@thelounge.net>:

>
>
> Am 03.12.2015 um 10:47 schrieb Sebastian Arcus:
>
>> On 03/12/15 01:40, Reindl Harald wrote:
>>
>>>
>>>
>>> Am 03.12.2015 um 01:14 schrieb Alex:
>>>
>>>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren <da...@hireahit.com> wrote:
>>>>
>>>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>>>
>>>>>>
>>>>>> Perfect - that's exactly the sort of real-life based advice I was
>>>>>> looking for. Many thanks!
>>>>>>
>>>>>
>>>>> I run a small shared hosting environment, with a global bayes for
>>>>> all users as not enough users are ready/willing/able to take the time
>>>>> to sort ham (although more will press "this is spam") and in general,
>>>>> the results work out well enough.
>>>>>
>>>>
>>>> A portion of the bayes database is the header information from the
>>>> email. What does it mean for those headers that contain info specific
>>>> to a particular domain or site when it's transferred to another domain
>>>> or site where those specifics will be different?
>>>>
>>>
>>> see the attached php/formail script and list of ignored/stripped headers
>>>
>>> we strip a large portion of the headers, especially the Received
>>> headers, with "formail" and prepend a generic one on top of all
>>> samples before training on them
>>>
>> Does that mean that transferring bayes databases between sites without
>> stripping the headers wouldn't work - or is it just more effective if
>> one strips the headers?
>>
>
> it worked without stripping them for around 6 months
> but it works better now
>
> see the 77.72% BAYES_00, which would be higher, but some trained ham is
> shortcircuited and so doesn't touch bayes at all
>
> "SPAMMY" means >= BAYES_60 in the stats
>
> BAYES_00         3914   77.72 %
> BAYES_05           87    1.72 %
> BAYES_20          134    2.66 %
> BAYES_40          108    2.14 %
> BAYES_50          288    5.71 %
> BAYES_60           61    1.21 %
> BAYES_80           45    0.89 %
> BAYES_95           34    0.67 %
> BAYES_99          365    7.24 %
> BAYES_999         319    6.33 %
>
> DELIVERED        6609   95.18 %
> DNSWL            6249   90.00 %
> SPF              4586   66.05 %
> SPF/DKIM WL      1880   27.07 %
> SHORTCIRCUIT     1900   27.36 %
>
> BLOCKED           515    7.41 %
> SPAMMY            505    7.27 %    98.05 % (OF TOTAL BLOCKED)
>
>
>
>
