Hi David,

I've been running masschecks since Feb 2017; the results labeled
'thendrikx' are mine. :)

I'm not adding massive volumes though, mostly because I'm running a
small '3 men and a dog' setup. But I think it's important that I can
contribute sample data in my locale (nl_NL), so I would invite others to
set it up too: It's not a lot of work and it mostly runs without any
manual intervention (I was already manually sorting ham and spam).

To give a bit of an idea of how I do it: I run a Postfix server on
Ubuntu, with SpamAssassin as a milter. I redirect all possible spam into
my Junk folder, and check that folder daily.

The masscheck is run using a simple wrapper script that takes the
following steps (from daily cron):
- Copy all spam into $workdir from the Spamtraps and Junk folders (only
IMAP-seen emails, and not older than 2 months)
- Copy all ham into $workdir from several IMAP folders that are known to
be sorted by hand, and not older than 6 years
- Run masscheck on the copied messages
- Print a list of the subjects of the lowest scoring spam samples, and
the highest scoring ham samples
- Clean up all copied email
- Mail all output to myself
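
A minimal sketch of the two copy steps above, not the actual wrapper
script. It assumes the IMAP folders are stored as Maildir on disk (where
the "seen" state is the S flag after ":2," in the filename) and uses the
file mtime as the message age. The names is_seen and copy_recent and all
paths are hypothetical.

```python
import os
import shutil
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def is_seen(filename: str) -> bool:
    """Return True if a Maildir filename carries the S (seen) flag."""
    # Maildir stores flags after the ":2," marker, e.g. "123.host:2,RS".
    _, sep, flags = filename.rpartition(":2,")
    return bool(sep) and "S" in flags


def copy_recent(maildir, dest, max_age_days, seen_only=False, now=None):
    """Copy messages from maildir/cur into dest; return the number copied.

    Assumption: message age is approximated by the file's mtime.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for msg in sorted((Path(maildir) / "cur").iterdir()):
        if not msg.is_file():
            continue
        if msg.stat().st_mtime < cutoff:
            continue  # older than the allowed age
        if seen_only and not is_seen(msg.name):
            continue  # unread mail has not been checked by hand yet
        shutil.copy2(msg, dest / msg.name)
        copied += 1
    return copied
```

With hypothetical paths, the spam step would be something like
copy_recent("/home/me/Maildir/.Junk", workdir + "/spam", 60,
seen_only=True) and the ham step copy_recent(folder, workdir + "/ham",
6 * 365) for each hand-sorted folder. Requiring the seen flag for spam
keeps messages that nobody has looked at yet out of the corpus.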

I spent less than a day setting this up, and it has been running
without issues ever since. If you're interested, read up on
https://wiki.apache.org/spamassassin/NightlyMassCheck and try to set it
up. If you run into issues, other masscheckers can probably help you out.

Kind regards,
        Tom

On 25-08-18 16:12, David Jones wrote:
> Tom,
> 
> Let me know if you are still interested in setting up a masschecker. 
> That goes for anyone on this list as well.  I have worked out the
> sorting issue pretty well now and my ena-weekX masscheckers are now the
> largest contributions to the RuleQA corpus, keeping the nightly rule
> scoring updating regularly over the past year.
> 
> http://ruleqa.spamassassin.org/  (see the ena-weekX in the green box)
> 
> New/more masscheckers are always welcome and will help you learn the
> best way to tune your SA platform to get every last drop of accuracy
> from your local meta rules.  We could really use masscheckers with
> primary languages other than English to add/improve core SA rules.
> 
> Here's my setup:
> 
> - I have an iRedmail server that I split copies of most of my email to
> an internal-only email domain "sa.ena.net."
> 
> - The iRedmail server has Sieve rules (easily managed by RoundCube)
> based on certain rule hits and scores from my main Internet edge
> MailScanner filtering that move them into Ham and Spam folders as
> unread.  Mail scoring in the middle (not high enough for obvious Spam
> or low enough for obvious Ham) is left in the main Inbox.
> 
> - I spend a few minutes each day visually scanning the Subjects of the
> unread email then mark them as Read.
> 
> - If I find a zero-hour email in the main Inbox, then I move it to a
> SpamCop folder.  A script runs every 5 minutes to check the SpamCop
> folder, strips off some extra Received headers from my internal hops,
> then submits it as an attachment to my SpamCop account.
> 
> - A script moves the Maildir email to 4 other masschecker VMs to split
> out the load so they will be able to submit their results quickly. 
> Ena-week0 is the last week of ham/spam that is still on the iRedMail
> server.  Ena-week1-4 are running on the other 4 masschecker VMs to give
> a total of 5 weeks of recent corpus.  I currently have 100,939 Ham and
> 292,001 Spam in ena-week0-4.
> 
> - I run a local Bayesian train on the ena-week0 Ham and Spam folders to
> my Redis-based Bayes storage shared across my 8 MailScanner nodes and my
> iRedMail/amavis server.  This method has proven to keep my Bayes scores
> very accurate.
> 
> Hope someone finds this information helpful.
> 
> Dave
> 
> 
> On 01/20/2017 01:02 PM, Tom Hendrikx wrote:
>> On 20-01-17 19:46, David Jones wrote:
>>>> From: Kevin Golding <k...@caomhin.org>
>>>> Sent: Friday, January 20, 2017 11:59 AM
>>>> To: users@spamassassin.apache.org
>>>> Subject: Re: No rule updates since 1/1/17
>>>     
>>>> On Fri, 20 Jan 2017 17:26:01 -0000, Bill Keenan  
>>>> <developerli...@wjkeenan.org> wrote:
>>>>> What is the fix needed so /usr/bin/sa-update starts getting updates? I  
>>>>> too have not received an update from updates.spamassassin.org  
>>>>> <http://updates.spamassassin.org/> since 1-Jan-17.
>>>>>
>>>>> Besides updates.spamassassin.org <http://updates.spamassassin.org/>, 
>>>>> what other rule sets are commonly used? Hundreds of spam messages are  
>>>>> getting through with only updates.spamassassin.org  
>>>>> <http://updates.spamassassin.org/> rules.
>>>> This seems like a good time to mention  
>>>> https://wiki.apache.org/spamassassin/NightlyMassCheck
>>>> If more people can contribute, even just a small corpora of mail, then  
>>>> updates will be published more frequently. At the moment a very small  
>>>> number of people provide data, meaning there is very little margin for  
>>>> error.
>>> I would like to help with the nightly masscheck but I don't have the
>>> resources to manually check ham and spam.  This also gets into the
>>> grey area of how people define spam.  I also have a very good MTA
>>> setup with RBLs and DNS checks that block most of the spam before
>>> it reaches SA in MailScanner.  My SA only has to block a very small
>>> percentage of my definition of spam so I am not sure how helpful
>>> my mail filtering platform can be even though it's very accurate.
>>>
>>> Dave
>>>
>> I think I can say the same about my platform, but since this issue keeps
>> popping up I just applied for an account just to find out if my
>> contribution could help. I can't speculate so I'm just gonna try if it
>> helps :)
>>
>> Kind regards,
>>      Tom
>>
> 
> -- 
> David Jones
> 

