Hi David,

I've been running masschecks since Feb 2017; the results labeled 'thendrikx' are mine. :)
I'm not adding massive volumes though, mostly because I'm running a
small '3 men and a dog' setup. But I think it's important that I can
contribute sample data in my locale (nl_NL), so I would invite others to
set it up too: it's not a lot of work and it mostly runs without any
manual intervention (I was already manually sorting ham and spam).

To give a bit of an idea of how I do it: I run a Postfix server on
Ubuntu, with SpamAssassin as a milter. I redirect all possible spam into
my Junk folder, and check that daily. The masscheck is run by a simple
wrapper script that takes the following steps (from daily cron):

- Copy all spam into $workdir from the Spamtraps and Junk folders (only
  IMAP-seen emails), not older than 2 months
- Copy all ham into $workdir from several IMAP folders that are known
  to be sorted by hand, not older than 6 years
- Run masscheck on the copied messages
- Print a list of the subjects of the lowest scoring spam samples and
  the highest scoring ham samples
- Clean up all copied email
- Mail all output to myself

I spent less than a day setting this up, and it has been running
without issues ever since.

If you're interested, read up on
https://wiki.apache.org/spamassassin/NightlyMassCheck and try to set it
up. If you run into issues, other masscheckers can probably help you out.

Kind regards,
Tom

On 25-08-18 16:12, David Jones wrote:
> Tom,
>
> Let me know if you are still interested in setting up a masschecker.
> That goes for anyone on this list as well. I have worked out the
> sorting issue pretty well now and my ena-weekX masscheckers are now the
> largest contributions to the RuleQA corpus keeping the nightly rule
> scoring updating regularly the past year.
>
> http://ruleqa.spamassassin.org/ (see the ena-weekX in the green box)
>
> New/more masscheckers are always welcome and will help you learn the
> best way to tune your SA platform to get every last drop of accuracy
> from your local meta rules.
> We could really use masscheckers with
> primary languages not English to add/improve core SA rules.
>
> Here's my setup:
>
> - I have an iRedmail server that I split copies of most of my email to
> an internal-only email domain "sa.ena.net."
>
> - The iRedmail server has Sieve rules (easily managed by RoundCube)
> based on certain rule hits and scores from my main Internet edge
> MailScanner filtering that move them into Ham and Spam folders as
> unread. Mail scoring in the middle -- not high enough for obvious Spam
> or low enough for obvious Ham -- is left in the main Inbox.
>
> - I spend a few minutes each day visually scanning the Subjects of the
> unread email then mark them as Read.
>
> - If I find a zero-hour email in the main Inbox, then I move it to a
> SpamCop folder. A script runs every 5 minutes to check the SpamCop
> folder, strips off some extra Received headers from my internal hops,
> then submits it as an attachment to my SpamCop account.
>
> - A script moves the Maildir email to 4 other masschecker VMs to split
> out the load so they will be able to submit their results quickly.
> Ena-week0 is the last week of ham/spam that is still on the iRedMail
> server. Ena-week1-4 are running on the other 4 masschecker VMs to give
> a total of 5 weeks of recent corpus. I currently have 100,939 Ham and
> 292,001 Spam in ena-week0-4.
>
> - I run a local Bayesian train on the ena-week0 Ham and Spam folder to
> my Redis-based Bayes storage shared across my 8 MailScanner nodes and my
> iRedMail/amavis server. This method has shown to keep my Bayes scores
> very accurate.
>
> Hope someone finds this information helpful.
>
> Dave
>
>
> On 01/20/2017 01:02 PM, Tom Hendrikx wrote:
>> On 20-01-17 19:46, David Jones wrote:
>>>> From: Kevin Golding <k...@caomhin.org>
>>>> Sent: Friday, January 20, 2017 11:59 AM
>>>> To: users@spamassassin.apache.org
>>>> Subject: Re: No rule updates since 1/1/17
>>>
>>>> On Fri, 20 Jan 2017 17:26:01 -0000, Bill Keenan
>>>> <developerli...@wjkeenan.org> wrote:
>>>>> What is the fix needed so /usr/bin/sa-update starts getting updates? I
>>>>> too have not received an update from updates.spamassassin.org
>>>>> <http://updates.spamassassin.org/> since 1-Jan-17.
>>>>>
>>>>> Besides updates.spamassassin.org <http://updates.spamassassin.org/>,
>>>>> what other rule sets are commonly used? Hundreds of spam messages are
>>>>> getting through with only updates.spamassassin.org
>>>>> <http://updates.spamassassin.org/> rules.
>>>> This seems like a good time to mention
>>>> https://wiki.apache.org/spamassassin/NightlyMassCheck
>>>> If more people can contribute, even just a small corpora of mail, then
>>>> updates will be published more frequently. At the moment a very small
>>>> number of people provide data, meaning there is very little margin for
>>>> error.
>>> I would like to help with the nightly masscheck but I don't have the
>>> resources to manually check ham and spam. This also gets into the
>>> grey area of how people define spam. I also have a very good MTA
>>> setup with RBLs and DNS checks that block most of the spam before
>>> it reaches SA in MailScanner. My SA only has to block a very small
>>> percentage of my definition of spam so I am not sure how helpful
>>> my mail filtering platform can be even though it's very accurate.
>>>
>>> Dave
>>>
>> I think I can say the same about my platform, but since this issue keeps
>> popping up I just applied for an account just to find out if my
>> contribution could help. I can't speculate so I'm just gonna try if it
>> helps :)
>>
>> Kind regards,
>> Tom
>>
>
> --
> David Jones
>
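
For anyone who wants to try something similar, here is a minimal sketch of the two copy steps from the wrapper described at the top of this message (spam: seen only, at most about 2 months old; ham: at most about 6 years old). It is not the actual script. It assumes the IMAP folders are stored on disk as Maildir directories, that the file modification time is a good enough proxy for message age, and that the "seen" state is the standard Maildir `S` flag in the file name. The names `is_seen`, `select_messages` and `copy_corpus` are made up for illustration. Running mass-check itself, printing the outlier subjects, cleaning up and mailing the report are left out.

```python
import shutil
import time
from pathlib import Path
from typing import Iterable, List, Optional

DAY = 86400  # seconds per day


def is_seen(path: Path) -> bool:
    """Return True if a Maildir file name carries the 'S' (seen) flag.

    Maildir keeps flags after ':2,' at the end of the file name.
    """
    _, sep, flags = path.name.partition(":2,")
    return bool(sep) and "S" in flags


def select_messages(folder: Path, max_age_days: int, seen_only: bool,
                    now: Optional[float] = None) -> List[Path]:
    """List messages in folder/cur that are recent enough (by mtime)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * DAY
    picked = []
    for msg in sorted((folder / "cur").glob("*")):
        if not msg.is_file():
            continue
        if msg.stat().st_mtime < cutoff:
            continue  # assumption: mtime stands in for the message age
        if seen_only and not is_seen(msg):
            continue
        picked.append(msg)
    return picked


def copy_corpus(folders: Iterable[Path], dest: Path, max_age_days: int,
                seen_only: bool, now: Optional[float] = None) -> int:
    """Copy the selected messages from each folder into dest; return the count."""
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    for folder in folders:
        for msg in select_messages(folder, max_age_days, seen_only, now):
            # Prefix with the folder name so files from different folders
            # cannot overwrite each other in the work directory.
            shutil.copy2(msg, dest / f"{folder.name}-{msg.name}")
            count += 1
    return count
```

With this sketch, the spam step would be something like `copy_corpus(spam_folders, workdir / "spam", 60, True)` and the ham step `copy_corpus(ham_folders, workdir / "ham", 6 * 365, False)`, where the folder lists and `workdir` are whatever paths apply on your own server.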