Thanks, Tom. I apologize; I didn't know, and I asked Dave to follow up on some old masscheck requests I found squirreled away in a folder!
--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Sun, Aug 26, 2018 at 7:01 AM, Tom Hendrikx <t...@whyscream.net> wrote:
>
> Hi David,
>
> I've been running masschecks since Feb 2017; results labeled
> 'thendrikx' are mine. :)
>
> I'm not adding massive volumes, though, mostly because I'm running a
> small '3 men and a dog' setup. But I think it's important that I can
> contribute sample data in my locale (nl_NL), so I would invite others to
> set it up too: it's not a lot of work and it mostly runs without any
> manual intervention (I was already manually sorting ham and spam).
>
> To give a bit of an idea of how I do it: I run a Postfix server on
> Ubuntu, with SpamAssassin as a milter. I redirect all possible spam into
> my Junk folder, and check that daily.
>
> The masscheck is run using a simple wrapper script that takes the
> following steps (from daily cron):
> - Copy all spam into $workdir from the Spamtraps and Junk folders (only
> IMAP-seen emails, not older than 2 months)
> - Copy all ham into $workdir from several IMAP folders that are known to
> be sorted by hand, and not older than 6 years
> - Run masscheck on the copied messages
> - Print a list of the subjects of the lowest-scoring spam samples and
> the highest-scoring ham samples
> - Clean up all copied email
> - Mail all output to myself
>
> I spent less than a day setting this up, and it has been running
> without issues ever since. If you're interested, read up on
> https://wiki.apache.org/spamassassin/NightlyMassCheck and try to set it
> up. If you run into issues, other masscheckers can probably help you out.
>
> Kind regards,
> Tom
>
> On 25-08-18 16:12, David Jones wrote:
> > Tom,
> >
> > Let me know if you are still interested in setting up a masschecker.
> > That goes for anyone on this list as well.
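Tom's wrapper script itself isn't included in the thread. As a rough illustration only, here is a Python sketch of two of its steps: applying the age cut-offs to copied spam and ham, and reporting the lowest-scoring spam and highest-scoring ham. It makes three assumptions:

- Messages are already fetched as raw bytes. The IMAP copying and the "seen" check are not shown.
- "2 months" is approximated as 61 days and "6 years" as 6 * 365 days.
- Masscheck results arrive as hypothetical (score, subject) pairs. The sketch does not parse the real mass-check log format.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime

# Approximations of the cut-offs in the thread; calendar-exact months are not modelled.
SPAM_MAX_AGE = timedelta(days=61)      # "not older than 2 months"
HAM_MAX_AGE = timedelta(days=6 * 365)  # "not older than 6 years" (ignores leap days)


def is_recent(raw: bytes, max_age: timedelta, now: datetime) -> bool:
    """Return True if the message's Date header lies within max_age of now."""
    date_header = message_from_bytes(raw).get("Date")
    if date_header is None:
        return False  # assumption: undated messages are skipped
    try:
        sent = parsedate_to_datetime(str(date_header))
    except (TypeError, ValueError):
        return False  # assumption: unparseable dates are skipped too
    if sent.tzinfo is None:
        # Dates written with "-0000" come back naive; treat them as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= max_age


def report_extremes(spam, ham, n=5):
    """Given hypothetical (score, subject) pairs, return the subjects of the
    n lowest-scoring spam samples and the n highest-scoring ham samples."""
    lowest_spam = [subject for _, subject in sorted(spam, key=lambda r: r[0])[:n]]
    highest_ham = [subject for _, subject in
                   sorted(ham, key=lambda r: r[0], reverse=True)[:n]]
    return lowest_spam, highest_ham
```

In a daily run, `is_recent(raw, SPAM_MAX_AGE, datetime.now(timezone.utc))` would decide whether a spam sample gets copied into the work directory. The subjects returned by `report_extremes` are the likely misclassifications worth eyeballing before the next run.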
> >
> > I have worked out the sorting issue pretty well now, and my ena-weekX
> > masscheckers are now the largest contributors to the RuleQA corpus,
> > keeping the nightly rule-score updates going regularly over the past year.
> >
> > http://ruleqa.spamassassin.org/ (see the ena-weekX in the green box)
> >
> > New/more masscheckers are always welcome and will help you learn the
> > best way to tune your SA platform to get every last drop of accuracy
> > from your local meta rules. We could really use masscheckers whose
> > primary language is not English to add/improve core SA rules.
> >
> > Here's my setup:
> >
> > - I have an iRedMail server to which I split copies of most of my email,
> > addressed to an internal-only email domain, "sa.ena.net".
> >
> > - The iRedMail server has Sieve rules (easily managed in RoundCube)
> > that move messages into Ham and Spam folders as unread, based on certain
> > rule hits and scores from my main Internet-edge MailScanner filtering.
> > Mail scoring in the middle (not high enough for obvious Spam or low
> > enough for obvious Ham) is left in the main Inbox.
> >
> > - I spend a few minutes each day visually scanning the Subjects of the
> > unread email, then mark them as Read.
> >
> > - If I find a zero-hour email in the main Inbox, I move it to a
> > SpamCop folder. A script runs every 5 minutes to check the SpamCop
> > folder, strips off some extra Received headers from my internal hops,
> > then submits the message as an attachment to my SpamCop account.
> >
> > - A script moves the Maildir email to 4 other masschecker VMs to spread
> > out the load so they can submit their results quickly. ena-week0 is
> > the most recent week of ham/spam, which is still on the iRedMail server.
> > ena-week1-4 run on the other 4 masschecker VMs, giving a total of
> > 5 weeks of recent corpus. I currently have 100,939 Ham and 292,001 Spam
> > in ena-week0-4.
> >
> > - I run local Bayes training on the ena-week0 Ham and Spam folders
> > into my Redis-based Bayes storage, which is shared across my 8
> > MailScanner nodes and my iRedMail/amavis server. This method has kept
> > my Bayes scores very accurate.
> >
> > Hope someone finds this information helpful.
> >
> > Dave
> >
> >
> > On 01/20/2017 01:02 PM, Tom Hendrikx wrote:
> >> On 20-01-17 19:46, David Jones wrote:
> >>>> From: Kevin Golding <k...@caomhin.org>
> >>>> Sent: Friday, January 20, 2017 11:59 AM
> >>>> To: users@spamassassin.apache.org
> >>>> Subject: Re: No rule updates since 1/1/17
> >>>
> >>>> On Fri, 20 Jan 2017 17:26:01 -0000, Bill Keenan
> >>>> <developerli...@wjkeenan.org> wrote:
> >>>>> What is the fix needed so /usr/bin/sa-update starts getting updates?
> >>>>> I too have not received an update from updates.spamassassin.org
> >>>>> since 1-Jan-17.
> >>>>>
> >>>>> Besides updates.spamassassin.org, what other rule sets are commonly
> >>>>> used? Hundreds of spam messages are getting through with only
> >>>>> updates.spamassassin.org rules.
> >>>> This seems like a good time to mention
> >>>> https://wiki.apache.org/spamassassin/NightlyMassCheck
> >>>> If more people can contribute, even just a small corpus of mail, then
> >>>> updates will be published more frequently. At the moment a very small
> >>>> number of people provide data, meaning there is very little margin for
> >>>> error.
> >>> I would like to help with the nightly masscheck, but I don't have the
> >>> resources to manually check ham and spam. This also gets into the
> >>> grey area of how people define spam. I also have a very good MTA
> >>> setup with RBLs and DNS checks that block most of the spam before
> >>> it reaches SA in MailScanner.
> >>> My SA only has to block a very small
> >>> percentage of my definition of spam, so I am not sure how helpful
> >>> my mail filtering platform can be, even though it's very accurate.
> >>>
> >>> Dave
> >>>
> >> I think I can say the same about my platform, but since this issue keeps
> >> popping up I applied for an account just to find out if my
> >> contribution could help. I can't speculate, so I'm just gonna try and
> >> see if it helps :)
> >>
> >> Kind regards,
> >> Tom
> >>
> >
> > --
> > David Jones
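Dave's Sieve rules and scripts aren't shown in the thread. This Python sketch mirrors three steps from his setup description: score-based folder triage, bucketing mail into the five ena-weekN corpora by age, and stripping internal Received headers before wrapping the message as an attachment for SpamCop. All of the following are hypothetical placeholders, not Dave's values:

- the thresholds HAM_MAX_SCORE and SPAM_MIN_SCORE;
- the internal domain in INTERNAL_HOPS;
- the report subject.

His real Sieve rules also key on specific MailScanner rule hits, not only scores. The SpamCop step only builds the message/rfc822 wrapper, since the thread doesn't say how submission is done.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_bytes
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from typing import Optional

# Hypothetical thresholds; the thread doesn't give the real ones.
HAM_MAX_SCORE = 0.0
SPAM_MIN_SCORE = 10.0

# Hypothetical internal domain; Received headers mentioning it are stripped.
INTERNAL_HOPS = ("internal.example",)


def triage(score: float) -> str:
    """Pick a folder by score, mirroring the intent of the Sieve rules."""
    if score >= SPAM_MIN_SCORE:
        return "Spam"
    if score <= HAM_MAX_SCORE:
        return "Ham"
    return "INBOX"  # the uncertain middle is left for manual review


def week_bucket(received: datetime, now: Optional[datetime] = None,
                weeks: int = 5) -> Optional[str]:
    """Map a message's age to an ena-weekN corpus name, or None if too old."""
    now = now or datetime.now(timezone.utc)
    index = (now - received) // timedelta(weeks=1)  # whole weeks of age
    if 0 <= index < weeks:
        return f"ena-week{index}"
    return None


def strip_internal_received(msg: Message) -> Message:
    """Drop Received headers that mention an internal hop, keeping header order."""
    kept = [
        (name, value)
        for name, value in msg.items()
        if not (name.lower() == "received"
                and any(hop in value for hop in INTERNAL_HOPS))
    ]
    # Delete every header (del removes all occurrences), then re-add the kept ones.
    for name in set(msg.keys()):
        del msg[name]
    for name, value in kept:
        msg[name] = value
    return msg


def wrap_as_attachment(msg: Message) -> MIMEMultipart:
    """Wrap the cleaned message as a message/rfc822 attachment."""
    outer = MIMEMultipart()
    outer["Subject"] = "spam report"  # hypothetical subject
    outer.attach(MIMEMessage(msg))
    return outer


def prepare_report(raw: bytes) -> MIMEMultipart:
    """Parse raw bytes, strip internal hops, and build the report message."""
    return wrap_as_attachment(strip_internal_received(message_from_bytes(raw)))
```

Bucketing by whole weeks of age keeps the five ena-weekN corpora disjoint, so in this sketch each VM can run its masscheck on its own week without overlapping the others.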