Thanks, Tom.  I apologize; I didn't know, and had asked Dave to follow up
on some old masscheck requests I found squirreled away in a folder!

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Sun, Aug 26, 2018 at 7:01 AM, Tom Hendrikx <t...@whyscream.net> wrote:

>
> Hi David,
>
> I've been running masschecks since Feb 2017; results labeled
> 'thendrikx' are mine. :)
>
> I'm not adding massive volumes though, mostly because I'm running a
> small '3 men and a dog' setup. But I think it's important that I can
> contribute sample data in my locale (nl_NL), so I would invite others to
> set it up too: It's not a lot of work and it mostly runs without any
> manual intervention (I was already manually sorting ham and spam).
>
> To give a bit of an idea of how I do it: I run a postfix server on
> ubuntu, with spamassassin as a milter. I redirect all possible spam into
> my Junk folder, and check that daily.
>
> The masscheck is run using a simple wrapper script that takes the
> following steps (from daily cron):
> - Copy all spam into $workdir from Spamtraps and Junk folders (only
> IMAP-seen emails) and not older than 2 months
> - Copy all ham into $workdir from several IMAP folders that are known to
> be sorted by hand, and not older than 6 years
> - Run masscheck on the copied messages
> - Print a list of the subjects of the lowest scoring spam samples, and
> the highest scoring ham samples
> - Clean up all copied email
> - Mail all output to myself
>
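The copy steps in the list above could be sketched roughly as follows. This is only an illustration, not Tom's actual wrapper: it assumes the folders are stored as Maildir, that "IMAP-seen" corresponds to the Maildir `S` flag, and that message age is taken from the Maildir delivery time rather than the Date header. The function name `copy_recent` and its parameters are hypothetical.

```python
import mailbox
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def copy_recent(maildir_path, workdir, max_age_days, require_seen=False, now=None):
    """Copy messages newer than max_age_days from a Maildir into workdir.

    Returns the list of files written. Age is taken from the Maildir
    delivery time (file mtime), not from the Date header.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    dest = Path(workdir)
    dest.mkdir(parents=True, exist_ok=True)

    box = mailbox.Maildir(maildir_path, create=False)
    written = []
    try:
        for key in box.keys():
            msg = box.get_message(key)
            if msg.get_date() < cutoff:
                continue  # older than the allowed age
            if require_seen and "S" not in msg.get_flags():
                continue  # not yet marked as seen
            target = dest / key
            target.write_bytes(box.get_bytes(key))
            written.append(target)
    finally:
        box.close()
    return written
```

With this sketch, spam would be copied with something like `copy_recent(junk_dir, workdir, 60, require_seen=True)` and ham with `copy_recent(ham_dir, workdir, 6 * 365)`, where `junk_dir`, `ham_dir` and `workdir` are placeholder paths.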
> I spent less than a day setting this up, and it has been running
> without issues ever since. If you're interested, read up on
> https://wiki.apache.org/spamassassin/NightlyMassCheck and try to set it
> up. If you run into issues, other masscheckers can probably help you out.
>
> Kind regards,
>         Tom
>
> On 25-08-18 16:12, David Jones wrote:
> > Tom,
> >
> > Let me know if you are still interested in setting up a masschecker.
> > That goes for anyone on this list as well.  I have worked out the
> > sorting issue pretty well now and my ena-weekX masscheckers are now the
> > largest contributors to the RuleQA corpus, keeping the nightly rule
> > scoring updating regularly over the past year.
> >
> > http://ruleqa.spamassassin.org/  (see the ena-weekX in the green box)
> >
> > New/more masscheckers are always welcome and will help you learn the
> > best way to tune your SA platform to get every last drop of accuracy
> > from your local meta rules.  We could really use masscheckers with
> > primary languages other than English to add/improve core SA rules.
> >
> > Here's my setup:
> >
> > - I have an iRedMail server to which I split copies of most of my email,
> > using an internal-only email domain "sa.ena.net."
> >
> > - The iRedMail server has Sieve rules (easily managed by RoundCube)
> > based on certain rule hits and scores from my main Internet edge
> > MailScanner filtering that move mail into Ham and Spam folders as
> > unread.  Mail scoring in the middle (not high enough for obvious Spam
> > or low enough for obvious Ham) is left in the main Inbox.
> >
> > - I spend a few minutes each day visually scanning the Subjects of the
> > unread email then mark them as Read.
> >
> > - If I find a zero-hour email in the main Inbox, then I move it to a
> > SpamCop folder.  A script runs every 5 minutes to check the SpamCop
> > folder, strips off some extra Received headers from my internal hops,
> > then submits each message as an attachment to my SpamCop account.
> >
> > - A script moves the Maildir email to 4 other masschecker VMs to split
> > out the load so they will be able to submit their results quickly.
> > Ena-week0 is the last week of ham/spam that is still on the iRedMail
> > server.  Ena-week1-4 are running on the other 4 masschecker VMs to give
> > a total of 5 weeks of recent corpus.  I currently have 100,939 Ham and
> > 292,001 Spam in ena-week0-4.
> >
> > - I run a local Bayesian train on the ena-week0 Ham and Spam folders into
> > my Redis-based Bayes storage shared across my 8 MailScanner nodes and my
> > iRedMail/amavis server.  This method has been shown to keep my Bayes
> > scores very accurate.
> >
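The "strip extra Received headers from internal hops" step in the SpamCop bullet above might look something like the sketch below. It is not Dave's script: it assumes internal hops can be recognised by a substring in the Received value (the marker list is hypothetical), and it rebuilds the header list in its original order so the remaining trace headers stay where they were. Re-serialising through the `email` package may re-fold long header lines.

```python
from email import message_from_bytes


def strip_internal_received(raw, internal_markers):
    """Return the message bytes without Received headers from internal hops.

    A Received header is dropped when its value contains any of the
    strings in internal_markers (hypothetical hostnames or domains).
    """
    msg = message_from_bytes(raw)

    # Keep every header, in order, except internal Received lines.
    kept = []
    for name, value in msg.items():
        is_internal = name.lower() == "received" and any(
            marker in value for marker in internal_markers
        )
        if not is_internal:
            kept.append((name, value))

    # Remove all headers, then re-add the kept ones in original order.
    for name in list(msg.keys()):
        del msg[name]
    for name, value in kept:
        msg[name] = value

    return msg.as_bytes()
```

For example, `strip_internal_received(raw_bytes, ["internal.example"])` would drop only the hops that mention that placeholder domain and leave the external Received chain intact.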
> > Hope someone finds this information helpful.
> >
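As a rough illustration of the ena-week0 to ena-week4 split described above, a message's age can be mapped to a weekly bucket. How Dave's script actually decides the boundaries is not stated, so the rolling 7-day windows measured back from "now" are an assumption, and `week_bucket` is a hypothetical helper.

```python
SECONDS_PER_WEEK = 7 * 86400


def week_bucket(delivered_ts, now, weeks=5, prefix="ena-week"):
    """Map a delivery timestamp to a weekly corpus label.

    Returns e.g. "ena-week0" for mail from the last 7 days, or None
    when the message is older than the number of weeks kept.
    """
    age = max(0.0, now - delivered_ts)  # treat future timestamps as new
    index = int(age // SECONDS_PER_WEEK)
    if index >= weeks:
        return None
    return f"{prefix}{index}"
```

A mover script could then send everything labelled `ena-week1` through `ena-week4` to the matching VM and expire messages for which the function returns `None`.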
> > Dave
> >
> >
> > On 01/20/2017 01:02 PM, Tom Hendrikx wrote:
> >> On 20-01-17 19:46, David Jones wrote:
> >>>> From: Kevin Golding <k...@caomhin.org>
> >>>> Sent: Friday, January 20, 2017 11:59 AM
> >>>> To: users@spamassassin.apache.org
> >>>> Subject: Re: No rule updates since 1/1/17
> >>>
> >>>> On Fri, 20 Jan 2017 17:26:01 -0000, Bill Keenan
> >>>> <developerli...@wjkeenan.org> wrote:
> >>>>> What is the fix needed so /usr/bin/sa-update starts getting updates?
> >>>>> I too have not received an update from updates.spamassassin.org
> >>>>> <http://updates.spamassassin.org/> since 1-Jan-17.
> >>>>>
> >>>>> Besides updates.spamassassin.org <http://updates.spamassassin.org/>,
> >>>>> what other rule sets are commonly used? Hundreds of spam messages
> >>>>> are getting through with only updates.spamassassin.org
> >>>>> <http://updates.spamassassin.org/> rules.
> >>>> This seems like a good time to mention
> >>>> https://wiki.apache.org/spamassassin/NightlyMassCheck
> >>>> If more people can contribute, even just a small corpus of mail,
> >>>> then updates will be published more frequently. At the moment a very
> >>>> small number of people provide data, meaning there is very little
> >>>> margin for error.
> >>> I would like to help with the nightly masscheck but I don't have the
> >>> resources to manually check ham and spam.  This also gets into the
> >>> grey area of how people define spam.  I also have a very good MTA
> >>> setup with RBLs and DNS checks that block most of the spam before
> >>> it reaches SA in MailScanner.  My SA only has to block a very small
> >>> percentage of my definition of spam so I am not sure how helpful
> >>> my mail filtering platform can be even though it's very accurate.
> >>>
> >>> Dave
> >>>
> >> I think I can say the same about my platform, but since this issue keeps
> >> popping up I just applied for an account just to find out if my
> >> contribution could help. I can't speculate so I'm just gonna try if it
> >> helps :)
> >>
> >> Kind regards,
> >>      Tom
> >>
> >
> > --
> > David Jones
> >
>
>
>
