Re: Getting started with Bayesian filtering

2011-10-23 Thread Henrik K
On Sun, Oct 23, 2011 at 06:35:02PM -0400, Marios Titas wrote: > Hi all, > > I was recently given a list of 10,000 posts from an internet forum. > Out of those, 9,000 had been approved by the site's moderators and the > remaining 1,000 were rejected. I was wondering if I could use this data set > to play

(Non-) Capturing REs (was: Re: How to write rule for From: line)

2011-10-23 Thread Karsten Bräckelmann
[ Plain regex rules sloppily using capturing rather than non-capturing grouping snipped. ] On Mon, 2011-10-24 at 02:27 +0200, wolfgang wrote: > As far as I know, with alternations you should use "?:" at their > beginning to avoid (superfluous) memory usage: > > header FOO From:name =~ /\b(?:s
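
To illustrate the difference wolfgang raises, here is a small Python sketch. Python's `re` accepts the same `(?:...)` syntax as the Perl regexes SpamAssassin rules are written in. The keyword list is a hypothetical example, not the rule from the thread.

```python
import re

# Hypothetical keyword alternation, written with and without capturing.
capturing = re.compile(r"\b(sex|free|trial)\b", re.IGNORECASE)
non_capturing = re.compile(r"\b(?:sex|free|trial)\b", re.IGNORECASE)

sample = "Enlargement pils Free Sample"

# Both patterns match exactly the same text...
m1 = capturing.search(sample)
m2 = non_capturing.search(sample)
print(m1.group(0), m2.group(0))  # Free Free

# ...but only the capturing version records a group.
print(capturing.groups, non_capturing.groups)  # 1 0
```

The match result is identical either way; the sketch only shows that `(?:...)` skips saving the group, which is the memory saving wolfgang refers to.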

Re: Getting started with Bayesian filtering

2011-10-23 Thread darxus
On 10/23, Marios Titas wrote: > my $spamassassin=Mail::SpamAssassin->new({ > require_rules => 1, I have no experience using SA this way. I'd start with trying to get it to work with the default configuration, from the command line, not through this API. > rules_filename

Re: How to write rule for From: line

2011-10-23 Thread wolfgang
On 2011-10-24 01:12, Dave Funk wrote: > Karsten's example is a clear win efficiency-wise over Jakub's, but > it's also more restrictive. Because of the \b bounding on the > outside, Karsten's rule will match "From: enlarge now " > but not "From: enlargement now ". > > That can be achieved by addi
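
Dave's point about `\b` can be checked quickly in Python. The first pattern is a hypothetical rule in the shape he describes, with `\b` bounding the whole alternation. The second, which lets "enlarge" take a suffix via `\w*`, is one possible variant and is an assumption here, not necessarily the fix Dave goes on to propose.

```python
import re

# Hypothetical pattern with \b bounding the whole alternation on the outside.
bounded = re.compile(r"\b(?:sex|free|trial|enlarge)\b", re.IGNORECASE)
# Assumed variant: allow "enlarge" to be followed by more word characters.
with_suffix = re.compile(r"\b(?:sex|free|trial|enlarge\w*)\b", re.IGNORECASE)

def check(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None

for text in ["enlarge now", "enlargement now"]:
    print(f"{text!r}: bounded={check(bounded, text)} with_suffix={check(with_suffix, text)}")
# 'enlarge now': bounded=True with_suffix=True
# 'enlargement now': bounded=False with_suffix=True
```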

Re: How to write rule for From: line

2011-10-23 Thread Karsten Bräckelmann
On Sun, 2011-10-23 at 18:12 -0500, Dave Funk wrote: > On Sun, 23 Oct 2011, Karsten Bräckelmann wrote: > > > header FROM_ENLARG From: =~ > > ^ > > Drop the colon, the header name is a plain "From". > > > > > /(\bsex\b|\bfree\b|\btrial\b|\benlarge.*|

Re: How to write rule for From: line

2011-10-23 Thread Dave Funk
On Sun, 23 Oct 2011, Karsten Bräckelmann wrote: On Sun, 2011-10-23 at 11:15 -0700, Jakub Serych wrote: Could anybody help a newbie build a rule for the "From:" line? My server is flooded with spam like this: header FROM_ENLARG From: =~ ^ Drop th

Getting started with Bayesian filtering

2011-10-23 Thread Marios Titas
Hi all, I was recently given a list of 10,000 posts from an internet forum. Out of those, 9,000 had been approved by the site's moderators and the remaining 1,000 were rejected. I was wondering if I could use this data set to play with Bayesian filtering in SpamAssassin. I tried the following: I converte
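
As a conceptual sketch of what such an experiment does, here is a generic two-class multinomial naive Bayes classifier with add-one smoothing, treating "approved" posts like ham and "rejected" posts like spam. This is not SpamAssassin's Bayes implementation, which uses its own tokenizer and combines token probabilities with a chi-square (Robinson/Fisher-style) method. The tokenizer and the four training strings are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("approved", "rejected")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """log P(label) + sum of log P(token | label), smoothed over the vocabulary."""
        vocab = set().union(*self.word_counts.values())
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(text, label))

# Hypothetical toy data standing in for the 9,000 approved / 1,000 rejected posts.
nb = NaiveBayes()
nb.train("great discussion thanks for the link", "approved")
nb.train("I agree with the previous post", "approved")
nb.train("cheap pills buy now free trial", "rejected")

print(nb.classify("buy cheap pills now"))           # rejected
print(nb.classify("I agree, thanks for the link"))  # approved
```

Working in log space avoids floating-point underflow when a post has many tokens; with a real corpus the class priors would reflect the 9:1 split.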

Re: all spam emails from mailengine1.com servers

2011-10-23 Thread RGB Camera
On Thu, Oct 20, 2011 at 3:47 PM, R - elists wrote: > > does anyone get legit emails that come from the mailengine1.com email > marketing servers? > > aka streamsend aka ezpublishing ??? Indeed, we consider it all spam too, even though we don't see lots of mail coming from there. A lot of B2B spa

Re: How to write rule for From: line

2011-10-23 Thread Karsten Bräckelmann
On Sun, 2011-10-23 at 11:15 -0700, Jakub Serych wrote: > Could anybody help a newbie build a rule for the "From:" line? My server is > flooded with spam like this: > header FROM_ENLARG From: =~ ^ Drop the colon, the header name is a plain "From". >
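
Karsten's point is that the header's name is `From`; the colon only separates name from value in the raw message. Python's standard `email` module behaves the same way, which makes for a quick analogy (this is Python, not SpamAssassin's parser). The message below is a hypothetical stand-in modeled on Jakub's sample, with a made-up address.

```python
from email import message_from_string

# Hypothetical message modeled on the sample in the thread; the address is made up.
raw = (
    "From: Enlargement pils Free Sample <sender@example.com>\n"
    "Subject: The scientific breakthrough is here\n"
    "\n"
    "body text\n"
)
msg = message_from_string(raw)

# Header lookup uses the bare name; "From:" is not a header name.
print(msg["From:"])  # None
print(msg["From"])   # Enlargement pils Free Sample <sender@example.com>
```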

How to write rule for From: line

2011-10-23 Thread Jakub Serych
Could anybody help a newbie build a rule for the "From:" line? My server is flooded with spam like this: From: Enlargement pils Free Sample To: Subject: The scientific breakthrough is here Date: Sun, 23 Oct 2011 18:02:36 -0100 Message-ID: <002801cc91b3$5acd01a0$106704e0$@com> MIME-Version: 1.0 Con

Re: Regex to detect gibberisk

2011-10-23 Thread John Hardin
On Sun, 23 Oct 2011, Marc Perkel wrote: Anyone have a good way to catch these? Mostly coming from Hotmail. Subject: ALotOfTim eToOrd erPil lGood sFro mHe alt hSh op There is a rule for that in my sandbox. I'd have to look to see whether it's been promoted to active. -- John Hardin KA7OHZ

rbldnsd vs bind and udp vs tcp querys

2011-10-23 Thread Benny Pedersen
Does SpamAssassin perform DNSBL lookups over TCP, or is UDP forced? The reason I ask is that most rbldnsd servers only support UDP, but BIND tries TCP if it is set up globally for EDNS0, or if UDP fails. Does anyone have a way to solve this?
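
For reference, this is what a plain DNSBL lookup over UDP looks like at the packet level, per RFC 1035. It does not answer how SpamAssassin's own resolver chooses a transport. The IPv4 octets are reversed and prepended to the list's zone; the query carries no EDNS0 OPT record; and a plain UDP reply is limited to 512 bytes, with a truncated answer being what normally sends a resolver back over TCP. The zone `dnsbl.example.org`, the query ID, and the `send_udp` helper are hypothetical.

```python
import ipaddress
import socket
import struct

def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the list's zone: 192.0.2.1 -> 1.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def build_a_query(name, query_id=0x1234):
    """Build a minimal RFC 1035 query for an A record, with no EDNS0 OPT record."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by the zero-length root label.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    qname += b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A (1), QCLASS=IN (1)
    return header + question

def send_udp(packet, server, port=53, timeout=2.0):
    """Hypothetical helper: send the query over UDP and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        reply, _ = sock.recvfrom(512)  # classic UDP size limit without EDNS0
        return reply

name = dnsbl_query_name("192.0.2.1", "dnsbl.example.org")
packet = build_a_query(name)
print(name)         # 1.2.0.192.dnsbl.example.org
print(len(packet))  # 45
```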

Regex to detect gibberisk

2011-10-23 Thread Marc Perkel
Anyone have a good way to catch these? Mostly coming from Hotmail. Subject: ALotOfTim eToOrd erPil lGood sFro mHe alt hSh op Body: onsgMRpa otaaeseB Atlp ift aTrtrss lsskcei . BROhd hoidn lod sew esodcywsn csTOUwive nlt hfrri dda otaigbha e TR' adwfh eoTn htnotdw btbd e. UL eofotg esng tt
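
One possible heuristic, offered as a hypothetical sketch and not the sandbox rule John Hardin mentions: the subject looks like CamelCase text chopped at random points, so it has many places where a lowercase letter runs straight into an uppercase one. Counting those transitions and flagging anything above a threshold catches this sample. The threshold of 4 is an untested assumption, and the check is ASCII-only. Ordinary words such as "iPhone" contribute one transition each, which is why a single hit should not be enough.

```python
import re

# Hypothetical threshold; it would need tuning against real ham and spam.
MIN_TRANSITIONS = 4

def case_transitions(text):
    """Count places where a lowercase ASCII letter is immediately followed by an uppercase one."""
    return len(re.findall(r"[a-z][A-Z]", text))

def looks_like_chopped_camelcase(subject, threshold=MIN_TRANSITIONS):
    """Flag subjects like 'ALotOfTim eToOrd ...' whose words were split at odd points."""
    return case_transitions(subject) >= threshold

spam = "ALotOfTim eToOrd erPil lGood sFro mHe alt hSh op"
ham = "Re: How to write rule for From: line"
print(case_transitions(spam), looks_like_chopped_camelcase(spam))  # 9 True
print(case_transitions(ham), looks_like_chopped_camelcase(ham))    # 0 False
```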