Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-06-17 Thread Dave Pooser
On 5/22/14 6:48 PM, "Karsten Bräckelmann" wrote: >On Thu, 2014-05-22 at 18:34 -0500, David B Funk wrote: >> After doing some experimenting with that code I came up with something >>that >> I'd argue is more semantically correct: >> >> # if we've got a long series of blank lines, limit them

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 18:34 -0500, David B Funk wrote: > After doing some experimenting with that code I came up with something that > I'd argue is more semantically correct: > > # if we've got a long series of blank lines, limit them > if (defined $start) { >my $max_blank_lines

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread David B Funk
On Thu, 22 May 2014, David B Funk wrote: On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: [snip..] The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to

Re: Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread David B Funk
On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: [snip..] The number of continuation lines equals the number of newlines in the test-case. Well, up until 12, that is. :-/ Any number up to 11 of consecutive newlines can be matched w

Consecutive Newlines in Rawbody Rules (was: Re: Bayes refinement)

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 03:12 +0200, Karsten Bräckelmann wrote: > In either case, having a sample would speed up this ping-pong style > debugging. And I am curious. ;) Mind putting your sample up a pastebin? Ian sent me the original message off-list. It indeed contains about 16 consecutive newlines

Re: Bayes refinement

2014-05-21 Thread Karsten Bräckelmann
On Wed, 2014-05-21 at 17:32 -0700, Ian Zimmerman wrote: > > The test message does not have that string. Maybe it uses DOS > > flavor "\r\n". Or what appears to be a bunch of linebreaks > > actually has spaces mixed in. > > Well, no. I looked at the message (the same data I fed to s.a. --debug) >
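
The two hypotheses quoted above (DOS line endings, or spaces mixed in with the line breaks) can be tried outside SpamAssassin. The following is a minimal sketch in Python, not the Perl regex engine SpamAssassin uses. The sample bodies and the `normalize` helper are made up for illustration and are not from the thread. Note that the quoted reply says neither hypothesis applied to the actual message.

```python
import re

# Pattern under discussion: ten consecutive newline characters.
newline_run = re.compile(r"\n{10}")

# Hypothetical sample bodies (assumptions, not data from the thread).
unix_body = "a" + "\n" * 10 + "b"      # plain LF line endings
dos_body = "a" + "\r\n" * 10 + "b"     # CRLF line endings
spaced_body = "a" + " \n" * 10 + "b"   # a space before each line break


def has_newline_run(body):
    """Return True if the body contains ten consecutive \\n characters."""
    return newline_run.search(body) is not None


def normalize(body):
    """Drop spaces, tabs and carriage returns that directly precede a newline."""
    return re.sub(r"[ \t\r]+\n", "\n", body)


print(has_newline_run(unix_body))               # LF only
print(has_newline_run(dos_body))                # CR between the newlines
print(has_newline_run(spaced_body))             # space between the newlines
print(has_newline_run(normalize(dos_body)))     # after normalizing
print(has_newline_run(normalize(spaced_body)))  # after normalizing
```

In both the CRLF and the spaced case another character sits between each pair of newlines, so a pattern that demands strictly adjacent `\n` characters cannot match until those characters are removed.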

Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 22:26:41 +0200 Karsten Bräckelmann wrote: Karsten> Seriously, the above rule, the shorter /\n{10}/, as well as the Karsten> variant posted by John without quantifier do exactly what you Karsten> asked for. They match 10 consecutive \n newline chars in the Karsten> rawbody. Ok

Re: Matching multiple newlines [Was: Bayes refinement]

2014-05-21 Thread Karsten Bräckelmann
On Wed, 2014-05-21 at 11:59 -0700, Ian Zimmerman wrote: > Would this be of any import? > > [24+0]~$ perl --version > > This is perl 5, version 14, subversion 2 (v5.14.2) Nope, the Perl version does not make a difference here.

Re: Bayes refinement

2014-05-21 Thread Karsten Bräckelmann
On Wed, 2014-05-21 at 10:23 -0700, Ian Zimmerman wrote: > I am trying to do a variant of this for text/plain, as that is the type > I mostly face now. But I cannot get it to work. > rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m You don't need the "or more" quantifier at the end of your RE. That just u
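
The remark about the "or more" quantifier can be checked in isolation: for a rule that only asks whether a match exists, `\n{10}` and `\n{10,}` hit on exactly the same inputs. The following is a minimal sketch in Python rather than Perl (the two engines agree on this simple pattern). The sample bodies are hypothetical, and the sketch says nothing about how SpamAssassin prepares the rawbody text before matching.

```python
import re

exact = re.compile(r"\n{10}")     # exactly ten newlines somewhere in the text
or_more = re.compile(r"\n{10,}")  # ten or more newlines

# Hypothetical sample bodies (assumptions, not data from the thread).
samples = {
    "nine": "text" + "\n" * 9 + "more",
    "ten": "text" + "\n" * 10 + "more",
    "sixteen": "text" + "\n" * 16 + "more",
}


def hits(pattern, body):
    """Return True if the pattern matches anywhere in the body."""
    return pattern.search(body) is not None


# For each sample, record (exact matches?, or_more matches?).
results = {name: (hits(exact, body), hits(or_more, body))
           for name, body in samples.items()}

for name, outcome in results.items():
    print(name, outcome)
```

Any run of more than ten newlines contains a run of exactly ten, which is why the open-ended quantifier adds nothing to a yes/no test.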

Re: Matching multiple newlines [Was: Bayes refinement]

2014-05-21 Thread John Hardin
On Wed, 21 May 2014, Ian Zimmerman wrote: On Wed, 21 May 2014 11:50:15 -0700 (PDT) John Hardin wrote: rawbody __LOCAL_MUCHO_BLANKS /\n\n\n\n\n\n\n\n\n\n/m Hmmm, no, your version doesn't work, either. Would this be of any import? [24+0]~$ perl --version This is perl 5, version 14, sub

Matching multiple newlines [Was: Bayes refinement]

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 11:50:15 -0700 (PDT) John Hardin wrote: >rawbody __LOCAL_MUCHO_BLANKS /\n\n\n\n\n\n\n\n\n\n/m Hmmm, no, your version doesn't work, either. Would this be of any import? [24+0]~$ perl --version This is perl 5, version 14, subversion 2 (v5.14.2) built for i486-linux-gn

Re: Bayes refinement

2014-05-21 Thread John Hardin
On Wed, 21 May 2014, Ian Zimmerman wrote: On Wed, 21 May 2014 19:08:51 +0100 Martin Gregorie wrote: rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m Martin> Looking for newlines rather than whitespace? Does /\s{10,}/m Martin> work any better? Nope, it doesn't :-( Anyway, looking for newlines was m

Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Wed, 21 May 2014 19:08:51 +0100 Martin Gregorie wrote: >> rawbody __LOCAL_MUCHO_BLANKS /\n{10,}/m Martin> Looking for newlines rather than whitespace? Does /\s{10,}/m Martin> work any better? Nope, it doesn't :-( Anyway, looking for newlines was my intention, sorry for the misleading nomenc

Re: Bayes refinement

2014-05-21 Thread Martin Gregorie
On Wed, 2014-05-21 at 10:23 -0700, Ian Zimmerman wrote: > I am trying to do a variant of this for text/plain, as that is the type > I mostly face now. But I cannot get it to work. > > header __LOCAL_PLAIN_ASCII Content-Type =~ /text\/plain; *charset="us-ascii"/i > > rawbody __LOCAL_MUCHO_BLANKS

Re: Bayes refinement

2014-05-21 Thread Ian Zimmerman
On Thu, 15 May 2014 12:18:25 -0800 Kevin Miller wrote: > I implemented a rule that looks for multiple breaks for just that > reason. Can't remember where I "stole" it from - probably some folks > here helped me with it a few years ago. Can't remember who, but > appreciated the assistance. I am

Re: Bayes refinement

2014-05-17 Thread RW
On Fri, 16 May 2014 21:36:22 -0600 Bob Proulx wrote: > David Jones wrote: > > > James B. Byrne wrote: > > > If you keep Bayes well trained (assuming you have enough ham to > > > do so) Bayes poisoning is a myth. > > > > I'm not sure I agree with the "myth" statement. I just had to > > reset my B

Re: Bayes refinement

2014-05-16 Thread David F. Skoll
On Wed, 14 May 2014 17:08:26 -0400 "James B. Byrne" wrote: > Is there any way to limit Bayes content checking to only the first X > characters of the message body? I ask this because it is clear that > the spam messages getting through contain text meant to poison the > tests but this gibberish

Re: Bayes refinement

2014-05-16 Thread Bob Proulx
David Jones wrote: > > James B. Byrne wrote: > > If you keep Bayes well trained (assuming you have enough ham to do so) > > Bayes poisoning is a myth. > > I'm not sure I agree with the "myth" statement. I just had to reset my Bayes > DB after years of it slowly drifting due to bad user input and

Re: Bayes refinement

2014-05-16 Thread Ian Zimmerman
On Fri, 16 May 2014 16:20:21 -0400 Bowie Bailey wrote: > Keep in mind that BAYES_50 and BAYES_60 still contribute positive > scores by default. Though it is technically a neutral result, it > still adds a point or two to the score. > Rather than messing with Bayes, I would focus on the spams yo

Re: Bayes refinement

2014-05-16 Thread Karsten Bräckelmann
On Fri, 2014-05-16 at 11:24 -0700, Ian Zimmerman wrote: > In the last few (~10) days, I have seen a marked increase in FNs, > usually with Bayes values in the 50s and 60s. That's a neutral bayes classification. Other rules should be able to still identify the spam. > On close inspection, I see th

RE: Bayes refinement

2014-05-16 Thread David Jones
>On 05/14/2014 11:08 PM, James B. Byrne wrote: >> Is there any way to limit Bayes content checking to only the first X >> characters of the message body? I ask this because it is clear that the spam >> messages getting through contain text meant to poison the tests but this >> gibberish always t

Re: Bayes refinement

2014-05-16 Thread Axb
On 05/14/2014 11:08 PM, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the mai

Re: Bayes refinement

2014-05-16 Thread David F. Skoll
On Fri, 16 May 2014 11:24:29 -0700 Ian Zimmerman wrote: > On close inspection, I see that the hash-busting garbage appended is > (faux) technical computing talk instead of the usual cookbooks or > classical literature :-p That is, scrambled Stack Overflow > discussions and the like. And of cour

Re: Bayes refinement

2014-05-16 Thread Bowie Bailey
On 5/16/2014 2:24 PM, Ian Zimmerman wrote: On Fri, 16 May 2014 07:22:56 -0400 "David F. Skoll" wrote: James> Is there any way to limit Bayes content checking to only the James> first X characters of the message body? I ask this because it is James> clear that the spam messages getting through

RE: Bayes refinement

2014-05-16 Thread Kevin Miller
Sent: Wednesday, May 14, 2014 1:08 PM To: users@spamassassin.apache.org Subject: Bayes refinement Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text

Re: Bayes refinement

2014-05-16 Thread Ian Zimmerman
On Fri, 16 May 2014 07:22:56 -0400 "David F. Skoll" wrote: James> Is there any way to limit Bayes content checking to only the James> first X characters of the message body? I ask this because it is James> clear that the spam messages getting through contain text meant James> to poison the tests

Re: Bayes refinement

2014-05-16 Thread John Hardin
On Wed, 14 May 2014, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the main

Re: Bayes refinement

2014-05-16 Thread Bowie Bailey
On 5/14/2014 5:08 PM, James B. Byrne wrote: Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the main

Bayes refinement

2014-05-15 Thread James B. Byrne
Is there any way to limit Bayes content checking to only the first X characters of the message body? I ask this because it is clear that the spam messages getting through contain text meant to poison the tests but this gibberish always trails the main message and is separated by a large white spac