Re: [SAtalk] slooooow rules

2002-02-22 Thread Matt Sergeant
On 21 Feb 2002, Craig Hughes wrote: > > could someone please explain what [^<] matches? > > afaik ^ means beginning-of-line but it's strange in [] character array. > > so, what does ^ mean there? begin-of-line or '^' character? > > i think it's beg-of-line, as PCRE couldn't optimize this re

Re: [SAtalk] slooooow rules

2002-02-22 Thread Nigel Metheringham
On Thu, 2002-02-21 at 20:53, Craig Hughes wrote: > On Thu, 2002-02-21 at 10:22, Arpi wrote: [Original regexp] > > > FOR_INSTANT_ACCESS: > > > /(?:CLICK HERE|).{0,20}\s+INSTANT\s+ACCESS.{0,20}\s+(?:|CLICK HERE)/i > I think > body FOR_INSTANT_ACCESS /INSTANT ACCESS/i > is fine by itself. I
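A rough comparison of the two FOR_INSTANT_ACCESS patterns quoted above, sketched in Python rather than Perl (an assumption for convenience; the constructs used here behave the same way in both). The sample strings are made up. It shows that the alternations with an empty branch can always match nothing, so the long pattern adds little beyond the plain phrase except requiring surrounding whitespace.

```python
import re

# Original rule as quoted in the thread.
ORIGINAL = re.compile(
    r"(?:CLICK HERE|).{0,20}\s+INSTANT\s+ACCESS.{0,20}\s+(?:|CLICK HERE)",
    re.IGNORECASE,
)
# Simplified rule suggested in the reply.
SIMPLE = re.compile(r"INSTANT ACCESS", re.IGNORECASE)

# An alternation with an empty branch matches the empty string.
empty_branch_ok = re.fullmatch(r"(?:CLICK HERE|)", "") is not None

with_context = "for INSTANT ACCESS click here now"   # hypothetical sample
bare_phrase = "INSTANT ACCESS"                       # hypothetical sample

results = {
    "original/with_context": ORIGINAL.search(with_context) is not None,
    "simple/with_context": SIMPLE.search(with_context) is not None,
    "original/bare_phrase": ORIGINAL.search(bare_phrase) is not None,
    "simple/bare_phrase": SIMPLE.search(bare_phrase) is not None,
}
print(empty_branch_ok, results)
```

The simplified pattern also fires on the bare phrase, which the original does not, because the original insists on whitespace before INSTANT and after ACCESS.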

Re: Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
> I don't want to spend much time making the patch, unless it goes immediately > into CVS, as keeping it in sync with CVS for weeks/months is a nightmare... > If I have to do the fork&sync way, i'll fork everything and redesign ruleset > syntax to better fit my needs for the C version... Rules tend

Re: Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Arpi
Hi, > On Thu, 2002-02-21 at 13:42, Arpi wrote: > > when will it be implemented, or better: when will you accept such a patch for > > the ruleset? (i cannot modify the perl code, as i don't know the perl language > > nor the spamassassin core enough, but i could help making this optimization > > to th

Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
Ok, so this thread got me to go read through man perlre in a little more detail. I've found the following as a result: PerMsgStatus.pm uses $& and $', which apparently will cause *all* regex matching to be much slower program-wide. I'll try to rewrite the one line on which that occurs; we shoul

Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
Heh, yeah. My syntax would make it seem that it would allow that. And I agree that allowing that would be better. But allowing that would mean more coding ;) I'll probably do it anyway... C On Thu, 2002-02-21 at 14:20, Arpi wrote: > Hi, > > > On 21 February 2002, Craig Hughes said: > > > I

Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
This syntax makes rule parsing more complicated, given the way it works now. Though it is a little nicer because it makes it clearer that something like: rawbody A /rule1/ and header A /rule2/ will not work as expected. C On Thu, 2002-02-21 at 13:40, Greg Ward wrote: > On 21 February 200

Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
On Thu, 2002-02-21 at 13:42, Arpi wrote: > when will it be implemented, or better: when will you accept such a patch for > the ruleset? (i cannot modify the perl code, as i don't know the perl language > nor the spamassassin core enough, but i could help making this optimization > to the ruleset) You ca

Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Greg Ward
On 21 February 2002, Arpi said: > anyway, i have a request: > could you add a new rule type, for plain text matches? > searching for a text string is always simpler and faster than for regexps, > and many of your regexps are such strings (/some words/i) and there will be > much more when start add
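A minimal sketch of the plain-text rule type requested above, in Python rather than the project's Perl or Arpi's C. The function name `contains_phrase` is hypothetical and not part of SpamAssassin. It treats a rule such as /some words/i as a case-insensitive substring search, which needs no regex engine at all.

```python
def contains_phrase(body: str, phrase: str) -> bool:
    """Case-insensitive substring test, standing in for a /phrase/i rule.

    Hypothetical helper for illustration only; it assumes the phrase has
    no regex metacharacters that were meant to be interpreted.
    """
    return phrase.lower() in body.lower()


hit = contains_phrase("Get INSTANT access today", "instant access")
miss = contains_phrase("Nothing to see here", "instant access")
print(hit, miss)
```

A real rule loader would still have to decide which existing regexps are safe to treat this way, for example by checking that the pattern contains only literal characters.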

Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Arpi
Hi, > On 21 February 2002, Craig Hughes said: > > I had been thinking about creating a "multiple-rule" format for rules, > > where in order to match a rule, you would have to match a sequence of > > regexes, eg: > > > > rawbody ASCII_FORM_ENTRY /_{30,}/ > > and rawbody ASCII_FORM_ENTRY /[

Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Greg Ward
On 21 February 2002, Craig Hughes said: > I had been thinking about creating a "multiple-rule" format for rules, > where in order to match a rule, you would have to match a sequence of > regexes, eg: > > rawbody ASCII_FORM_ENTRY /_{30,}/ > and rawbody ASCII_FORM_ENTRY /[^<][A-Za-z][A-Za-z]

Re: Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Arpi
Hi, > I had been thinking about creating a "multiple-rule" format for rules, > where in order to match a rule, you would have to match a sequence of > regexes, eg: > > rawbody ASCII_FORM_ENTRY /_{30,}/ > and rawbody ASCII_FORM_ENTRY /[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/ > > the "and"

Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
I had been thinking about creating a "multiple-rule" format for rules, where in order to match a rule, you would have to match a sequence of regexes, eg: rawbody ASCII_FORM_ENTRY /_{30,}/ and rawbody ASCII_FORM_ENTRY /[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/ the "and" prefix on a rule mean
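The "and" idea above can be sketched as a two-stage check. This is an illustration in Python, not the proposed rule-file syntax or any actual SpamAssassin code, and the function name is hypothetical. The cheap pattern runs first, and the expensive one is tried only when the cheap one has already matched.

```python
import re

# Both patterns are taken from the ASCII_FORM_ENTRY example in the thread.
CHEAP = re.compile(r"_{30,}")
EXPENSIVE = re.compile(r"[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}")


def ascii_form_entry(body: str) -> bool:
    """Return True only if both stages match (hypothetical helper)."""
    # Stage 1: a quick scan for a run of 30 or more underscores.
    if CHEAP.search(body) is None:
        return False
    # Stage 2: the costly pattern, reached only by likely candidates.
    return EXPENSIVE.search(body) is not None


no_underscores = ascii_form_entry("X" * 5000)       # rejected by stage 1
form_line = ascii_form_entry("Name: " + "_" * 30)   # passes both stages
bare_rule = ascii_form_entry("<" + "_" * 40)        # stage 1 only
print(no_underscores, form_line, bare_rule)
```

Messages made of long runs of a single non-underscore character never reach the second pattern, which is the case the thread reports as slow.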

Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
On Thu, 2002-02-21 at 10:22, Arpi wrote: > Hi, > > > I've run my C version through your really big spam collection at night, and > > filtered out 'slow' messages. Then I've checked which regexps make them so > > slow (slow means 5..25 secs/mail on p4 1.8ghz). > > more on this... > > > FOR_INSTA

Re: Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Arpi
Hi, > > > rawbody ASCII_FORM_ENTRY/[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/ > > [^<] means "any character except '<'". > anyway, it explains why this regexp is so slow :( it partially matches at every character position of the text, and only at the end (_{30,}) does it turn out to be a bad match..
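To make the "partially matches at every character position" remark concrete, here is a small Python sketch (Python is an assumption here; the thread concerns Perl and PCRE, and backtracking engines in general behave similarly on this input). It counts how many start offsets let the leading part of the rule match in a run of a single letter, while the full rule never matches.

```python
import re

# Leading part of the ASCII_FORM_ENTRY rule, and the full rule.
PREFIX = re.compile(r"[^<][A-Za-z][A-Za-z]+")
FULL = re.compile(r"[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}")


def prefix_start_positions(text: str) -> int:
    """Count offsets at which the leading part of the rule matches."""
    return sum(1 for pos in range(len(text)) if PREFIX.match(text, pos))


text = "X" * 200                      # made-up stand-in for XXXX... spam
starts = prefix_start_positions(text)
full_match = FULL.search(text)
print(starts, full_match)
```

Almost every offset is a viable start, and from each one the engine has to try many lengths for the letter run before giving up, which is where the time goes on messages with very long repeats.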

Re: [SAtalk] slooooow rules

2002-02-21 Thread Craig Hughes
Slightly more accurately, ^ as the *first* character inside [] means not. Later in the [] it means ^ C On Thu, 2002-02-21 at 10:42, Charlie Watts wrote: > On Thu, 21 Feb 2002, Arpi wrote: > > > rawbody ASCII_FORM_ENTRY/[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/ > > > > could someone pleas
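A quick illustration of the two meanings of ^ inside brackets, shown in Python for convenience (an assumption; Perl and PCRE treat a leading ^ in a character class the same way). The sample string is made up.

```python
import re

sample = "a<b^"

# Leading ^ negates the class: every character except '<'.
not_lt = re.findall(r"[^<]", sample)

# A ^ later in the class is a literal caret: matches '<' or '^'.
lt_or_caret = re.findall(r"[<^]", sample)

print(not_lt)        # ['a', 'b', '^']
print(lt_or_caret)   # ['<', '^']
```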

Re: Re: [SAtalk] slooooow rules

2002-02-21 Thread Arpi
Hi, > On Thu, 21 Feb 2002, Arpi wrote: > > > rawbody ASCII_FORM_ENTRY/[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/ > > > > could someone please explain what [^<] matches? > > afaik ^ means beginning-of-line but it's strange in [] character array. > > so, what does ^ mean there? begin-of

Re: [SAtalk] slooooow rules

2002-02-21 Thread James Golovich
On Thu, 21 Feb 2002, Arpi wrote: > rawbody ASCII_FORM_ENTRY/[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/ > > could someone please explain what [^<] matches? > afaik ^ means beginning-of-line but it's strange in [] character array. > so, what does ^ mean there? begin-of-line or '^' char

Re: [SAtalk] slooooow rules

2002-02-21 Thread Arpi
Hi, > I've run my C version through your really big spam collection at night, and > filtered out 'slow' messages. Then I've checked which regexps make them so > slow (slow means 5..25 secs/mail on p4 1.8ghz). more on this... > FOR_INSTANT_ACCESS: > /(?:CLICK HERE|).{0,20}\s+INSTANT\s+ACCESS.{0,

[SAtalk] slooooow rules

2002-02-20 Thread Arpi
Hi, I've run my C version through your really big spam collection at night, and filtered out 'slow' messages. Then I've checked which regexps make them so slow (slow means 5..25 secs/mail on p4 1.8ghz). Most 'slow' mails have many (>1000) repeats of a single char (X...XXX