Hi, I ran my C version over your really big spam collection overnight and filtered out the 'slow' messages. Then I checked which regexps make them so slow ('slow' means 5..25 secs/mail on a P4 1.8 GHz).
Most 'slow' mails contain many (>1000) repeats of a single char (XXXXXXXXXXXXXXXXXXXXXXXXX...XXXXXXXXXXXXXXXXXX) or of a tab+newline pair.

The XXXXXXXXX one triggers:

  slow rule ASCII_FORM_ENTRY: 0.372282s
  slow rule LINE_OF_YELLING: 10.299467s

The repeated (tab+newline) one:

  slow rule FOR_INSTANT_ACCESS: 7.514345s

Let's look at them.

FOR_INSTANT_ACCESS:
  /(?:CLICK HERE|).{0,20}\s+INSTANT\s+ACCESS.{0,20}\s+(?:|CLICK HERE)/i

I think its author wanted to match "CLICK HERE * INSTANT ACCESS" and "INSTANT ACCESS * CLICK HERE" in a single common regexp. As it starts with a (|) group that has an empty alternative, the regexp matcher has no fixed first char to scan for, so it cannot search quickly. I think splitting this rule into 2 rules would speed up this check a LOT. Note that this regexp is always much slower than the other regexps; this mail just makes it slow as hell.

LINE_OF_YELLING:
  /^[A-Z0-9\$\.,\'\!\?\s]{20,}[A-Z\$\.,\'\!\?]{5,}[A-Z0-9\$\.,\'\!\?\s]{20,}$/

Trivial: it doesn't have a single fixed first char, so the search is slow. Either rewriting this check in C or using the 'study' feature of PCRE could help. I'll try.

rawbody ASCII_FORM_ENTRY:
  /[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/

Same as above. There are a few rules that start with a character set instead of a single fixed char, which makes regexp matching much slower. Maybe rethinking these or splitting them into several rules could help.

A'rpi / Astral & ESP-team
--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu
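
P.S. A rough sketch of what splitting FOR_INSTANT_ACCESS into 2 rules could look like. It uses Python's re module instead of Perl or PCRE, purely for illustration. The names CLICK_THEN_ACCESS, ACCESS_THEN_CLICK and for_instant_access are hypothetical, not existing rule names. The two patterns follow the intent guessed above; they are not equivalent to the original rule, which, because of its empty alternatives, also matches text that has no CLICK HERE at all. Each pattern starts with a fixed literal, so the matcher has a first char to scan for.

```python
import re

# Hypothetical split of FOR_INSTANT_ACCESS into two rules.
# Assumption: the intent was "CLICK HERE * INSTANT ACCESS" and
# "INSTANT ACCESS * CLICK HERE", as guessed in the mail above.
CLICK_THEN_ACCESS = re.compile(
    r"CLICK HERE.{0,20}\s+INSTANT\s+ACCESS", re.IGNORECASE
)
ACCESS_THEN_CLICK = re.compile(
    r"INSTANT\s+ACCESS.{0,20}\s+CLICK HERE", re.IGNORECASE
)


def for_instant_access(text: str) -> bool:
    """Return True if either of the two split rules matches the text."""
    return bool(
        CLICK_THEN_ACCESS.search(text) or ACCESS_THEN_CLICK.search(text)
    )


if __name__ == "__main__":
    print(for_instant_access("Click here for instant access now"))
    print(for_instant_access("Get instant access today, click here"))
    # The kind of input that made the original rule slow:
    print(for_instant_access("\t\n" * 2000))
```

Whether this is actually faster under PCRE or Perl would still need to be measured on the slow mails.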