Hi,

> I've run my C version through your really big spam collection at night, and
> filtered out 'slow' messages. Then I've checked which regexps make them so
> slow (slow means 5..25 secs/mail on a p4 1.8ghz).

more on this...

> FOR_INSTANT_ACCESS:
>   /(?:CLICK HERE|).{0,20}\s+INSTANT\s+ACCESS.{0,20}\s+(?:|CLICK HERE)/i
>
> I think its author wanted to match "CLICK HERE * INSTANT ACCESS" and
> "INSTANT ACCESS * CLICK HERE" but in a single common regexp.

anyway, it's bad... since both (?: | ) groups have an empty branch, it matches a bare " INSTANT ACCESS " without having "CLICK HERE" before or after it. so that part of the regexp is useless and just slows things down a lot.

faster alternative:

  body FOR_INSTANT_ACCESS /\sINSTANT\s+ACCESS.{0,20}\s+/i

correct me if i'm wrong, i'm still a newbie in the regexp world :)

> LINE_OF_YELLING:
>   /^[A-Z0-9\$\.,\'\!\?\s]{20,}[A-Z\$\.,\'\!\?]{5,}[A-Z0-9\$\.,\'\!\?\s]{20,}$/

it only slows down mails with lots of uppercase chars, so it isn't a problem. (i got 8 slow checks in total, but together they took 1min20secs!) anyway, optimizing it in C using a state machine could help. the main point of this rule: there must be at least 20 uppercase chars without lowercase between them, and at least one uppercase word of 5 or more chars. easy to implement in C.

ASCII_FORM_ENTRY: not an easy rule.

  rawbody ASCII_FORM_ENTRY /[^<][A-Za-z][A-Za-z]+.{1,15}?\s+_{30,}/

could someone please explain what [^<] matches? afaik ^ means beginning-of-line, but that is strange inside a [] character class. so, what does ^ mean there? beginning-of-line, or a literal '^' character? i think it's beginning-of-line, as PCRE couldn't optimize this regexp with a possible-first-chars table. if so, we should split this into 2 rules. it is really slow on a lot of mails. (i've got 11687 slow checks (ones that took longer than 1ms) running on your spam collection.)

the remaining 2 slow rules are:

  PORN_3
  MSG_ID_ADDED_BY_MTA_2

PORN_3 begins with a double (?: | ). MSG_ID_ADDED_BY_MTA_2 partially matches every header that has a Message-Id: field, which makes the regexp search slow. i have no idea how to speed these up.

other regexps are rare or fast enough.

Note: by changing only FOR_INSTANT_ACCESS as described above, the run went from 45mins to 29mins (~35% less time).
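A quick way to sanity-check the two regexp points above (FOR_INSTANT_ACCESS matching without "CLICK HERE", and what a leading ^ does inside [ ]) is a small script. This is only a sketch in Python, not the C/PCRE code used for the timings. It assumes Python's re module treats these particular constructs the same way Perl and PCRE do. The sample strings and the helper names (hits, same_verdicts, NOT_LT) are made up for illustration and are not taken from the spam collection.

```python
import re

# Original rule as quoted in this mail; both (?: | ) groups have an empty branch.
ORIGINAL = re.compile(
    r"(?:CLICK HERE|).{0,20}\s+INSTANT\s+ACCESS.{0,20}\s+(?:|CLICK HERE)", re.I
)
# Faster alternative proposed in this mail.
SIMPLIFIED = re.compile(r"\sINSTANT\s+ACCESS.{0,20}\s+", re.I)

# Hypothetical sample lines (invented for this sketch).
SAMPLES = [
    "get INSTANT ACCESS now",               # no "CLICK HERE" anywhere
    "CLICK HERE for instant access today",  # the case the rule was written for
    "INSTANT ACCESS",                       # no whitespace around the phrase
    "we offer instantaccess to all",        # no whitespace inside the phrase
]


def hits(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


def same_verdicts(samples):
    """Return True if both rules agree (hit or miss) on every sample."""
    return all(hits(ORIGINAL, s) == hits(SIMPLIFIED, s) for s in samples)


# A leading ^ inside [ ] negates the class: [^<] is "any one char except '<'".
# It is not an anchor, and it consumes one character.
NOT_LT = re.compile(r"[^<]b")

if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), hits(ORIGINAL, s), hits(SIMPLIFIED, s))
    print("rules agree on all samples:", same_verdicts(SAMPLES))
    print("'<b' matches [^<]b:", hits(NOT_LT, "<b"))
    print("'xab' matches [^<]b:", hits(NOT_LT, "xab"))
```

On these samples the original rule fires on "get INSTANT ACCESS now" even though "CLICK HERE" never appears, and the shorter rule gives the same hit or miss on every sample. Four samples prove nothing about a whole corpus, so a real check should compare both rules over the full spam collection. The last two lines show that [^<]b matches in the middle of "xab" but not in "<b", so in these engines the ^ inside brackets means "not '<'" rather than beginning-of-line.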

so, it IS worth optimizing/verifying regexps!

A'rpi / Astral & ESP-team

--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu