I recently noticed a couple of cases where SA (3.1.4 or earlier) would take over a minute (instead of a few seconds) to check a 500 kB message. Investigation revealed that these cases have one thing in common: they were all message/partial chunks of a longish transfer of some document or other data. Moreover, most of these cases were hitting random sets of SARE or baseline rules, yielding false positives.
In case someone suggests that Content-Type: message/partial should be banned outright - well, that is a policy decision, and where it is allowed, it should not bring SA to its knees on a 0.5 MB message.

Here is one example where a command-line 'spamassassin -t -D' would run for 68 seconds. Timestamping each debug line produces the following top-10 lines, sorted by elapsed time; the first column is the time in seconds it took for this line to appear after the previous one:

  1.935 dbg: rules: ran body rule SARE_RMML_Stock1 ======> got hit: "0TC"
  2.204 dbg: rules: ran body rule __SARE_SPEC_LRD_COST4 ======> got hit: "134"
  3.695 dbg: rules: ran body rule SARE_RMML_Stock9 ======> got hit: "0il"
  3.976 dbg: rules: ran body rule __NONEMPTY_BODY ======> got hit: "i"
  4.021 dbg: rules: running raw-body-text per-line regexp tests; score ...
  6.397 dbg: rules: ran body rule FB_NOT_SEX ======> got hit: " Sjx"
  8.225 dbg: bayes: tok_get_all: token count: 37175
  8.254 dbg: rules: ran body rule __SARE_SPEC_LRD_COST5 ======> got hit: "169"
  9.682 dbg: rules: ran body rule __SARE_SPEC_LRD_COST6 ======> got hit: "218"
 11.999 dbg: rules: running body-text per-line regexp tests; score so far=2.501

and another example:

  2.396 dbg: rules: ran body rule DISGUISE_PORN_MUNDANE ======> got hit: "b0y"
  2.424 dbg: rules: ran body rule __SARE_SPEC_LRD_COST4 ======> got hit: "134"
  2.627 dbg: bayes: tok_get_all: token count: 36631
  3.421 dbg: rules: running body-text per-line regexp tests; score so far=0.203
  3.826 dbg: rules: ran body rule SARE_RMML_Stock9 ======> got hit: "0Il"
  4.181 dbg: rules: running raw-body-text per-line regexp tests; score ...
  4.265 dbg: rules: ran body rule FB_NOT_SEX ======> got hit: " S8X"
  8.113 dbg: rules: ran body rule FUZZY_XPILL ======> got hit: "XoNOgX"
  9.308 dbg: rules: ran body rule __SARE_SPEC_LRD_COST5 ======> got hit: "169"
  9.945 dbg: rules: ran body rule __SARE_SPEC_LRD_COST6 ======> got hit: "218"

I know some of these come from SARE rulesets, but others are baseline rules or bayes token parsing.
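For anyone who wants to reproduce this kind of timing table, here is a rough sketch of one way to do it in Python. It is not the script used for the numbers above; the names slowest_lines, stamp and main are made up for illustration. It stamps each debug line as it arrives and keeps the lines that followed the longest pauses.

```python
import sys
import time


def slowest_lines(stamped, start, n=10):
    """Return the n (delta, line) pairs with the largest delta, ascending.

    stamped is an iterable of (timestamp, line) pairs in arrival order;
    delta is the time elapsed since the previous line (or since start
    for the first line).
    """
    if n <= 0:
        return []
    deltas = []
    prev = start
    for ts, line in stamped:
        deltas.append((ts - prev, line))
        prev = ts
    # Sort by elapsed time only, then keep the n largest.
    return sorted(deltas, key=lambda pair: pair[0])[-n:]


def stamp(stream):
    """Yield (arrival time, line) for each line read from stream."""
    for line in stream:
        yield time.monotonic(), line.rstrip("\n")


def main():
    """Read debug output on stdin and print the slowest lines."""
    start = time.monotonic()
    for delta, line in slowest_lines(stamp(sys.stdin), start):
        print(f"{delta:7.3f} {line}")
```

With a call to main() appended, the debug output can be piped through it, for example 'spamassassin -t -D < message 2>&1 | python3 slowest.py' (slowest.py being whatever the file is saved as). Pipe buffering can blur the per-line timings somewhat, so treat the numbers as approximate.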
Here is a relevant section/sample of one of these messages:

  MIME-Version: 1.0
  Content-Type: message/partial; total=22; id="[EMAIL PROTECTED]"; number=21
  X-Priority: 3
  X-MSMail-Priority: Normal
  X-Mailer: Microsoft Outlook Express 6.00.2900.2869
  X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2869

  f6idzxqa608aID8+YhwNSQwBpIrboHA0/zPfOP26mB6eONz70Xl12DwGVnAPemaaKaJyQk5ZKUwg
  VC0sGYHLd543cICNa1piu8YgRJR0EaEK7GNVXvFSriat5dZwj7PNzQuOTO030bra7tBjROxbrVYR
  XFStjnugVkyH27zqrvUdUsHYnLaVLdUuAxWH51QDV9/kc6vtIURcdUbthPszq12lj7Lt7rMAtVX7

So the problem is that the base64-encoded lines in a message/partial chunk are treated as obfuscated text, which is very slow and produces almost random hits on various rules. It also places some burden on the SQL server (bayes: tok_get_all: token count: 37175).

A somewhat similar case, which also hits various obfuscation rules because its UU-encoding is mistaken for plain text, is mail with attachments produced by Microsoft Office Outlook when the user has chosen the following setting:

  Tools -> Options -> Mail Format -> Internet format: plain text options:
    (YES) Encode attachments in UUENCODE format when sending a plain text message

It would be nice if such encodings were recognized, so that rules which expect plain text are at least prevented from running and/or producing false hits.

Mark
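P.S. To illustrate the kind of recognition asked for above, here is a rough heuristic sketched in Python. It is not how SpamAssassin works or is implemented; the name looks_encoded, the 40-76 character line length and the 0.8 threshold are assumptions picked for illustration. It flags a body as encoded when it contains a uuencode 'begin' line or when most non-empty lines look like base64.

```python
import re

# A line made up only of base64 characters, 40 to 76 long, with
# optional '=' padding. The length bounds are an assumption.
BASE64_LINE = re.compile(r"[A-Za-z0-9+/]{40,76}={0,2}")

# The header line of a uuencoded block: "begin <octal mode> <name>".
UU_BEGIN = re.compile(r"begin [0-7]{3,4} \S")


def looks_encoded(body_lines, threshold=0.8):
    """Guess whether body_lines is encoded data rather than plain text."""
    lines = [line.rstrip("\r\n") for line in body_lines if line.strip()]
    if not lines:
        return False
    # A uuencode header anywhere in the body counts as encoded.
    if any(UU_BEGIN.match(line) for line in lines):
        return True
    # Otherwise require that most non-empty lines look like base64.
    hits = sum(1 for line in lines if BASE64_LINE.fullmatch(line))
    return hits / len(lines) >= threshold
```

A check like this could run before the body rules and skip the ones that expect plain text. It will misfire on some unusual plain-text bodies (a single very long unbroken word, for instance), so it is only a starting point.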