On 9 Jun 2016, at 1:40, Olivier wrote:

Mark London <m...@psfc.mit.edu> writes:

On 6/8/2016 1:20 PM, John Hardin wrote:
On Wed, 8 Jun 2016, Mark London wrote:
Hi - We received an email with several large postscript attachments,
and the content type was "text/plain".   This caused our
spamassassin

Sorry to jump in, but should SA trust the content-type or the file(1)
type, or should it try to compare both and do something if they mismatch?

No.

The root of this problem isn't a type mismatch. PostScript *IS* (or at least *can be*) plain text. If my recollection is correct, PS was originally specified to use "Base85" encoding for anything that fell outside the "ascii85" (33-117) subset of ASCII, which just happens to be the set of digits used in Base85 encoding. Many Unix-ish machines have the 'atob' and 'btoa' utilities to do that encoding and decoding.

As long as a proper mail-safe transfer encoding is used (Base64 or QP), there is no limit on the decoded "line" length of PostScript in text/plain (Thanks, Microsoft!). An unanchored and/or imprudently loose regular expression can therefore end up doing a LOT of false starts in a big chunk of PostScript that correctly claims to be text/plain, with (probably) QP encoding used only to soft-break it into transport-safe lines.

The root of the problem is the use of unanchored and/or imprudently loose regular expression rules. If it isn't innocent PostScript with a supposed type mismatch today, it will be correctly typed HTML tomorrow.
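
To illustrate the point about soft line breaks, here is a minimal sketch using Python's standard quopri module. It is not SpamAssassin code, and the repeated drawing command is only a made-up stand-in for a real PostScript body. It shows that a single long line survives quoted-printable transport as many short lines and comes back as one long line once the transfer encoding is reversed.

```python
import quopri

# Hypothetical stand-in for a PostScript body: one very long line, no newlines.
original = (b"0 0 moveto 100 100 lineto stroke " * 500).strip()

# Encode for transport: quoted-printable inserts "=\n" soft breaks
# so that no line on the wire exceeds 76 characters.
encoded = quopri.encodestring(original)
longest_wire_line = max(len(line) for line in encoded.splitlines())

# Decoding removes the soft breaks and restores the single long line.
decoded = quopri.decodestring(encoded)

print("longest line on the wire:", longest_wire_line)
print("decoded length:", len(decoded))
print("newlines in decoded text:", decoded.count(b"\n"))
```

Any rule that runs against the decoded text sees that one long line, not the short transport lines.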

A handy heuristic: a rule is risky if it has '.*' anywhere, or if it does not start with '^' immediately followed by something restrictive (even '^.{0,80}' followed by a string of literals).
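
The heuristic above can be sketched as a crude checker. The function name looks_risky and the sample patterns are hypothetical, and this is plain string matching rather than a regex parser: it does not verify that what follows '^' is actually restrictive, and it will also flag an escaped '\.*'. Treat it as a first-pass filter for eyeballing a rule set, nothing more.

```python
def looks_risky(pattern: str) -> bool:
    """Rough approximation of the heuristic; plain string checks only."""
    # '.*' anywhere invites a lot of backtracking on long input.
    if ".*" in pattern:
        return True
    # Without a leading '^' the engine may retry the match at every offset.
    if not pattern.startswith("^"):
        return True
    return False


# Made-up sample patterns, not taken from any real rule set.
samples = [
    r"^.{0,80}click here now",
    r"click.*here",
    r"^Dear.*friend",
    r"click here",
]
for sample in samples:
    print(looks_risky(sample), sample)
```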

This is probably most critical with 'rawbody' rules and rules that use the multiline option. It is important to understand that 'rawbody' isn't the body parts of 'full' (the pristine RFC 2822 message with any CTE still applied) but rather the text/* body parts with any Content-Transfer-Encoding transformation reversed. Due to the odd way SA deals with line ends (there are WONTFIX bugs on it and probably a mention on some wiki page...) you basically have to use the multiline option to anchor to line ends, and that can inadvertently land you in trouble.
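
As a small illustration of why the multiline option deserves care, the sketch below uses Python's re module, not SA's Perl engine, though Perl's /m modifier changes '^' and '$' in the same way. The sample text is made up. With the flag on, a pattern that looks anchored gets a fresh starting point at every line.

```python
import re

# Made-up sample text with three lines.
text = "line one\nline two\nline three\n"

# Without the multiline flag, '^' matches only at the start of the string.
single = re.findall(r"^line", text)

# With it, '^' also matches right after every newline, so the
# "anchored" pattern is attempted at each line start.
multi = re.findall(r"^line", text, flags=re.MULTILINE)

print("matches without multiline:", len(single))
print("matches with multiline:", len(multi))
```

On a body with thousands of lines, that is thousands of anchor points, so whatever follows the '^' still needs to be restrictive.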
