On 9 Jun 2016, at 1:40, Olivier wrote:
Mark London <m...@psfc.mit.edu> writes:
On 6/8/2016 1:20 PM, John Hardin wrote:
On Wed, 8 Jun 2016, Mark London wrote:
Hi - We received an email with several large postscript attachments, and the content type was "text/plain". This caused our spamassassin
Sorry to jump in, but should SA trust the Content-Type or the file(1) type, or should it compare both and do something if they mismatch?
No.
The root of this problem isn't a type mismatch. PostScript *IS* (or at least *can be*) plain text. If my recollection is correct, PS was originally specified to use "Base85" encoding for any data that fell outside the "ascii85" subset of ASCII (codes 33-117), which happens to be exactly the set of digits used in Base85 encoding. Many Unix-ish machines have the 'atob' and 'btoa' utilities to do that encoding and decoding. As long as a proper mail-safe Content-Transfer-Encoding (Base64 or QP) is used, there is no limit on the decoded "line" length of PostScript in text/plain (Thanks, Microsoft!). So an unanchored and/or imprudently loose regular expression can end up making a LOT of false starts in a big chunk of PostScript that correctly claims to be text/plain, with (probably) Q-P encoding used only to soft-break it into transport-safe lines. The root of the problem is using unanchored and/or imprudently loose regular expression rules. If it isn't innocent PS with a type mismatch today, it will be correctly typed HTML tomorrow.
A handy heuristic: if a rule does not start with '^' immediately followed by something restrictive (even '^.{0,80}' followed by a string of literals), or if it has '.*' anywhere, it's risky.
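A rough sketch of that heuristic as a Python lint check. The helper name looks_risky and the exact notion of "restrictive" used here (an optional bounded '.{m,n}' followed by a literal letter or digit) are my own assumptions; it only looks for a literal '.*' as described, not other loose constructs such as '.+'.

```python
import re

# Hypothetical helper, not part of SpamAssassin.
# "Restrictive" start: '^', an optional bounded '.{m,n}', then a literal letter or digit.
RESTRICTIVE_START = re.compile(r"\^(?:\.\{\d+,\d+\})?[A-Za-z0-9]")


def looks_risky(pattern: str) -> bool:
    """Flag a rule pattern as risky per the heuristic above."""
    if ".*" in pattern:
        return True
    return RESTRICTIVE_START.match(pattern) is None


for rule in [r"^.{0,80}Subject", r"^Dear friend", r"buy.*now", r"cheap pills", r"^\s+foo"]:
    print(f"{rule!r}: {'risky' if looks_risky(rule) else 'ok'}")
```

It is deliberately crude: a pattern that passes can still be slow, but one that fails deserves a second look before it goes near rawbody.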
This is probably most critical with 'rawbody' rules and rules that use the multiline option. It is important to understand that "rawbody" isn't the body of 'full' (the pristine RFC 2822 message with any CTE still applied) but rather the text/* body parts with any Content-Transfer-Encoding transformation reversed. Due to the odd way SA deals with line ends (there are WONTFIX bugs on it and probably a mention on some wiki page...), you basically have to use the multiline option to anchor to line ends, and that can inadvertently land you in trouble.
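A Python approximation of what rawbody and the multiline option mean, using the standard email package. This only illustrates the semantics described above and is not SpamAssassin's code; the helper text_parts_decoded and the sample message are hypothetical, and Python's re.M stands in for Perl's /m modifier.

```python
import re
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def text_parts_decoded(raw: bytes) -> list:
    """Approximate 'rawbody': each text/* part with its CTE reversed.

    Illustrative only; this is not how SpamAssassin itself is implemented.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # decode=True undoes base64 / quoted-printable transfer encoding.
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "us-ascii"
            parts.append(payload.decode(charset, errors="replace"))
    return parts


# Build a small test message whose body is sent as quoted-printable.
body = "intro line\nFREE offer " + "x" * 300 + "\nbye\n"
msg = EmailMessage()
msg["Subject"] = "test"
msg.set_content(body, cte="quoted-printable")
decoded = text_parts_decoded(msg.as_bytes())[0]

# Without the multiline flag, '^' only matches at the start of the whole string.
print(re.search(r"^FREE offer", decoded))
# With re.M (like Perl's /m), '^' also matches after every newline, so a
# bounded prefix such as '^.{0,80}' can anchor each line cheaply.
print(bool(re.search(r"^.{0,80}FREE offer", decoded, re.M)))
```

Note that the 300-character line arrives soft-broken by Q-P but is whole again in the decoded part, which is exactly where an unanchored pattern gets expensive.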