On Mon, Feb 04, 2002 at 09:39:16AM -0500, Edward Fang wrote:
> One of our users who is heavy into Debian Linux (Ben Collins - giving
> credit) found a problem where SA would start tagging Bad RFC822 header
> formatting into the headers if there was a tab/nospace in the Subject line.
> He changed the regex for it, and it looks like it works. I'm submitting
> this to the list for others to test as well. Not sure if the RFC specifies
> allowing \t's in the Subject line, but at least this prevents it from
> constantly mangling it. It appears several lists insert them into Subj
> lines as well. . .
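Edward's question is whether a tab, or no whitespace at all, after "Subject:" is legal. As a minimal sketch (Python only for illustration; SpamAssassin itself is Perl and its actual rule is not reproduced here), the check below is built from the RFC 2822 section 2.2 rules quoted in this reply, plus the section 2.2.3 unfolding rule (remove each CRLF that is immediately followed by a space or tab). The names HEADER_LINE, STRICT_SUBJECT, unfold and parse_header are hypothetical, and STRICT_SUBJECT is an invented example of a too-strict pattern, not SA's regex.

```python
import re
from typing import Optional, Tuple

# Sketch only (hypothetical names, not SpamAssassin's code).
# RFC 2822, section 2.2: a field name is one or more printable US-ASCII
# characters (33..126) except colon; the field body may be any US-ASCII
# characters except CR and LF. Nothing requires whitespace after the colon.
FIELD_NAME = r"[\x21-\x39\x3b-\x7e]+"
FIELD_BODY = r"[\x00-\x09\x0b\x0c\x0e-\x7f]*"
HEADER_LINE = re.compile("(" + FIELD_NAME + "):(" + FIELD_BODY + ")")

# Hypothetical, overly strict pattern for contrast: it demands exactly one
# space after "Subject:", so a tab or no whitespace at all fails to match.
STRICT_SUBJECT = re.compile(r"Subject: \S")


def unfold(header: str) -> str:
    """Remove each CRLF that is immediately followed by a space or tab
    (the unfolding rule of RFC 2822, section 2.2.3)."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


def parse_header(raw: str) -> Optional[Tuple[str, str]]:
    """Unfold a raw header and split it into (name, body).

    Returns None if the unfolded text is not a single well-formed header
    line. The body is returned as-is, including any leading whitespace.
    """
    # fullmatch (not match + "$") so a stray trailing LF is rejected too.
    m = HEADER_LINE.fullmatch(unfold(raw))
    if m is None:
        return None
    return m.group(1), m.group(2)


if __name__ == "__main__":
    samples = [
        "Subject: testing",
        "Subject:testing",
        "Subject:\ttesting",
        "Subject: folded\r\n\tacross two lines",
        "Bad Name: x",
    ]
    for raw in samples:
        strict_ok = STRICT_SUBJECT.match(raw) is not None
        print(repr(raw), "->", parse_header(raw), "| strict:", strict_ok)
```

With the permissive pattern, "Subject:testing" and "Subject:\ttesting" both parse as valid headers; only the hypothetical strict pattern rejects them. The body keeps its leading whitespace, so any trimming is left to whatever consumes the Subject.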
RFC 2822 specifies that any US-ASCII character except CR and LF may
appear in the Subject header. Interestingly, the RFC also seems to
indicate that no whitespace is required after a field name (i.e.,
"Subject:testing" is perfectly valid). Appendix A.5 illustrates that
whitespace between the field name and the field body may be present
or absent:

2.2. Header Fields

   Header fields are lines composed of a field name, followed by a colon
   (":"), followed by a field body, and terminated by CRLF.  A field
   name MUST be composed of printable US-ASCII characters (i.e.,
   characters that have values between 33 and 126, inclusive), except
   colon.  A field body may be composed of any US-ASCII characters,
   except for CR and LF.  However, a field body may contain CRLF when
   used in header "folding" and "unfolding" as described in section
   2.2.3.  All field bodies MUST conform to the syntax described in
   sections 3 and 4 of this standard.

2.2.1. Unstructured Header Field Bodies

   Some field bodies in this standard are defined simply as
   "unstructured" (which is specified below as any US-ASCII characters,
   except for CR and LF) with no further restrictions.  These are
   referred to as unstructured field bodies.  Semantically, unstructured
   field bodies are simply to be treated as a single line of characters
   with no further processing (except for header "folding" and
   "unfolding" as described in section 2.2.3).

2.2.2. Structured Header Field Bodies

   Some field bodies in this standard have specific syntactical
   structure more restrictive than the unstructured field bodies
   described above.  These are referred to as "structured" field bodies.
   Structured field bodies are sequences of specific lexical tokens as
   described in sections 3 and 4 of this standard.
   Many of these tokens are allowed (according to their syntax) to be
   introduced or end with comments (as described in section 3.2.3) as
   well as the space (SP, ASCII value 32) and horizontal tab (HTAB,
   ASCII value 9) characters (together known as the white space
   characters, WSP), and those WSP characters are subject to header
   "folding" and "unfolding" as described in section 2.2.3.  Semantic
   analysis of structured field bodies is given along with their syntax.

3.6.5. Informational fields

   The informational fields are all optional.  The "Keywords:" field
   contains a comma-separated list of one or more words or
   quoted-strings.  The "Subject:" and "Comments:" fields are
   unstructured fields as defined in section 2.2.1, and therefore may
   contain text or folding white space.

   subject         =       "Subject:" unstructured CRLF

[...]

A.5. White space, comments, and other oddities

   White space, including folding white space, and comments can be
   inserted between many of the tokens of fields.  Taking the example
   from A.1.3, white space and comments can be inserted into all of the
   fields.

   ----
   From: Pete(A wonderful \) chap) <pete(his account)@silly.test(his host)>
   To:A Group(Some people)
        :Chris Jones <c@(Chris's host.)public.example>,
            [EMAIL PROTECTED],
     John <[EMAIL PROTECTED]> (my dear friend); (the end of the group)
   Cc:(Empty list)(start)Undisclosed recipients  :(nobody(that I know))  ;
   Date: Thu,
         13
           Feb
             1969
         23:32
                  -0330 (Newfoundland Time)
   Message-ID:              <[EMAIL PROTECTED]>

   Testing.
   ----

   The above example is aesthetically displeasing, but perfectly legal.
   Note particularly (1) the comments in the "From:" field (including
   one that has a ")" character appearing as part of a quoted-pair); (2)
   the white space absent after the ":" in the "To:" field as well as
   the comment and folding white space after the group name, the special
   character (".") in the comment in Chris Jones's address, and the
   folding white space before and after "[EMAIL PROTECTED],"; (3) the
   multiple and nested comments in the "Cc:" field as well as the comment
   immediately following the ":" after "Cc"; (4) the folding white space
   (but no comments except at the end) and the missing seconds in the
   time of the date field; and (5) the white space before (but not
   within) the identifier in the "Message-ID:" field.

--
Randomly Generated Tagline:
"Workaround/Solution: Disable Active Scripting and never turn it on.
 Better, do not use IE in hostile environments such as the internet."
   - Georgi Guninski in a posting to Bugtraq about yet another bug in IE

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk