On Mon, Feb 04, 2002 at 09:39:16AM -0500, Edward Fang wrote:
> One of our users who is heavy into Debian Linux (Ben Collins - giving
> credit) found a problem where SA would start tagging Bad RFC822 header
> formatting into the headers if there was a tab/nospace in the Subject line.
> He changed the regex for it, and it looks like it works.  I'm submitting
> this to the list for others to test as well.   Not sure if the RFC specifies
> allowing \t's in the Subject line, but at least this prevents it from
> constantly mangling it.  It appears several lists insert them into Subj
> lines as well. . . 

The RFC (2822) specifies that any US-ASCII character may appear in the
Subject header except CR and LF.  Interestingly, the RFC seems to indicate
that no whitespace is required after a field name (i.e., "Subject:testing"
is perfectly valid).  Appendix A.5 indicates that any whitespace between
field name and field body should be ignored:

2.2. Header Fields

   Header fields are lines composed of a field name, followed by a colon
   (":"), followed by a field body, and terminated by CRLF.  A field
   name MUST be composed of printable US-ASCII characters (i.e.,
   characters that have values between 33 and 126, inclusive), except
   colon.  A field body may be composed of any US-ASCII characters,
   except for CR and LF.  However, a field body may contain CRLF when
   used in header "folding" and  "unfolding" as described in section
   2.2.3.  All field bodies MUST conform to the syntax described in
   sections 3 and 4 of this standard.

2.2.1. Unstructured Header Field Bodies

   Some field bodies in this standard are defined simply as
   "unstructured" (which is specified below as any US-ASCII characters,
   except for CR and LF) with no further restrictions.  These are
   referred to as unstructured field bodies.  Semantically, unstructured
   field bodies are simply to be treated as a single line of characters
   with no further processing (except for header "folding" and
   "unfolding" as described in section 2.2.3).

2.2.2. Structured Header Field Bodies

   Some field bodies in this standard have specific syntactical
   structure more restrictive than the unstructured field bodies
   described above. These are referred to as "structured" field bodies.
   Structured field bodies are sequences of specific lexical tokens as
   described in sections 3 and 4 of this standard.  Many of these tokens
   are allowed (according to their syntax) to be introduced or end with
   comments (as described in section 3.2.3) as well as the space (SP,
   ASCII value 32) and horizontal tab (HTAB, ASCII value 9) characters
   (together known as the white space characters, WSP), and those WSP
   characters are subject to header "folding" and "unfolding" as
   described in section 2.2.3.  Semantic analysis of structured field
   bodies is given along with their syntax.

3.6.5. Informational fields

   The informational fields are all optional.  The "Keywords:" field
   contains a comma-separated list of one or more words or
   quoted-strings. The "Subject:" and "Comments:" fields are
   unstructured fields as defined in section 2.2.1, and therefore may
   contain text or folding white space.

subject         =       "Subject:" unstructured CRLF
[...]

A.5. White space, comments, and other oddities

   White space, including folding white space, and comments can be
   inserted between many of the tokens of fields.  Taking the example
   from A.1.3, white space and comments can be inserted into all of the
   fields.

----
From: Pete(A wonderful \) chap) <pete(his account)@silly.test(his host)>
To:A Group(Some people)
     :Chris Jones <c@(Chris's host.)public.example>,
         [EMAIL PROTECTED],
  John <[EMAIL PROTECTED]> (my dear friend); (the end of the group)
Cc:(Empty list)(start)Undisclosed recipients  :(nobody(that I know))  ;
Date: Thu,
      13
        Feb
          1969
      23:32
               -0330 (Newfoundland Time)
Message-ID:              <[EMAIL PROTECTED]>

Testing.
----

   The above example is aesthetically displeasing, but perfectly legal.
   Note particularly (1) the comments in the "From:" field (including
   one that has a ")" character appearing as part of a quoted-pair); (2)
   the white space absent after the ":" in the "To:" field as well as
   the comment and folding white space after the group name, the special
   character (".") in the comment in Chris Jones's address, and the
   folding white space before and after "[EMAIL PROTECTED],"; (3) the
   multiple and nested comments in the "Cc:" field as well as the
   comment immediately following the ":" after "Cc"; (4) the folding
   white space (but no comments except at the end) and the missing
   seconds in the time of the date field; and (5) the white space before
   (but not within) the identifier in the "Message-ID:" field.
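
To make the point concrete, here is a rough sketch of a header check that
follows the rules quoted above: the field name is printable US-ASCII other
than colon, and zero or more spaces or tabs may follow the colon.  This is
not the regex SpamAssassin uses or the one from the submitted patch; the
names HEADER_RE and split_header are made up for illustration, and the
function assumes it is given one unfolded line with the CRLF already removed.
It is also more lenient than the RFC in that it does not reject non-ASCII
bytes in the body.

```python
import re

# Field name: printable US-ASCII (33..126) except colon (58).
# After the colon, any run of SP/HTAB is skipped (possibly empty).
# Field body: anything that is not CR or LF.
HEADER_RE = re.compile(r'([\x21-\x39\x3b-\x7e]+):[ \t]*([^\r\n]*)')

def split_header(line):
    """Return (name, body) for one unfolded header line, or None if malformed."""
    m = HEADER_RE.fullmatch(line)
    return (m.group(1), m.group(2)) if m else None

if __name__ == "__main__":
    for sample in ("Subject:testing", "Subject:\ttesting", "Subject: testing"):
        print(repr(sample), "->", split_header(sample))
```

With a pattern along these lines, "Subject:testing", "Subject:<TAB>testing"
and "Subject: testing" are all accepted and yield the same body, so a tab
after the colon would not be flagged as bad header formatting.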


-- 
Randomly Generated Tagline:
"Workaround/Solution:
 Disable Active Scripting and never turn it on.
 Better, do not use IE in hostile environments such as the internet."
         - Georgi Guninski in a posting to Bugtraq about yet another bug in IE
