Re: Rule for subjects that start with a whitespace

jdow Sat, 06 Aug 2005 02:57:16 -0700

From: "Ralph Seichter" <[EMAIL PROTECTED]>
> jdow wrote:
> 
>  > > 2.2. Header Fields
>  > > Header fields are lines composed of a field name, followed by a
>  > > colon (":"), followed by a field body, and terminated by CRLF.
>  > > A field name MUST be composed of printable US-ASCII characters
>  > > (i.e., characters that have values between 33 and 126,
>  >                                              ^^ NOTE
>  >
>  > > inclusive), except colon. A field body may be composed of any
>  > > US-ASCII characters, except for CR and LF. [...]
>  >
>  > NOTE: Character 32 is space. Character 33 is !. The subject does NOT
>  > begin with the space character. It begins with the first character
>  > past the space.
> 
> Perhaps you misread the RFC excerpt a bit? only the field name (!)
> must be composed of characters between 33 and 126. The definition


No. Zero or more spaces are ignored, and the first real character
is one of "!" through "~". After that first character, the space
character is allowed in the rest of the field body.

>    subject = "Subject:" unstructured CRLF
> 
> implies that, as far as I understand, the field body starts with the
> character immediately after the colon.

As long as that first character is not a space. (Arguably Outlook
Express gets it wrong, presuming any character past the first space
is part of the subject. However, for OE I believe the subject header
can be either "Subject:" or "Subject: ". The latter is used if it
matches; otherwise the former is used.)
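
To make that matching order concrete, here is a minimal Python sketch of
what I believe OE does. It is an illustration of the behaviour described
above, not OE's actual code; the function name oe_style_subject is made up
for this example, and it assumes the header name is spelled exactly
"Subject".

```python
def oe_style_subject(line: str) -> str:
    """Return the subject body using a longest-prefix match.

    Assumption: the header is tried against "Subject: " first and
    against "Subject:" only if that fails, so exactly one space after
    the colon is consumed when present.
    """
    for prefix in ("Subject: ", "Subject:"):
        if line.startswith(prefix):
            return line[len(prefix):]
    raise ValueError("not a Subject header line")
```

With this reading, "Subject: Spoo" and "Subject:Spoo" both give "Spoo",
while "Subject:  Spoo" keeps its second space and gives " Spoo".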

>  > Now, as to how SpamAssassin parses the Subject field is open for
>  > question. It appears a lot of rules seem to start presuming zero
>  > or more blank characters followed by the real search string.
> 
> As I wrote before: I believe that many software products dealing
> with email assume that the field body starts with the first non-
> whitespace character after zero or more whitespaces, or that they
> make use of functions like trim() to remove any leading/trailing
> whitespaces as they see fit, i.e. when storing or displaying
> messages. I don't know if checking for "surplus" whitespaces in
> field bodies has a realistic chance of success.

Darned few presume ANY first character is part of the body of the
subject. Most, in my experience, skip at least the first one. Often
(usually?) they will skip all space characters following the colon
until the first non-space character.

I've never run across an email program that treats "Subject: Spoo"
as having a subject body, for presentation to the user, of " Spoo".
I've run across many that will treat "Subject:    Spoo" as "   Spoo"
for presentation to the user. (That many may be most.) Those which
do not will treat it as having a subject of "Spoo" instead. I have also
noticed that all the email programs I've played with accept the line
"Subject:Spoo" as having a subject of "Spoo".
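
The three readings in play can be put side by side in a short Python
sketch. The function names (body_raw, body_skip_one, body_skip_all) are
invented for this comparison and do not come from any mail program; each
one only models a possible interpretation of where the field body starts.

```python
def split_header(line: str) -> tuple[str, str]:
    """Split a header line at the first colon into (name, rest)."""
    name, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("header line has no colon")
    return name, rest


def body_raw(line: str) -> str:
    # Literal reading: the body starts immediately after the colon.
    return split_header(line)[1]


def body_skip_one(line: str) -> str:
    # Skip at most one space after the colon.
    rest = split_header(line)[1]
    return rest[1:] if rest.startswith(" ") else rest


def body_skip_all(line: str) -> str:
    # Skip every space after the colon up to the first non-space.
    return split_header(line)[1].lstrip(" ")


for example in ("Subject:Spoo", "Subject: Spoo", "Subject:    Spoo"):
    print(repr(example), repr(body_raw(example)),
          repr(body_skip_one(example)), repr(body_skip_all(example)))
```

All three functions agree on "Subject:Spoo". They differ only once spaces
follow the colon: for "Subject:    Spoo" the skip-one reading shows
"   Spoo" and the skip-all reading shows "Spoo".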

This seems to be the reading they have taken on the quoted three
paragraphs. I take 2.2 as defining the rule and 2.2.2 as a subset
of that definition that is a trifle ambiguous. Certainly "Subject:Spoo"
is legal. It is unconventional. "Subject: Spoo" is the general convention.
And "Subject:  Spoo" is open to interpretation regarding whether or not
that second space is part of the subject. The first space is not, by
paragraph 2.2.

{^_^}
        (My that's a lot of slow moving tasty creatures. JMS would be
        proud.)
