Re: Is this Received header correctly formatted?

List Mail User 16 Mar 2005 18:47:44 -0000

>To: Loren Wilton <[EMAIL PROTECTED]>
>Cc: SpamAssassin Mailing List <[EMAIL PROTECTED]>
>Subject: Re: Is this Received header correctly formatted?
>
>
>Loren Wilton wrote:
>> Received: from ar39.lsanca2-4.16.241.28.lsanca2.elnk.dsl.genuity.net
>> ([4.16.241.28] helo=watson1)
>>  by pop-a065d23.pas.sa.earthlink.net with smtp (Exim 3.33 #1)
>>  id 1DBKRe-0000Kp-00; Tue, 15 Mar 2005 14:23:22 -0800
>> 
>> 1) Is "smtp" in lower case valid, or should it have been SMTP?
>> 2) Is it valid to have the (Exim etc) stuff between 'smtp' and 'id'?
>> 3) Anything else that may be off the mark?
>
>The robustness principle says that you should be strict in what you send 
>and liberal in what you accept. From that perspective, it's not a 
>strictly conformant header, but it's not broken enough for somebody to
>refuse to parse it.
>
>In answer to your questions:
>
>  1) the spec calls for uppercase
>
>  2) header data in parentheses is comment data. comments are supposed
>     to be allowed (roughly) anywhere that whitespace is allowed (this rule is
>     actually documented in RFC2822, which governs header fields). with
>     that in mind, yes, it's fine there.
>
>  3) the "helo=" stuff isn't conformant
>
>
>Here's the BNF notation for the Received header as provided in RFC2821:
>
>| Time-stamp-line = "Received:" FWS Stamp <CRLF>
>|
>| Stamp = From-domain By-domain Opt-info ";"  FWS date-time
>|
>|       ; where "date-time" is as defined in [32]
>|       ; but the "obs-" forms, especially two-digit
>|       ; years, are prohibited in SMTP and MUST NOT be used.
>|
>| From-domain = "FROM" FWS Extended-Domain CFWS
>|
>| By-domain = "BY" FWS Extended-Domain CFWS
>|
>| Extended-Domain = Domain /
>|            ( Domain FWS "(" TCP-info ")" ) /
>|            ( Address-literal FWS "(" TCP-info ")" )
>|
>| TCP-info = Address-literal / ( Domain FWS Address-literal )
>|       ; Information derived by server from TCP connection
>|       ; not client EHLO.
>|
>| Opt-info = [Via] [With] [ID] [For]
>|
>| Via = "VIA" FWS Link CFWS
>|
>| With = "WITH" FWS Protocol CFWS
>|
>| ID = "ID" FWS String / msg-id CFWS
>|
>| For = "FOR" FWS 1*( Path / Mailbox ) CFWS
>|
>| Link = "TCP" / Addtl-Link
>| Addtl-Link = Atom
>|       ; Additional standard names for links are registered with the
>|       ; Internet Assigned Numbers Authority (IANA).  "Via" is
>|       ; primarily of value with non-Internet transports.  SMTP
>|       ; servers SHOULD NOT use unregistered names.
>| Protocol = "ESMTP" / "SMTP" / Attdl-Protocol
>| Attdl-Protocol = Atom
>|     ; Additional standard names for protocols are registered with the
>|     ; Internet Assigned Numbers Authority (IANA).  SMTP servers
>|     ; SHOULD NOT use unregistered names.
>
>
>-- 
>Eric A. Hall                                       http://www.ehsco.com/
>Internet Core Protocols         http://www.oreilly.com/catalog/coreprot/
>
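Eric's third point can be checked mechanically. Below is a minimal Python sketch, assuming a strict reading in which the parenthetical after the FROM domain must match the TCP-info rule quoted above. The regexes are simplified stand-ins (IPv4 literals only, octets not range-checked, plain dot-separated labels for Domain); RFC 2821's real Domain and Address-literal rules are more detailed, and `is_tcp_info` is a hypothetical helper name.

```python
import re

# Simplified stand-ins for RFC 2821's Address-literal and Domain rules.
# Assumption: IPv4 literals only; octet values are not range-checked.
ADDRESS_LITERAL = r"\[\d{1,3}(?:\.\d{1,3}){3}\]"
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = rf"{LABEL}(?:\.{LABEL})*"

# TCP-info = Address-literal / ( Domain FWS Address-literal )
TCP_INFO = re.compile(rf"{ADDRESS_LITERAL}|{DOMAIN}\s+{ADDRESS_LITERAL}")


def is_tcp_info(text: str) -> bool:
    """Return True if the text inside the parentheses fits TCP-info."""
    return TCP_INFO.fullmatch(text.strip()) is not None
```

Run against the parentheticals discussed in this thread, `is_tcp_info` rejects "[4.16.241.28] helo=watson1" (the trailing "helo=watson1" fits neither alternative) and accepts "hermes.apache.org [209.237.227.199]" and "watson1 [4.16.241.28]".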


        Eric, I think you hit all the salient points, but you did miss one
important one. Earlier in RFC2821, before the section you quoted, there is:

RFC2821 Section 2.4
" ...
The metalinguistic notation used in this document corresponds to the
"Augmented BNF" used in other Internet mail system documents.  The
reader who is not familiar with that syntax should consult the ABNF
specification [8].  Metalanguage terms used in running text are
surrounded by pointed brackets (e.g., <CRLF>) for clarity."

Where reference [8] is:
"[8]  Crocker, D. and P. Overell, Eds., "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997."

and in RFC 2234

RFC2234 Section 2.3
"2.3  Terminal Values

   Rules resolve into a string of terminal values, sometimes called
   characters.  In ABNF a character is merely a non-negative integer.
   In certain contexts a specific mapping (encoding) of values into a
   character set (such as ASCII) will be specified.

   Terminals are specified by one or more numeric characters with the
   base interpretation of those characters indicated explicitly.  The
   following bases are currently defined:

        b           =  binary

        d           =  decimal

        x           =  hexadecimal

   Hence:

        CR          =  %d13

        CR          =  %x0D

   respectively specify the decimal and hexadecimal representation of
   [US-ASCII] for carriage return.

   A concatenated string of such values is specified compactly, using a
   period (".") to indicate separation of characters within that value.
   Hence:

        CRLF        =  %d13.10

   ABNF permits specifying literal text string directly, enclosed in
   quotation-marks.  Hence:

        command     =  "command string"

   Literal text strings are interpreted as a concatenated set of
   printable characters.

        NOTE:     ABNF strings are case-insensitive and
                  the character set for these strings is us-ascii.

   Hence:

        rulename = "abc"

   and:

        rulename = "aBc"

   will match "abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC" and "ABC".

                To specify a rule which IS case SENSITIVE,
                   specify the characters individually.

   For example:

        rulename    =  %d97 %d98 %d99

   or

        rulename    =  %d97.98.99

   will match only the string which comprises only lowercased
   characters, abc."

(Note: The lack of a '%' before 98 and 99 in the final example is as written in the spec, not my typo)
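To make the distinction concrete, here is a rough Python sketch of the two kinds of terminal described in RFC 2234 Section 2.3: quoted literals match case-insensitively over US-ASCII, while numeric terminals such as %d97.98.99 match only the exact character codes. The helper names are hypothetical, and value ranges such as %x30-39 are not handled.

```python
BASES = {"b": 2, "d": 10, "x": 16}


def parse_num_val(spec: str) -> list[int]:
    """Parse a numeric terminal such as "%d97.98.99" or "%x0D" into
    character codes; '.' separates concatenated values (RFC 2234 2.3)."""
    if len(spec) < 3 or spec[0] != "%" or spec[1].lower() not in BASES:
        raise ValueError(f"not a numeric terminal: {spec!r}")
    base = BASES[spec[1].lower()]
    return [int(part, base) for part in spec[2:].split(".")]


def match_literal(rule_text: str, candidate: str) -> bool:
    """A quoted ABNF string: case-insensitive, US-ASCII only."""
    return candidate.isascii() and candidate.lower() == rule_text.lower()


def match_num_val(spec: str, candidate: str) -> bool:
    """A numeric terminal: exact code points, hence case-sensitive."""
    return [ord(c) for c in candidate] == parse_num_val(spec)
```

Under this reading, `match_literal("SMTP", "smtp")` is true, which is exactly the question about the quoted "SMTP" alternative in the Protocol rule, while `match_num_val("%d97.98.99", "ABC")` is false.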

RFC2822 also cites the same reference, in this passage:

RFC2822 Section 1.2.2
"1.2.2. Syntactic notation

   This standard uses the Augmented Backus-Naur Form (ABNF) notation
   specified in [RFC2234] for the formal definitions of the syntax of
   messages.  Characters will be specified either by a decimal value
   (e.g., the value %d65 for uppercase A and %d97 for lowercase A) or by
   a case-insensitive literal value enclosed in quotation marks (e.g.,
   "A" for either uppercase or lowercase A).  See [RFC2234] for the full
   description of the notation."

        In other words, lowercase is conformant, and your first point is
not correct (though all the examples do show uppercase).  However, you are
completely correct that the "helo=" is flat out wrong; with a slight
variation, though, it becomes something like "(watson1 [4.16.241.28])", which
is not only conformant but is the typical behavior of both sendmail
and postfix.  As an example, the email containing your message had the
following header from my first internal server:
Received: from mail.apache.org (hermes.apache.org [209.237.227.199])
        by mailhub.plectere.com (Postfix) with SMTP id 0BEFA678C
        for <[EMAIL PROTECTED]>; Wed, 16 Mar 2005 01:49:51 -0800 (PST)
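Both points (comments allowed wherever whitespace is, and keywords matched without regard to case) can be illustrated with a rough Python sketch that splits a Received value into its clauses. It assumes the value is already unfolded and has the "Received:" name removed; quoted strings and IPv6 literals are not handled, and a clause value that happens to equal a keyword would be misread. Note that stripping comments also discards the parenthesized TCP-info, which RFC 2821 treats as part of Extended-Domain; this sketch only looks at the keyword clauses. `strip_comments` and `parse_received` are hypothetical names.

```python
from email.utils import parsedate_to_datetime

CLAUSES = ("from", "by", "via", "with", "id", "for")


def strip_comments(value: str) -> str:
    """Replace RFC 2822 comments (possibly nested, with backslash
    quoted-pairs) by a space, since a comment may sit where whitespace can."""
    out, depth, i = [], 0, 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and depth > 0:
            i += 2  # skip a quoted-pair inside a comment
            continue
        if ch == "(":
            depth += 1
            if depth == 1:
                out.append(" ")
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def parse_received(value: str) -> dict:
    """Split a Received value into clauses (keywords matched
    case-insensitively) plus the date-time after the final ';'."""
    stamp, sep, date_text = value.rpartition(";")
    if not sep:
        raise ValueError("no ';' before the date-time")
    fields, key = {}, None
    for tok in strip_comments(stamp).split():
        if tok.lower() in CLAUSES:
            key = tok.lower()
            fields[key] = []
        elif key is not None:
            fields[key].append(tok)
    result = {k: " ".join(v) for k, v in fields.items()}
    result["date"] = parsedate_to_datetime(strip_comments(date_text).strip())
    return result
```

With the two headers quoted in this thread, the Exim header yields `with` = "smtp" and the Postfix header yields `with` = "SMTP"; compared case-insensitively, as ABNF string literals are, they name the same protocol.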


        Paul Shupak
        [EMAIL PROTECTED]

P.S.  Could whoever maintains this list please try to settle on one format
for the list's name - today's messages are using

SpamAssassin Mailing List <[EMAIL PROTECTED]>

a couple of days ago the format changed to:

"[EMAIL PROTECTED] apache. org" <users@spamassassin.apache.org>

and I already have to special-case a half dozen variants for when people
put SA output in their messages and my filters "see" high scores (despite
the various whitelistings, special cases and other stuff to handle the list).
All of the example scripts distributed with the code fail when people
quote output, so I have added checks to try to prevent bounces; they
test for the list in either "To:" or "Cc:" lines *and* in
"From:" lines, but every time the delivering servers change or the list
description changes, I have to add more cases.
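One way to stop the display-name churn from requiring new cases is to match on the bare address rather than the whole header text. A minimal Python sketch, assuming the list's posting address is users@spamassassin.apache.org (as in the header quoted above) and that `list_headers` is a hypothetical helper, not something shipped with SpamAssassin; it does nothing about changes in the delivering servers.

```python
from email import message_from_string
from email.utils import getaddresses

# Assumption: the list's posting address, as seen in the headers above.
LIST_ADDRESS = "users@spamassassin.apache.org"


def list_headers(raw_message: str, list_address: str = LIST_ADDRESS) -> set[str]:
    """Return which of To/Cc/From carry the list's addr-spec, ignoring
    whatever display name the list happens to be using this week."""
    msg = message_from_string(raw_message)
    found = set()
    for name in ("To", "Cc", "From"):
        for _display, addr in getaddresses(msg.get_all(name, [])):
            if addr.lower() == list_address.lower():
                found.add(name)
    return found
```

Both of the display-name formats mentioned above reduce to the same addr-spec, so a single check covers them.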
