Hi folks,

Sorry, a long mail, and on a Friday at that...

I have thought quite a while over the new specification for the TAG. In
the new wording, the "path" is explicitly defined. Let's see an excerpt:

###

                TAG          =  full-stat-id  [full-dyn-id] (':' / SP)
                full-stat-id =  [path] progname
                path         =  path-part 1*(path-sep [path])
                path-part    = 1*VISUAL
                path-sep     = '/' / '\'
                progname     = 1*VISUAL
                proc-id      = 1*ALFANUM  ; recommended: number
                VISUAL       = ([a-zA-Z0-9...], excusing  '['
                SP           = %d32
###

An example from postfix was given. I have just taken a new example from
my logs, because I could verify this better:

###
Oct 31 10:01:51 ipx10102 postfix/smtpd[19782]: disconnect from
ASte-Genev-Bois-113-1-1-241.w81-50.abo.wanadoo.fr[81.50.79.241]
###

If you look at it, the path is only a partial path. Postfix is installed
at /usr/libexec/postfix and the file actually run is
/usr/libexec/postfix/smtpd.

Also, of course, the TAG value could just as well have been an
additional designation and not an actual path.

Besides that, there is another fundamental issue with specifying the
path - that is, we assume that the OS provides paths as specified. In
the ABNF, we already cover *nix vs. DOS paths ('/' vs '\'). But there
are many more path representations in other OS's. I dug a bit in my
memories and can see there are other formats:

For example on VMS (ok, a bit outdated nowadays...), this is a valid
path:

DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1

I found a description of the naming system at
http://www.djesys.com/vms/freevms/mentor/vms_path.html (I have not
visited any other page on that site and don't know what it is all about
- so be warned if you follow the link).

Also, on Unisys OS 1100 machines, a valid path looks like this:

sys$*data$.co$install

I have found no web reference to this system, so a quick intro: On OS
1100 (to the best of my knowledge still in widespread use), a file has
the format qualifier*name. Then, inside a file you have a so-called
element, which is specified after a period. I guess it is nowadays
possible to include *nix-like paths inside the element names.

If we look at IBM's VM, VSE, CMS and MVS we find yet more path
notations...

Of course, under DOS/Windows it looks like "c:\bin\someprogram.exe".
Notice the colon. I have heard (but don't have experience) that on the
Mac, colons also occur frequently in paths.

In short: I don't think it is a good idea to specify what a path should
look like.

However, one essential point remains: there should be a part of the tag
that is more or less STATIC and one part that is DYNAMIC. The static
part denotes the application emitting messages, the dynamic part a
specific instance of it.

I think we should settle with this. As such, I propose the following
stripped-down ABNF:

###
   ; The following line is NOT part of the TAG ABNF, but it is needed
   ; to specify the optional SP after the tag. It is taken from my
   ; full message ABNF. The other parts (TIMESTAMP...) are NOT defined
   ; below.
   HEADER          = TIMESTAMP SP HOSTNAME SP TAG [SP]

   TAG             = static-id  [full-dyn-id] [':'] ; 64 chars max
   static-id       = 1*VISUAL
   full-dyn-id     = '[' proc-id [thread-sep thread-id] ']'
   proc-id         = 1*ALFANUM  ; recommended: number
   thread-sep      = VISUAL / %d58     ; recommended: ",", or ':', or '.'
   thread-id       = 1*ALFANUM  ; recommended: number
   VISUAL          = (%d33-57/%d59-126) ; all but SP and ":"

   LF              = %d10
   CR              = %d13
   SP              = %d32
   PRINTUSASCII    = %d33-126

   The TAG is a string of visible (printing) characters excluding SP,
   that MUST NOT exceed 64 characters in length. The first occurrence of
   a SP (space) will terminate the TAG field, but is not part of it. It
   is RECOMMENDED to terminate the TAG with a colon (':'), which if
   used, is part of the TAG.

   The TAG is used to denote the sender of the message. It MUST be in
   the syntax shown in the ABNF above.

   A typical example of a TAG is: (without the quotes)

   "/path/to/PROGNAME[123,456]:"

   Another example (from VMS) is: (without the quotes)

   "DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1[123,456]".

   Please note that in this example,
   "DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1" is the static-id while
   "[123,456]" is still the full-dyn-id. This shows that a receiver must
   be prepared for special characters like '[' to be present inside the
   static part.

   As a note to implementors: the beginning of the full-dyn-id is not
   the first but the LAST occurrence of '[' inside the tag, and this
   ONLY if the tag ends in either "]" or "]:". If these conditions are
   not met, the '[' is part of the static-id.

   Systems that use both process IDs and thread IDs SHOULD fill both
   the proc-id and the thread-id. For other systems it is RECOMMENDED
   to use the proc-id only.

   Receivers SHOULD, to be consistent with the format described in
   RFC3164, accept TAGs that terminate with a single colon, without a
   space following it. Then the colon is both the last character of that
   TAG, and the field separator with the next field (MSG).

   No specific format inside the tag is required. However, an emitter
   SHOULD use a consistent tag value.
###

This ABNF still provides the essentials and allows for paths of all
kinds. The postfix sample would fit in neatly.
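
To make the implementor note in the proposed text a bit more concrete,
here is a minimal Python sketch of how a receiver might split a TAG
into its static and dynamic parts. The function name split_tag, the
order in which separators are tried, and the use of None for "not
present" are my own assumptions for illustration, not part of the
proposal.

```python
def _alfanum(s):
    # ALFANUM is assumed here to mean ASCII letters and digits only.
    return bool(s) and s.isascii() and s.isalnum()

def split_tag(tag):
    """Split a TAG into (static_id, proc_id, thread_id).

    The full-dyn-id starts at the LAST '[' and only counts if the
    tag ends in "]" or "]:"; otherwise '[' belongs to the static-id.
    """
    body = tag[:-1] if tag.endswith(":") else tag  # drop optional colon
    if not body.endswith("]"):
        return body, None, None                    # no dynamic part
    start = body.rfind("[")
    if start <= 0:
        return body, None, None                    # no '[' or empty static-id
    inner = body[start + 1:-1]
    # Try the recommended separators in a fixed (assumed) order.
    for sep in (",", ".", ":"):
        proc, found, thread = inner.partition(sep)
        if found:
            break
    if not _alfanum(proc) or (found and not _alfanum(thread)):
        return body, None, None                    # brackets are static text
    return body[:start], proc, (thread if found else None)

print(split_tag("postfix/smtpd[19782]:"))
print(split_tag("DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1[123,456]"))
```

With the postfix sample, the static part comes out as "postfix/smtpd"
and the proc-id as "19782"; with the VMS sample, the bracketed directory
stays inside the static part.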

A similar issue is with the full-dyn-id part. Can we really assume that
a thread/process ID *always* fits into the above ABNF? I think it is
much more likely than for paths, but I am still a bit concerned. My
general position is that one should not limit one's options. Specifying
the dyn part as above could eventually create issues in some (strange)
environments. On the other hand, I consider this to be very unlikely
and more or less a theoretical point - thus I left it in the ABNF as
above. I would appreciate comments especially on this. What does the
rest of the WG think?

There is one more issue with the thread-sep as specified in the ABNF
above. If we say it is VISUAL, parsing is not well defined. Let's look
at this full-dyn-id sample:

[123]

Most obviously, the intention of the ABNF is that it should be parsed
as follows:
proc-id = 123; thread-sep = empty; thread-id = empty

However, I think it could also be interpreted as follows:
proc-id = 1; thread-sep = 2; thread-id = 3

This is because thread-sep is VISUAL. If we collapse the ABNF a little,
the full-dyn-id is effectively specified as

'[' 1*ALFANUM VISUAL 1*ALFANUM ']'

But ALFANUM is effectively a subset of VISUAL, so there is no way to
tell where the ALFANUM ends and the VISUAL begins in cases like above. I
think this needs to be clarified and changed in the ABNF.

I propose that we change the ABNF to replace thread-sep with this
definition:

   thread-sep      = ',' / '.'

As there is no legacy to support, I think this won't hurt. Or is there
legacy to support (especially legacy that would not fit into this
definition)?
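
To illustrate the ambiguity and the effect of the narrower thread-sep,
here is a small Python sketch that lists every possible reading of the
text between '[' and ']'. The function name parses and the set-based
encoding of ALFANUM and VISUAL are assumptions made for this example
only; ALFANUM is taken to be ASCII letters and digits.

```python
import string

ALFANUM = set(string.ascii_letters + string.digits)
# VISUAL as in the ABNF above: %d33-57 / %d59-126 (everything but ':').
VISUAL = {chr(c) for c in range(33, 127)} - {":"}

def parses(inner, separators):
    """Return every (proc-id, thread-sep, thread-id) reading of `inner`
    under the rule: 1*ALFANUM [thread-sep 1*ALFANUM]."""
    results = []
    if inner and set(inner) <= ALFANUM:
        results.append((inner, None, None))        # proc-id only
    for i in range(1, len(inner) - 1):             # candidate separator position
        proc, sep, thread = inner[:i], inner[i], inner[i + 1:]
        if sep in separators and set(proc) <= ALFANUM and set(thread) <= ALFANUM:
            results.append((proc, sep, thread))
    return results

# thread-sep = VISUAL / %d58: "123" has two valid readings.
print(parses("123", VISUAL | {":"}))
# thread-sep = ',' / '.': only one reading remains.
print(parses("123", {",", "."}))
print(parses("123,456", {",", "."}))
```

With the wide separator set, "123" can be read both as a plain proc-id
and as 1 / 2 / 3; with only ',' and '.' allowed, each sample has exactly
one reading.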

And one final comment about colons: we have several places in the ABNF
above where we allow colons (path name, thread-sep). Especially in path
names (DOS, Mac, VMS), colons seem to be used often. So if we really
intend to include path names inside the tag (which sounds like a good
idea), we probably need to drop the legacy compatibility rule that a
colon NOT followed by a SP will terminate the tag. Look at this:

"C:\bin\mwagent.exe[1234]:"

This can properly be parsed if we demand a SP after the tag. So we know
the first colon is part of the path (because it is not followed by SP)
while the last (followed by a SP not shown) is not.

With the current wording, that tag would just be parsed as "C:" and the
rest would go into the MSG part. I don't see any way to avoid this
other than making the SP after TAG a MUST.
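
The difference between the two rules can be sketched in a few lines of
Python. Both functions reflect my reading of the rules as discussed
here; the function names and the message text "service started" are
made up for the example.

```python
def tag_legacy(rest):
    """Legacy-style reading: the first ':' or SP ends the TAG
    (a terminating colon is kept as part of the TAG)."""
    for i, ch in enumerate(rest):
        if ch == ":":
            return rest[:i + 1], rest[i + 1:]
        if ch == " ":
            return rest[:i], rest[i + 1:]
    return rest, ""

def tag_sp_required(rest):
    """Alternative reading: only a SP ends the TAG."""
    tag, _, msg = rest.partition(" ")
    return tag, msg

# Hypothetical message part following TIMESTAMP and HOSTNAME.
rest = r"C:\bin\mwagent.exe[1234]: service started"
print(tag_legacy(rest))
print(tag_sp_required(rest))
```

The legacy-style reading stops at "C:", while the SP-only reading keeps
the whole path, the full-dyn-id and the trailing colon together as the
TAG.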

So I think we need to make a tradeoff decision: we can either

a) allow colons in the path name
xor ;)
b) allow TAG NOT to be terminated by SP

Selecting
a) will break compliance for older clients (how many?)
b) will open up a can of (security?) bugs, as I guess there are
   more than enough implementors out there who do not care about
   the restricted char

Neither choice is really good... I personally have a slight tendency
towards a), as I *feel* that the number of affected older senders is
limited (but I may be totally wrong). I also think it is "cleaner" from
an overall architectural point of view to separate ALL fields by SP and
not make an exception for a single one. An argument for this is, again,
that such an exception can be a source of program bugs, possibly even
security-related ones (a missed length check and a long message without
spaces immediately following the colon).

The above ABNF and wording are inconsistent with regard to the
"colon-issue".

Well, I think that's it for now. Looks like it gets ugly the more you
dig into detail...

Rainer

