Hi folks, sorry for the long mail, and on a Friday at that...

I have thought for quite a while about the new specification for the TAG. In the new wording, the "path" is explicitly defined. Let's see an excerpt:

###
TAG          = full-stat-id [full-dyn-id] (':' / SP)
full-stat-id = [path] progname
path         = path-part 1*(path-sep [path])
path-part    = 1*VISUAL
path-sep     = '/' / '\'
progname     = 1*VISUAL
proc-id      = 1*ALFANUM ; recommended: number
VISUAL       = ([a-zA-Z0-9...], excluding '['
SP           = %d32
###

An example from postfix was given. I have taken a new example from my own logs, because I could verify that one better:

###
Oct 31 10:01:51 ipx10102 postfix/smtpd[19782]: disconnect from ASte-Genev-Bois-113-1-1-241.w81-50.abo.wanadoo.fr[81.50.79.241]
###

If you look at it, the path is just a partial path. Postfix is installed at /usr/libexec/postfix and the file actually run is /usr/libexec/postfix/smtpd. Also, of course, the TAG value could just as well have been an additional designation and not an actual path.

Besides that, there is another fundamental issue with specifying the path: we assume that the OS provides paths in the form specified. In the ABNF, we already cover *nix vs. DOS paths ('/' vs. '\'). But there are many more path representations in other OSes. I dug a bit in my memories and can see there are other formats.

For example, on VMS (ok, a bit outdated nowadays...), this is a valid path:

DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1

I found a description of the naming system at http://www.djesys.com/vms/freevms/mentor/vms_path.html (I have not visited any other page on that site and don't know what it is all about, so be warned if you follow the link).

Also, on Unisys OS 1100 machines, a valid path looks like this:

sys$*data$.co$install

I have found no web reference for this system, so a quick intro: on OS 1100 (to the best of my knowledge still in widespread use), a file has the format qualifier*name. Then, inside a file you have a so-called element, which is specified after a period.
I guess that nowadays it is possible to include *nix-like paths inside the element names.

If we look at IBM's VM, VSE, CMS and MVS, we find yet more path notations... Of course, under DOS/Windows it looks like "c:\bin\someprogram.exe". Notice the colon. I have heard (but have no experience myself) that on the Mac, colons also occur frequently in paths.

In short: I don't think it is a good idea to specify what a path should look like. However, one essence remains: there should be one part of the tag that is more or less STATIC and one part that is DYNAMIC. The static part denotes the application emitting messages, the dynamic part a specific instance of it. I think we should settle for this. As such, I propose the following stripped-down ABNF:

###
; The following line is NOT part of the TAG ABNF, but it is needed
; to specify the optional SP after the tag. It is taken from my
; full message ABNF. The other parts (TIMESTAMP...) are NOT defined
; below.
HEADER = TIMESTAMP SP HOSTNAME SP TAG [SP]

TAG          = static-id [full-dyn-id] [':'] ; 64 chars max
static-id    = 1*VISUAL
full-dyn-id  = '[' proc-id [thread-sep thread-id] ']'
proc-id      = 1*ALFANUM ; recommended: number
thread-sep   = VISUAL / %d58 ; recommended: ",", or ':', or '.'
thread-id    = 1*ALFANUM ; recommended: number
VISUAL       = (%d33-57/%d59-126) ; all but SP and ":"
LF           = %d10
CR           = %d13
SP           = %d32
PRINTUSASCII = %d33-126

The TAG is a string of visible (printing) characters excluding SP, that MUST NOT exceed 64 characters in length. The first occurrence of a SP (space) will terminate the TAG field, but is not part of it. It is RECOMMENDED to terminate the TAG with a colon (':'), which, if used, is part of the TAG.

The TAG is used to denote the sender of the message. It MUST be in the syntax shown in the ABNF above.

A typical example of a TAG is: (without the quotes)

"/path/to/PROGNAME[123,456]:"

Another example (from VMS) is: (without the quotes)

"DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1[123,456]".

Please note that in this example, "DKA0:[MYDIR.SUBDIR1.SUBDIR2]MYFILE.TXT;1" is the static-id while "[123,456]" is still the full-dyn-id. This shows that a receiver must be prepared for special characters like '[' to be present inside the static part. As a note to implementors: the beginning of the full-dyn-id is not the first but the LAST occurrence of '[' inside the tag, and this ONLY if the tag ends in either "]" or "]:". If these conditions are not met, the '[' is part of the static-id.

Systems that use both process IDs and thread IDs SHOULD fill both the proc-id and the thread-id. For other systems it is RECOMMENDED to use the proc-id only.

Receivers SHOULD, to be consistent with the format described in RFC3164, accept TAGs that terminate with a single colon, without a space following it. Then the colon is both the last character of that TAG and the field separator to the next field (MSG).

No specific format inside the tag is required. However, an emitter SHOULD use a consistent tag value.
###

This ABNF still provides the essentials and allows for paths of all kinds. The postfix sample would fit in neatly.

A similar issue exists with the full-dyn-id part. Can we really assume that a thread/process ID *always* fits into the above ABNF? I think that is much more likely, but I am still a bit concerned. I am of the general position that one should not limit oneself in one's options. Specifying the dyn part as above could eventually create issues in some (strange) environments. On the other hand, I consider this to be very unlikely and more or less a theoretical point, so I left it in the ABNF as above. I would appreciate comments especially on this. What does the rest of the WG think?

There is one more issue with the thread-sep as specified in the ABNF above. If we say it is VISUAL, parsing is not well defined.
Let's look at this full-dyn-id sample:

[123]

Most obviously, the intention of the ABNF is that it should be parsed as follows:

proc-id = 123; thread-sep = empty; thread-id = empty

However, I think it could also be interpreted as follows:

proc-id = 1; thread-sep = 2; thread-id = 3

This is because thread-sep is VISUAL. If we collapse the ABNF a little, the full-dyn-id is effectively specified as

'[' 1*ALFANUM [VISUAL 1*ALFANUM] ']'

But ALFANUM is effectively a subset of VISUAL, so there is no way to tell where the ALFANUM ends and the VISUAL begins in cases like the one above. I think this needs to be clarified and changed in the ABNF. I propose that we replace thread-sep with this definition:

thread-sep = ',' / '.'

As there is no legacy to support, I think this won't hurt. Or is there legacy to support (especially legacy that would not fit into this definition)?

And one final comment about colons: we have several places in the ABNF above where we allow colons (path name, thread-sep). Especially in path names (DOS, Mac, VMS), colons seem to be used often. So if we really intend to include path names inside the tag (which sounds like a good idea), we probably need to drop the legacy compatibility rule that a colon NOT followed by a SP will terminate the tag. Look at this:

"C:\bin\mwagent.exe[1234]:"

This can be parsed properly if we demand a SP after the tag. Then we know the first colon is part of the path (because it is not followed by SP) while the last one (followed by a SP not shown) is not. With the current wording, that tag would just be parsed as "C:" and the rest would go into the MSG part. I don't see any way to avoid this other than by making the SP after TAG a MUST.

So I think we need to make a tradeoff decision. We can either

a) allow colons in the path name

xor ;)

b) allow the TAG NOT to be terminated by SP

Selecting

a) will break compliance for older clients (how many?)
b) will open up a can of (security?) bugs, as I guess there are more than enough implementors out there who do not care about the restricted char

Neither choice is really good... I personally have a slight tendency towards a), as I *feel* that the number of affected older senders is limited (but I may be totally wrong). I also think it is "cleaner" from an overall architectural point of view to separate ALL fields by SP and not to make an exception for a single one. An argument for this is, again, that such an exception can be the source of program bugs, possibly even security-related ones (a missed length check and a long message without spaces immediately following the colon).

The above ABNF and wording are inconsistent with regard to the "colon issue".

Well, I think that's it for now. Looks like it gets uglier the more you dig into the details...

Rainer
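
PS, for implementors: the splitting rule from the proposed text (the full-dyn-id starts at the LAST '[' and only if the tag ends in "]" or "]:") could look roughly like the Python sketch below. This is only an illustration under some assumptions, not part of any draft: the name split_tag and the regular expression are hypothetical, thread-sep is taken in its narrowed form (',' or '.'), a single trailing colon is stripped before splitting, and the 64-character limit is not checked.

```python
import re

# proc-id and thread-id are 1*ALFANUM; thread-sep is assumed to be the
# narrowed ',' / '.' form proposed in this mail, not VISUAL.
_DYN_ID = re.compile(r"([A-Za-z0-9]+)(?:[,.]([A-Za-z0-9]+))?")


def split_tag(tag):
    """Split a TAG into (static_id, proc_id, thread_id).

    proc_id and thread_id are None when the tag carries no full-dyn-id
    (or no thread-id). Hypothetical helper, for illustration only.
    """
    # Drop one optional terminating colon before looking at the brackets.
    body = tag[:-1] if tag.endswith(":") else tag

    # A full-dyn-id is only considered if the tag ends in "]" (or "]:").
    if body.endswith("]"):
        # Use the LAST '[' so that brackets inside the static part
        # (e.g. VMS directories) are left alone.
        start = body.rfind("[")
        if start > 0:  # the static-id must not be empty
            match = _DYN_ID.fullmatch(body[start + 1:-1])
            if match:
                return body[:start], match.group(1), match.group(2)

    # Conditions not met: any '[' is part of the static-id.
    return body, None, None
```

With this, split_tag("postfix/smtpd[19782]:") gives ("postfix/smtpd", "19782", None), and the VMS tag from the proposed text keeps its bracketed directory in the static part while "[123,456]" is still recognized as the full-dyn-id. Because the separator can no longer be an alphanumeric character, "[123]" has only one possible reading.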