RE: syslog-protocol-01 posted & comments

Rainer Gerhards Wed, 04 Feb 2004 07:25:43 -0800

Anton,

I am using your initial message as an additional guide while editing
-protocol-02. I will comment on what I did in reply to your post, so
that everyone can follow the reasoning. I may leave some issues open...


> -----Original Message-----
> From: Anton Okmianski [mailto:[EMAIL PROTECTED]

> 2. Section 2 refers to "machines" and "devices" which is
> misleading.  I
> think we need to talk about "applications". After all a sender and
> collector can both be on the same machine.

Postponed to -03; I have taken note of it. I agree, but I wanted to push
out -02 to finish the format work first.
>
> 3. Section 4. HOSTNAME.  I think "." and "-" characters are allowed in
> FQDN (except no trailing "-") per RFC 1123. Also, the limit of 64
> characters is inappropriate. It should be 255 per same RFC.

Changed to 255. I do not think I am prohibiting any characters, but I
might have overlooked the obvious. I'd appreciate it if you could point
me to where I do this ;) (in -02, please)
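
To make the HOSTNAME point concrete, here is a minimal Python sketch of
an RFC 1123-style check with the 255-character overall limit. The
function name and the 63-character label limit (taken from the DNS
rules, not from this thread) are assumptions; it is not the draft's
ABNF.

```python
import re

# One DNS label: letters, digits and "-", 1 to 63 characters,
# with no leading or trailing "-" (assumed from the DNS host name rules).
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_fqdn(name, max_len=255):
    """Return True if name looks like a dotted host name of at most max_len chars."""
    if not name or len(name) > max_len:
        return False
    # "." separates labels; every label must pass the label check.
    return all(LABEL.fullmatch(label) for label in name.split("."))

print(is_valid_fqdn("host-1.example.com"))
print(is_valid_fqdn("bad-.example.com"))
```

Note that this sketch rejects a trailing "." because it produces an
empty last label.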

>
> 4. Section 4. The time-secfrac field should be specified as 1*4DIGIT.
> This is the only number of digits that would be allowed given the 32
> character limit you specified for TIMESTAMP field. This just makes it
> more explicit and actually removes the need to specify the
> length of the
> TIMESTAMP field.

Done. I kept the TIMESTAMP field length restriction in, too. I think it
doesn't hurt but gives the app developer an additional warning.
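
As an illustration of how the 1*4DIGIT rule and the 32-character limit
fit together, here is a small Python sketch. It assumes an RFC
3339-style layout (date, "T", time, optional fraction, "Z" or numeric
offset); the pattern is a stand-in and not the -02 ABNF itself.

```python
import re

# Assumed RFC 3339-style layout; time-secfrac limited to 1*4DIGIT.
TIMESTAMP = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(\.[0-9]{1,4})?"          # optional time-secfrac, 1 to 4 digits
    r"(Z|[+-][0-9]{2}:[0-9]{2})"  # UTC marker or numeric offset
)

def is_valid_timestamp(ts, max_len=32):
    """Check the overall length limit and the syntactic shape."""
    return len(ts) <= max_len and TIMESTAMP.fullmatch(ts) is not None

print(is_valid_timestamp("2004-02-04T07:25:43.1234-08:00"))
print(is_valid_timestamp("2004-02-04T07:25:43.12345-08:00"))
```

The sketch checks only the shape, not whether the date and time values
are in range.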

>
> 5. Section 4. MSG. I think the character set specified here is not
> consistent with specifying that UTF-8 is supported.  UTF-8
> character can
> consist of multiple bytes and each byte can be any 8-bit value. Also
> you refer to "PRINTABLE" in the comment, which is not defined
> anywhere.

Partly done ... may need refinement. Comments welcome.
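
A short Python sketch of why a 7-bit character range conflicts with
UTF-8 support: a single non-ASCII character becomes several bytes, all
of them above 0x7F. The sample message text is made up for
illustration.

```python
msg = "log: 中"                 # hypothetical MSG content
encoded = msg.encode("utf-8")

# Bytes that a 7-bit (ASCII-only) character range would reject.
high_bytes = [b for b in encoded if b > 0x7F]

print(len(msg), len(encoded))        # 6 characters, 8 bytes
print([hex(b) for b in high_bytes])  # ['0xe4', '0xb8', '0xad']
```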

>
> 6. Section 4.1. PRI field.  First, I support Albert's
> proposal of a new
> format which increases the number of facilities and provides a format
> that is easier to handle.  I just don't know why stop at 999
> facilities
> and not allow say 2bln (signed 32-bit).  An alternative (less optimal
> for performance) is defining a structured content parameter "facility"
> or "channel" and assuming new syslog collectors/relays will use it in
> configurations.

I re-specified it, together with the new header.

>
> 7. Section 4.1. PRI field.  I think naming facilities 16-23 "local" is
> misleading.  In fact, remote logging uses those almost
> exclusively. So,
> how are they local? I'd call them "custom facility 1,2,..."
> or something
> like that.

Facilities currently carry no specific semantics in the draft. This
probably needs some more discussion, though. The -02 draft is meant to
provoke a little in this regard ;)
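
For reference, the numbering under discussion comes from the RFC 3164
scheme, where PRI is the facility multiplied by 8 plus the severity and
facilities 16-23 are named local0 through local7. The Python sketch
below shows that legacy arithmetic only; it does not describe the
re-specified -02 format, and the function names are illustrative.

```python
def encode_pri(facility, severity):
    """RFC 3164 style: PRI = facility * 8 + severity (severity is 0..7)."""
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return facility * 8 + severity

def decode_pri(pri):
    """Inverse of encode_pri: returns (facility, severity)."""
    return divmod(pri, 8)

# local4 is facility 20 and notice is severity 5, giving <165>.
print(encode_pri(20, 5))   # 165
print(decode_pri(165))     # (20, 5)
```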

>
> 8. Section 4.1. Note 1.  I think here and in many other places in this
> draft RFC we should avoid using language such as "...have
> been seen...",
> etc.  This is not intended to be an informational RFC like
> 3164. I think
> it would be more appropriate to be talking about what SHOULD or MUST
> happen instead of what has been seen to happen.

done

>
> 9. Section 4.2. Here and thereafter you use the term "visible
> (printing)
> characters".  Although you clarify everywhere the specific character
> range, I think this term is imprecise.  A Chinese character encoded in
> UTF-8 will be visible if you have the right viewer and not visible if
> you don't.  Maybe you should refer to "non-control
> characters" instead.

partly changed in -02.

>
> 10.  Section 4.2 Last 3 sentences. Again you mention "has usually been
> seen".  Do we actually want to recommend the use of one IP or
> the other
> or at least the consistent use of one?

changed to hostname with FQDN

>
> 11. Section 4.2.1. In the note, you mention "single syntax".  In fact,
> use of second fractions is optional. Yes, technically it is one ABNF
> syntax. But then so is the RFC 3339 which you claim provides "multiple
> syntaxes".

changed to "restricted set of syntaxes"

>
> 12. Section 4.2.1. My feeling is we should not support the
> old timestamp
> format in this RFC.  If some collector wants to support it,
> they can be
> both RFC3164 and RFC.new compatible, right?  Why give more
> prominence to
> the old legacy timestamp which we know is bad?
>

removed old timestamp

> 13. Section 4.2.1. Bullet point talking about time-secfrac should
> mention that performance considerations is another condition for the
> recommendation, not just availability of clock accuracy.

added

>
> 14. Section 4.2.2.  Again, I am not convinced that supporting
> the legacy
> of just the hostname instead of FQDN is the good reason to
> have. We may
> still want just the hostname option for local logging though.

removed plain hostname
I think "just hostname" is so troublesome that we should no longer allow
it (now, that we are no longer in the backward compatibility business)

>
> 15. Section 4.2.2.  Do we want to make a recommendation as to what is
> preferred hostname-only or IP?

changed to prefer FQDN

>
> 16. Section 4.2.2. Where we mention IPv6 RFC 2373, we should mention
> specifically the section on "Textual representation" of that RFC -
> section 2.

changed

>
> 17. Section 4.2.3.  We never say what the purpose of the TAG field is
> nor give any guidance to what should be put there. This field of the
> syslog specification is, to me, very strange.  I understand
> the legacy,
> but see my concerns about backward-compatibility.  The fact that no
> spaces are allowed is not optimal.  Recommendation of a
> trailing ":" can
> only mislead casual observers into believing it is used as a separator
> character while it is not.  Then, what's the purpose of this
> recommendation?

totally revamped, colon is a regular character now and it is SP
terminated as usual.
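
A minimal Python sketch of what SP termination means for a parser: the
TAG ends at the first space, and a colon is just another character in
it. The input is assumed to be the part of the message that starts at
the TAG; the surrounding header layout is not shown and the sample
values are made up.

```python
def split_tag(rest):
    """Split 'TAG SP remainder' at the first space; ':' is not a separator."""
    tag, sep, remainder = rest.partition(" ")
    if not tag or not sep:
        raise ValueError("TAG must be non-empty and terminated by a space")
    return tag, remainder

print(split_tag("myapp: disk full"))   # ('myapp:', 'disk full')
```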

>
> 18. Section 4.2.3.  We never explain what's the difference between
> static and dynamic portions of TAG.  The last sentence talks about use
> of "consistent tag value", but I don't understand what it means.
> Consistent between what and what? This needs clarification.

I have to admit I am not sure myself. However, I would like to address
this in -03, so that -02 can focus on the broader questions of the new
format. I have taken note of this.

>
> 19.  Section 4.3. The phrase "traditionally and most frequently used"
> should be replaced with SHOULD, MUST or RECOMMENDED I think.

removed

>
> 20. Section 4.3. The last two paragraphs are talking about some "code
> sets".  I think if we are talking about UTF-8, we are talking about
> *one* code set -- UNICODE -- and one encoding -- UTF-8.  I thought
> UNICODE and UTF-8 obsoleted all that code set business, or am I wrong?

I guess you are not wrong ;) - changed to UNICODE/UTF-8.

>
> 21. Section 4.4. TRAILER. This says that some receivers may require a
> trailer.  Aren't we supposed to specify here what compatible
> receiver is
> allowed to require and what not? Why are we allowing this?  I think
> nobody should require trailer and we should drop this from format.

created an issue out of this - let's see how further discussion goes

LINK-TO-ISSUE-HERE

>
> 22. Section 4.5. Sentence "..locally defined facility (local4)...".
> Again, I am confused by term "locally-defined".

section 4.5 was not updated - now it is.

>
> 23. Section 5 & beyond. Why is there a need to specify
> structured data
> *anywhere* within the message.  I thought we will designate a special
> field like TAG for the structured data.  This way we won't need a
> special sequence to identify it.  Also, I think allowing it everywhere
> gives too much unnecessary freedom. Harder to evolve protocol later.

I created an issue out of this; I have not touched it in -02.

>
> 24. Section 5.1.  Like with the MSG, I think the character set of
> parameter value is any non-control character with some
> characters being
> escaped. We are supporting UTF-8 within the parameter values, right?

I think I have not forbidden that - have I? The ABNF - I think - allows
all valid printable UTF-8 values. Please correct me if I am wrong.

I would like to address the other "section 5 bullet points" in -03 -
again, I try to focus -02 on a primarily section 4 update, which in
itself is considerable enough. I would like to stabilize this first,
then move on to the other topics.

>
> 25. Section 5.1. I think the fact that each structured data item which
> has a different IANA dictionary needs to be in a different block is
> somewhat cumbersome and limiting. For example, if I want to put the
> msgid parameter in all of my messages regardless of use of
> fragmentation, then when I do use fragmentation, do I have to put this
> parameter twice?
>
> 26. Section 5.1.  I think dictionary identifier can be made
> into a just
> another key-value parameter. This would be more consistent with
> providing a general mechanism key-value pairs and idea of using []
> brackets to group related tags.
>
> 27. Section 5.1. Can the SD-ID be optional for experimental
> parameters.
> This way I don't have to put "x-cisco" in front of all tags.  I don't
> see any value in this.  We can just assume experimental tags.  If a
> given vendor needs to identify his tags they can do this with
> their own
> parameters like "vendor", "product", "version", or whatever else the
> vendor wants.  Vendor tags are for vendor use only, right? General
> syslog collector won't use them anyway, correct?
>
> 28. Section 5.1. I would also consider the following approach which
> eliminates dictionaries.  If we only need parameter namespace
> so we can
> avoid conflicts between current & future syslog RFCs and vendor
> parameters, then we can just define some prefix for current and future
> syslog protocol parameters. For example "sys.msgno", "sys.fragcount",
> etc.  Then, we will control the tags in this namespace using IANA or
> RFCs.  If some vendor wants to re-use the "sys.msgno" tag because the
> definition of the tag suits them for a different use case, then they
> don't need to duplicate it.
>
> 29. Section 5.1. I think we should require a space character
> after each
> structured data block closing bracket.  This will make it
> more readable
> while eliminating the ambiguity as to whether or not the space is part
> of the message.  Even your examples will look nicer. I think
> we can make
> the space optional between two structured blocks of data.
>
> 30. Section 6. Paragraph 3 call for not using fragmentation
> when message
> can fit in a single message.  I think, in general, we assume
> the use of
> fragmentation *only* for splitting long messages.  We had some
> discussion on this a long time ago, but I don't remember the
> conclusion.
> The other use case for fragmentation (or better named multi-part
> messages in this case) is when the message is inherently multi-line.
> For example, a stack trace:
>
> LockConflictException
>  at com.cisco.csrc.db.LockTable.obtainUpdateLock(LockTable.java:199)
>  at
> com.cisco.csrc.db.indexes.OidIndex.obtainUpdateLock(OidIndex.java:448)
>  at
> com.cisco.csrc.db.PObjectImpl.obtainUpdateLock(PObjectImpl.java:1184)
>
> How can I send such message with current draft?  I would have
> to come up
> with some new parameters likely.  I think this needs to be
> standardized.
> The distinction here is that the original message is not a
> single line.
> Rather the original message is a multi-part message with each
> part being
> a separate line.
>
> To handle the above we need to differentiate the case when
> message does
> not need to be assembled.

I created a new issue out of that.

>
> 31. Section 6.  Again we had discussion on this before... It would be
> useful if message parts could be sent before the total length of the
> message is known.  We have one message in our system which is
> about 2000
> lines long. It dumps all kinds of properties on crash.  It
> would be nice
> if I could send parts of this message without knowing the
> total message
> part count.  Otherwise, I would need to assemble the whole message
> before sending it.  This can be problematic if I am crashing
> due to out
> of memory condition, for example.  To address this, we simply need to
> state that recount parameter is optional in all fragments
> except for the
> last message.  This will designate the end of the fragmented message.

This sounds reasonable. I have created an issue out of that:
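
The proposal in #31 can be sketched in Python as a sender that holds
only one part at a time and puts the total count on the last fragment
alone. The parameter names "msgid", "fragno" and "fragcount" are
hypothetical placeholders, not names taken from the draft.

```python
def stream_fragments(parts, msgid):
    """Yield fragments one by one; only the final one carries the count.

    At most one part is buffered, so the sender never needs the whole
    message (or its length) in memory.
    """
    it = iter(parts)
    try:
        pending = next(it)
    except StopIteration:
        return  # nothing to send
    index = 1
    for upcoming in it:
        # More parts follow, so this fragment omits the count.
        yield {"msgid": msgid, "fragno": index, "text": pending}
        pending = upcoming
        index += 1
    # The last fragment marks the end by carrying the total count.
    yield {"msgid": msgid, "fragno": index, "fragcount": index, "text": pending}

for frag in stream_fragments(["line 1", "line 2", "line 3"], msgid=7):
    print(frag)
```

A receiver would treat the first fragment that carries the count as
the end of the fragmented message.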

>
> 32. Section 6.2.  The above suggestions would mean that you can't sign
> the whole message, only parts.  You suggest that signing all parts is
> not as safe as signing the whole message.  Why?  We know exactly the
> message to which each part belongs and this information is  signed,
> right?

I was thinking about an offline log parser/verifier. At a high level,
the parser app might need to reassemble the parts before they can be
processed. At this stage, the part signatures get lost. So it would
definitely be a plus to have the full message signed, too. Of course,
this raises the same issue as #31...

>
> 33. General. What do we do with non-conforming messages.  Do
> we want to
> recommend that collectors/relay agent fire some diagnostic
> message which
> embeds the offending message?

partly addressed. -02 says: log diagnostic, discard (by default) or do
operator-configured action (recommended: treat as 3164).
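
One possible reading of that policy, sketched in Python: always log a
diagnostic, then discard unless the operator has configured an action.
The function name, the logger name and the callback shape are
assumptions made for illustration.

```python
import logging

log = logging.getLogger("syslog.collector")  # hypothetical logger name

def handle_nonconforming(raw, action=None):
    """Log a diagnostic, then discard (default) or run the operator's action."""
    log.warning("non-conforming message received: %r", raw)
    if action is None:
        return None          # default: discard
    return action(raw)       # e.g. hand over to an RFC 3164 style parser

# Operator-configured action: treat the message as legacy RFC 3164 input.
result = handle_nonconforming(b"<13>old style", action=lambda raw: ("3164", raw))
print(result)
```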

>
> 34. Do we want to introduce more standard parameters? Good candidates
> are "facility" and "severity".  Yes, this will duplicate information,
> but we can make them optional.  At least this will overcome
> the problem
> of syslog servers only storing the message and not the PRI field which
> leads to then not knowing what facility or severity the message had if
> you store multiple facilities/severities in the same log file.

I am not sure if we are really after this... It requires additional
space in the (signed!) message. I am about to specify that the syslogd
MUST have a way to store raw messages, without any part removed or
changed. This is vital for signed messages - otherwise you are unable to
verify them. So I think this should be a requirement for the application
(or at least a very important implementor's note, if the IETF does not
allow for the other).
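
A small Python sketch of why raw storage matters for verification.
HMAC-SHA256 stands in for whatever signature scheme ends up being used,
and the key and sample message are made up; the point is only that a
stored copy with the PRI stripped no longer verifies.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical shared key, for illustration only

def sign(raw):
    """Stand-in signature: HMAC-SHA256 over the raw message bytes."""
    return hmac.new(KEY, raw, hashlib.sha256).digest()

def verify(raw, signature):
    return hmac.compare_digest(sign(raw), signature)

raw = b"<165>2004-02-04T07:25:43Z host.example.com app some text"
signature = sign(raw)

# A store that drops the PRI part keeps only what follows ">".
stripped = raw[raw.index(b">") + 1:]

print(verify(raw, signature))       # True: raw message kept unchanged
print(verify(stripped, signature))  # False: a part was removed
```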


Rainer
