Hi WG,

this is an important issue for interoperability between -sign and the
upcoming -international. -sign redefines the MSG part of the message as
follows:

(@@@ is included to point out the important fragment)

### -sign ###
 The MSG part contains the details of the message. This has
   traditionally been a freeform message that gives some detailed
   information of the event. The MSG part of the syslog packet MUST
   contain visible (printing) characters.

   @@@@
   The code set used MUST also
   been seven-bit ASCII in an eight-bit field like that used in the PRI
   part.
   @@@@

   In this code set, the only allowable characters are the ABNF
   VCHAR values (%d33-126) and spaces (SP value %d32). Two message types
   are defined in this document. Each has unique fields within the MSG
   part and they are described below.
### end -sign ###

In contrast, 3164 says:

### 3164 ###
   The MSG part will fill the remainder of the syslog packet.  This will
   usually contain some additional information of the process that
   generated the message, and then the text of the message.  There is no
   ending delimiter to this part.  The MSG part of the syslog packet
   MUST contain visible (printing) characters.  The code set
   traditionally and most often used has also been seven-bit ASCII in an
   eight-bit field like that used in the PRI and HEADER parts.  In this
   code set, the only allowable characters are the ABNF VCHAR values
   (%d33-126) and spaces (SP value %d32).  However, no indication of the
   code set used within the MSG is required, nor is it expected.

   @@@@
   Other
   code sets MAY be used as long as the characters used in the MSG are
   exclusively visible characters and spaces similar to those described
   above.
   @@@@

   The selection of a code set used in the MSG part SHOULD be
   made with thoughts of the intended receiver.  A message containing
   characters in a code set that cannot be viewed or understood by a
   recipient will yield no information of value to an operator or
   administrator looking at it.
### end 3164 ###

The important part is that 3164 says any code set can be used (which, as
I understand it, actually *would* include UTF-8), whereas -sign says
only US-ASCII is allowed.
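
To make the difference concrete, here is a minimal sketch of the two
readings as octet-level checks. The function names and the "encoding"
parameter are my own assumptions for illustration; neither document
defines such a check, and 3164 gives the receiver no way to learn which
code set was used.

```python
def is_sign_msg(msg: bytes) -> bool:
    """-sign reading: every octet must be SP (%d32) or VCHAR (%d33-126)."""
    return all(32 <= octet <= 126 for octet in msg)


def is_3164_msg(msg: bytes, encoding: str = "utf-8") -> bool:
    """Looser 3164 reading: any code set, as long as the decoded
    characters are visible characters or spaces.

    The encoding is an assumption made by the receiver; 3164 does not
    require any indication of the code set inside the MSG.
    """
    try:
        text = msg.decode(encoding)
    except UnicodeDecodeError:
        # Octets are not valid in the assumed code set.
        return False
    # str.isprintable() accepts the ASCII space and rejects control
    # characters such as BEL or LF.
    return all(ch.isprintable() for ch in text)
```

Under these assumptions, a plain ASCII message passes both checks, while
a UTF-8 message containing, say, a German umlaut passes only the 3164
check, because its encoding uses octets above 126.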

I have to admit that I was a bit -sign focussed when I started the
discussion on the issues with using UTF-8 in -international. Under 3164,
I think this is allowed even today (but it may still cause some unaware
implementations to fail).

If we go ahead and leave -sign with US-ASCII only, it will most probably
be impossible to use -sign with -international enhanced messages (once
-international is out).

As such, I strongly propose that -sign is changed back to what 3164
says, so that non-US-ASCII data is allowed to be present.

As I interpret 3164 now, this allows 8 bit data to be present inside the
MSG part and that in turn allows -international to define UTF-8 with no
(well, limited) worries. That would allow me to quickly go to the next
revision of -international.

Comments?

Rainer

