Hi WG,

this is an important issue for interoperability between -sign and the upcoming -international. -sign redefines the MSG part of the message as follows:
(@@@@ is included to point out the important fragment)

### -sign ###
The MSG part contains the details of the message. This has traditionally been a freeform message that gives some detailed information of the event. The MSG part of the syslog packet MUST contain visible (printing) characters. @@@@ The code set used MUST also been seven-bit ASCII in an eight-bit field like that used in the PRI part. @@@@ In this code set, the only allowable characters are the ABNF VCHAR values (%d33-126) and spaces (SP value %d32). Two message types are defined in this document. Each has unique fields within the MSG part and they are described below.
### end -sign ###

In contrast, 3164 said:

### 3164 ###
The MSG part will fill the remainder of the syslog packet. This will usually contain some additional information of the process that generated the message, and then the text of the message. There is no ending delimiter to this part. The MSG part of the syslog packet MUST contain visible (printing) characters. The code set traditionally and most often used has also been seven-bit ASCII in an eight-bit field like that used in the PRI and HEADER parts. In this code set, the only allowable characters are the ABNF VCHAR values (%d33-126) and spaces (SP value %d32). However, no indication of the code set used within the MSG is required, nor is it expected. @@@@ Other code sets MAY be used as long as the characters used in the MSG are exclusively visible characters and spaces similar to those described above. @@@@ The selection of a code set used in the MSG part SHOULD be made with thoughts of the intended receiver. A message containing characters in a code set that cannot be viewed or understood by a recipient will yield no information of value to an operator or administrator looking at it.
### end 3164 ###

The important part is that 3164 says any code set can be used (which, as I understand it, actually *would* include UTF-8), whereas -sign says only US-ASCII is allowed.
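To make the difference concrete, here is a minimal sketch in Python. It is my own illustration and not taken from either document: the function name and the two sample messages are hypothetical. It only checks the octet range that -sign mandates (SP %d32 and VCHAR %d33-126) and shows that a UTF-8 encoded MSG with a non-ASCII character falls outside it.

```python
def is_sign_compliant(msg: bytes) -> bool:
    """Return True if every octet is SP (%d32) or an ABNF VCHAR (%d33-126).

    Hypothetical helper: it models only the -sign code set rule quoted
    above, not the rest of the syslog packet format.
    """
    return all(32 <= octet <= 126 for octet in msg)


# Made-up sample messages, for illustration only.
ascii_msg = "su: authentication failed for user root".encode("utf-8")
utf8_msg = "Anmeldung f\u00fcr Benutzer root fehlgeschlagen".encode("utf-8")

# Pure US-ASCII text passes the -sign check.
print(is_sign_compliant(ascii_msg))

# The u-umlaut is encoded as two octets above 126, so the same check
# rejects the UTF-8 message, although it consists only of visible
# characters and spaces.
print(is_sign_compliant(utf8_msg))
```

The second message is the case that matters here: it seems to fit the "other code sets MAY be used" wording of 3164, yet it fails the range that -sign requires.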
I have to admit that I was a bit -sign focussed when I started the discussion on the issues with using UTF-8 in -international. Per 3164, I think this is allowed even today (but it may still cause some unaware implementations to fail).

If we go ahead and leave -sign with US-ASCII only, it will most probably be impossible to use -sign with -international enhanced messages (once -international is out). As such, I strongly propose that -sign be changed back to what 3164 says, so that non-US-ASCII data is allowed to be present.

As I interpret 3164 now, it allows 8-bit data to be present inside the MSG part, and that in turn allows -international to define UTF-8 with no (well, limited) worries. That would allow me to quickly go to the next revision of -international.

Comments?

Rainer