There is a problem with the way logging.handlers.SysLogHandler works when presented with Unicode messages. According to RFC 5424, Unicode is supposed to be sent encoded as UTF-8 and preceded by a BOM. However, the current handler implementation puts the BOM at the start of the formatted message, and this is wrong in scenarios where you want to put some additional structured data in front of the unstructured message part; the BOM is supposed to go after the structured part (which, therefore, has to be ASCII) and before the unstructured part. In that scenario, the handler's current behaviour does not strictly conform to RFC 5424.
The issue is described in [1]. The BOM was originally added / position changed in response to [2] and [3]. It is not possible to achieve conformance with the current implementation of the handler, unless you subclass the handler and override the whole emit() method. This is not ideal. For 3.3, I will refactor the implementation to expose a method which creates the byte string which is sent over the wire to the syslog daemon. This method can then be overridden for specific use cases where needed. However, for 2.7 and 3.2, removing the BOM insertion would bring the implementation into conformance to the RFC, though the entire message would have to be regarded as just a set of octets. A Unicode message would still be encoded using UTF-8, but the BOM would be left out. I am thinking of removing the BOM insertion in 2.7 and 3.2 - although it is a change in behaviour, the current behaviour does seem broken with regard to RFC 5424 conformance. However, as some might disagree with that assessment and view it as a backwards-incompatible behaviour change, I thought I should post this to get some opinions about whether this change is viewed as objectionable. Regards, Vinay Sajip [1] http://bugs.python.org/issue14452 [2] http://bugs.python.org/issue7077 [3] http://bugs.python.org/issue8795 -- http://mail.python.org/mailman/listinfo/python-list