Re: [BUGS] BUG #5800: "corrupted" error messages (encoding problem ?)

Craig Ringer Thu, 29 Sep 2011 00:51:05 -0700

First, sorry for the slow reply.

Response inline.


On 09/17/2011 08:34 AM, Tom Lane wrote:

Craig Ringer<ring...@ringerc.id.au>  writes:

On 09/17/2011 05:10 AM, Carlo Curatolo wrote:

Just tried with PG 9.1...same problem...

Yep. There appears to be no interest in fixing this bug. All the
alternatives I proposed were rejected, and there doesn't seem to be any
concern about the issue.

The problem is to find a cure that's not worse than the disease.
I'm not exactly convinced that forcing all log messages into a common
encoding is a better behavior than allowing backends to log in their
native database encoding.

If you do want a common encoding, there's a very easy way to get it, ie,
standardize on one encoding for all your databases.

The postmaster may still emit messages in a different encoding if the
system encoding is not the same as the standard database encoding chosen.

People who aren't
doing that already probably have good reasons why they want to stay with
the encoding choices they've made; forcing their logs into some other
encoding isn't necessarily going to improve their lives.


I'm not convinced.

Mixing their logs with messages in other encodings makes it *impossible*
for most people to read them at all. A file with (say) mixed UTF-8,
latin-1 and Shift-JIS is effectively hopelessly corrupted as far as most
people are concerned. If lines are differently encoded, the file is a
totally mangled mess. Try it and see what I mean. As such, I disagree:
forcing all their logs into one encoding WILL improve their lives over
the current situation, and won't affect people whose databases are all
already in the system encoding.
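
The "try it and see" point can be reproduced with a small sketch. The
three messages and the encodings assigned to them are made up for
illustration; they are not real PostgreSQL output. The script joins the
lines as raw bytes, the way a shared log file would hold them, and then
tries to read the whole file in a single encoding.

```python
# Illustrative only: the messages and their encodings are invented.
ENTRIES = [
    ("utf-8", "FEHLER: Relation »kunden« existiert nicht"),
    ("latin-1", "ERREUR: la relation « clients » n'existe pas"),
    ("shift_jis", "エラー: リレーションは存在しません"),
]


def build_mixed_log(entries=ENTRIES):
    """Join the messages as raw bytes, one per line, each in its own encoding."""
    return b"\n".join(text.encode(enc) for enc, text in entries)


def try_decode(blob, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return blob.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    blob = build_mixed_log()
    for enc in ("utf-8", "latin-1", "shift_jis"):
        result = try_decode(blob, enc)
        # ascii() keeps the output printable on any terminal encoding.
        print(enc, "->", "undecodable" if result is None else ascii(result))
```

Reading the file as UTF-8 fails outright on the latin-1 line. Reading it
as latin-1 never fails, because every byte is valid latin-1, but the
UTF-8 and Shift-JIS lines come back as mojibake rather than the original
text.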

In any case, if the system uses a utf8 encoding and the databases are
latin-1 (for example) the admin might actually prefer to have utf8 logs
for easy reading and processing by system tools, no matter what encoding
the databases are in.

The database encoding is an internal thing. The log encoding is an
external thing. Writing messages to stdout/stderr in an encoding other
than that specified by LC_CTYPE and LC_MESSAGES is wrong as it'll cause
garbage to be shown on a terminal; so IMO is logging in a different
encoding.

Because there's no standard way to flag a file as having a certain
encoding, I contend that the correct default is to write files in the
default encoding used by the system. That is what programs that consume
the logs will expect. The only other correct alternative would be to
write UTF-8 logs with a BOM that lets programs unambiguously identify
the encoding. That said, users probably should be able to override the
log file location and encoding so a particular database's logs go to a
separate file in a user-defined encoding and/or override the default
encoding Pg writes.
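
As a rough sketch of the BOM alternative, this is how a log consumer
could tell a BOM-marked UTF-8 file from one written in some other
encoding. The helper names are hypothetical, and the fallback encoding is
simply whatever the consumer would otherwise assume (for example, the
system encoding).

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def encode_log_with_bom(text: str) -> bytes:
    """Encode a whole log file's text as UTF-8 with a leading BOM."""
    # The "utf-8-sig" codec prepends EF BB BF when encoding.
    return text.encode("utf-8-sig")


def decode_log(data: bytes, fallback: str) -> str:
    """Decode as UTF-8 if a BOM is present, otherwise use `fallback`."""
    if has_utf8_bom(data):
        # "utf-8-sig" strips the BOM again when decoding.
        return data.decode("utf-8-sig")
    return data.decode(fallback, errors="replace")
```

A BOM only identifies the file if it is written once, at the very start,
so a writer that appends to an existing file would need to skip it.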

... The only valid fixes are to log them to different files (with some
way to identify which encoding is used)


I don't recall having heard any serious discussion of such a design, but
perhaps doing that would satisfy some use-cases.  One idea that comes to
mind is to provide a %-escape for log_filename that expands to the name
of the database encoding (or more likely, some suitable abbreviation).
The logging collector protocol would have to be expanded to include that
information, but that seems do-able.
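
To make the idea concrete, here is a minimal sketch of how such an escape
might expand. The escape letter %E is an assumption made up for this
example; no such escape exists in log_filename today, and the function
name is hypothetical. The remaining escapes are passed through to
strftime.

```python
import re
from datetime import datetime


def expand_log_filename(pattern: str, encoding: str, when: datetime) -> str:
    """Expand a hypothetical %E escape to the encoding name, then strftime."""

    def replace_token(match):
        token = match.group(0)
        if token == "%E":
            # Lower-case the name and escape any '%' so strftime leaves it alone.
            return encoding.lower().replace("%", "%%")
        # Leave every other escape (including "%%") for strftime to handle.
        return token

    # Scanning "%." left to right means "%%E" is read as "%%" followed by "E".
    staged = re.sub(r"%.", replace_token, pattern)
    return when.strftime(staged)
```

For example, the pattern "postgresql-%Y-%m-%d-%E.log" with encoding
"LATIN1" would give one file per day per encoding, so each file holds
lines in a single, named encoding.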

That'd work, though it doesn't solve the problem for people logging to
syslog or to a single file.

I think Pg should also be able to convert all messages into a common
encoding for logging to a single file and should default to using the
system encoding as that encoding.

The user could configure a different encoding - for example, they might
want to force utf-8 logging because their databases may have all sorts
of different encodings, but they're logging to syslog so they can't
split logs out to different files.

A special log destination encoding name, say "log_encoding = database"
could be used to bypass all encoding conversion, retaining the current
behaviour of logging in whatever encoding the database happens to use.
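
The proposed behaviour can be sketched as follows. This is only an
illustration of the rules described above, not backend code: the function
name is hypothetical, log_encoding is the proposed (not existing)
setting, and the encoding names are Python codec names rather than
PostgreSQL's.

```python
import codecs
import locale
from typing import Optional


def convert_for_log(raw: bytes, db_encoding: str,
                    log_encoding: Optional[str] = None) -> bytes:
    """Re-encode one log message from the database encoding to the log encoding."""
    # The special value "database" bypasses conversion (current behaviour).
    if log_encoding == "database":
        return raw
    # Assumed default: the system encoding taken from the locale settings.
    target = log_encoding or locale.getpreferredencoding(False)
    # Same encoding under a different alias: nothing to do, no overhead.
    if codecs.lookup(target).name == codecs.lookup(db_encoding).name:
        return raw
    # Replace unrepresentable characters rather than lose the whole message.
    text = raw.decode(db_encoding, errors="replace")
    return text.encode(target, errors="replace")
```

With this shape, a latin-1 database logging to a UTF-8 destination pays
for one decode and one encode per message, while matching encodings and
"database" return the original bytes untouched.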

I'm willing to implement this setup (or try, at least) if you think it's
a reasonable thing to do. I don't know how I'll go with multi-file
logging in log_filename, but I'm pretty sure I can handle the log
message encoding conversion and associated configuration directives.

There's some overhead to encoding conversion, but it's pretty minimal.
It can be avoided entirely by ensuring that your log destination
encoding is the same as your Pg database encoding, which under this
scheme you can do by setting "log_encoding = database" and sticking to
one encoding or using multi-file logging.


Reasonable plan?

--
Craig Ringer

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
