First, sorry for the slow reply.

Responses inline.

On 09/17/2011 08:34 AM, Tom Lane wrote:
> Craig Ringer<ring...@ringerc.id.au> writes:
>> On 09/17/2011 05:10 AM, Carlo Curatolo wrote:
>>> Just tried with PG 9.1...same problem...
>>
>> Yep. There appears to be no interest in fixing this bug. All the
>> alternatives I proposed were rejected, and there doesn't seem to be any
>> concern about the issue.
>
> The problem is to find a cure that's not worse than the disease.
> I'm not exactly convinced that forcing all log messages into a common
> encoding is a better behavior than allowing backends to log in their
> native database encoding.
>
> If you do want a common encoding, there's a very easy way to get it, ie,
> standardize on one encoding for all your databases.

The postmaster may still emit messages in a different encoding if the system encoding is not the same as the standard database encoding chosen.

> People who aren't
> doing that already probably have good reasons why they want to stay with
> the encoding choices they've made; forcing their logs into some other
> encoding isn't necessarily going to improve their lives.

I'm not convinced.

Mixing their logs with messages in other encodings makes it *impossible* for most people to read them at all. A file with (say) mixed UTF-8, latin-1 and Shift-JIS is, as far as most people are concerned, hopelessly corrupted: when different lines use different encodings, no single encoding setting displays the whole file correctly, and the result is a mangled mess. Try it and see what I mean. As such, I disagree: forcing all their logs into one encoding WILL improve their lives over the current situation, and won't affect people whose databases are all already in the system encoding.
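To make "try it" concrete, here is a small Python sketch. The sample messages and the helper name `decodes_cleanly` are made up for illustration; none of this is PostgreSQL code. It builds a "log file" whose lines use three different encodings, then shows that a strict UTF-8 reader rejects the file, while a latin-1 reader accepts every byte but shows garbage for the lines that were not latin-1.

```python
# Three log lines, each as a backend with a different database encoding
# might have written it (the messages themselves are invented).
lines = [
    "ERREUR: échec de la connexion".encode("utf-8"),
    "ERREUR: échec de la connexion".encode("latin-1"),
    "エラー: 接続に失敗しました".encode("shift_jis"),
]
log_bytes = b"\n".join(lines)


def decodes_cleanly(data, encoding):
    """Return True if the whole byte string is valid in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A strict UTF-8 reader fails on the latin-1 line.
utf8_ok = decodes_cleanly(log_bytes, "utf-8")

# latin-1 maps every byte to some character, so decoding never fails,
# but the UTF-8 and Shift-JIS lines come out as mojibake.
latin1_text = log_bytes.decode("latin-1")

print("valid UTF-8:", utf8_ok)
print(latin1_text)
```

Whichever single encoding the reader picks, at least one of the three lines is either rejected or displayed wrongly.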

In any case, if the system uses a utf8 encoding and the databases are latin-1 (for example) the admin might actually prefer to have utf8 logs for easy reading and processing by system tools, no matter what encoding the databases are in.

The database encoding is an internal thing. The log encoding is an external thing. Writing messages to stdout/stderr in an encoding other than that specified by LC_CTYPE and LC_MESSAGES is wrong, as it'll cause garbage to be shown on a terminal; IMO, so is logging in a different encoding.

Because there's no standard way to flag a file as having a certain encoding, I contend that the correct default is to write files in the default encoding used by the system. That is what programs that consume the logs will expect. The only other correct alternative would be to write UTF-8 logs with a BOM that lets programs unambiguously identify the encoding. That said, users probably should be able to override the default encoding Pg writes, and to set the log file location and encoding per database, so that a particular database's logs go to a separate file in a user-defined encoding.
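As a rough illustration of the BOM alternative, the Python sketch below writes a log chunk as UTF-8 with a leading byte order mark and shows how a consumer could detect it. The function names are hypothetical; only the `utf-8-sig` codec and `codecs.BOM_UTF8` are real Python features, and this says nothing about how Pg itself would implement it.

```python
import codecs


def encode_log_with_bom(text):
    """Encode text as UTF-8 with a leading BOM (bytes EF BB BF)."""
    # The "utf-8-sig" codec prepends the BOM when encoding.
    return text.encode("utf-8-sig")


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def read_log(data, fallback_encoding):
    """Decode as UTF-8 if a BOM is present, else use the caller's fallback."""
    if has_utf8_bom(data):
        # "utf-8-sig" strips the BOM when decoding.
        return data.decode("utf-8-sig")
    return data.decode(fallback_encoding)


log_data = encode_log_with_bom("ERREUR: échec de la connexion\n")
print(has_utf8_bom(log_data))
print(read_log(log_data, "latin-1"))
```

Note that the BOM only helps if it sits at the very start of the file, so it identifies a file that is UTF-8 throughout; it does nothing for a file with mixed encodings.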


>> ... The only valid fixes are to log them to different files (with some
>> way to identify which encoding is used)
>
> I don't recall having heard any serious discussion of such a design, but
> perhaps doing that would satisfy some use-cases.  One idea that comes to
> mind is to provide a %-escape for log_filename that expands to the name
> of the database encoding (or more likely, some suitable abbreviation).
> The logging collector protocol would have to be expanded to include that
> information, but that seems do-able.

That'd work, though it doesn't solve the problem for people logging to syslog or to a single file.
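A rough Python sketch of what such an escape could do, purely for illustration: the escape letter `%Q` and the function name are invented here (no such log_filename escape exists), and the real change would live in the logging collector rather than in anything like this. The idea is to substitute the encoding name first and then hand the remaining strftime-style escapes to `strftime` as usual.

```python
from datetime import datetime

# Hypothetical escape letter for "database encoding name".
# This is an assumption for the sketch, not a real log_filename escape.
ENCODING_ESCAPE = "Q"


def expand_log_filename(pattern, db_encoding, when):
    """Expand the hypothetical %Q escape, then the usual strftime escapes."""
    pieces = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt == ENCODING_ESCAPE:
                pieces.append(db_encoding)
            else:
                # Leave every other escape (including "%%") for strftime.
                pieces.append(ch + nxt)
            i += 2
        else:
            pieces.append(ch)
            i += 1
    return when.strftime("".join(pieces))


name = expand_log_filename(
    "postgresql-%Y-%m-%d-%Q.log", "UTF8", datetime(2011, 9, 17)
)
print(name)
```

With a pattern like this, backends using different database encodings would end up writing to different files, each of which is internally consistent.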

I think Pg should also be able to convert all messages into a common encoding for logging to a single file and should default to using the system encoding as that encoding.

The user could configure a different encoding - for example, they might want to force utf-8 logging because their databases may have all sorts of different encodings, but they're logging to syslog so they can't split logs out to different files.

A special log destination encoding name, say "log_encoding = database" could be used to bypass all encoding conversion, retaining the current behaviour of logging in whatever encoding the database happens to use.
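To show the behaviour I have in mind, here is a minimal Python sketch of the conversion step. The function name is hypothetical, `log_encoding` is the proposed setting rather than an existing one, and the encoding names are Python codec names (Pg's own names such as UTF8 or LATIN1 would need mapping). Replacing unrepresentable characters with "?" is just one possible policy, chosen here for simplicity.

```python
def convert_log_message(raw, db_encoding, log_encoding):
    """Convert one log message from the database encoding to the log encoding.

    raw          -- message bytes as produced by the backend
    db_encoding  -- encoding of the database that produced the message
    log_encoding -- proposed setting; the special value "database" means
                    "do not convert, keep the current behaviour"
    """
    if log_encoding == "database" or log_encoding == db_encoding:
        # No conversion needed, so no conversion overhead either.
        return raw
    # Decode from the database encoding, then re-encode for the log.
    # Characters the target encoding cannot represent become "?".
    text = raw.decode(db_encoding, errors="replace")
    return text.encode(log_encoding, errors="replace")


latin1_msg = "ERREUR: échec".encode("latin-1")
print(convert_log_message(latin1_msg, "latin-1", "utf-8"))
print(convert_log_message(latin1_msg, "latin-1", "database"))
```

The early return is what lets people who stick to a single encoding, or who set "log_encoding = database", skip the conversion entirely.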

I'm willing to implement this setup (or try, at least) if you think it's a reasonable thing to do. I don't know how I'll go with multi-file logging in log_filename, but I'm pretty sure I can handle the log message encoding conversion and associated configuration directives.

There's some overhead to encoding conversion, but it's pretty minimal. It can be avoided entirely by ensuring that your log destination encoding is the same as your Pg database encoding, which under this scheme you can do by setting "log_encoding = database" and sticking to one encoding or using multi-file logging.

Reasonable plan?

--
Craig Ringer

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
