On 22/09/2010 9:41 PM, Tom Lane wrote:
> Craig Ringer <cr...@postnewspapers.com.au> writes:
>> On 22/09/2010 5:45 PM, Peter Eisentraut wrote:
>>> We need to produce the log output in the server encoding, because that's
>>> how we need to send it to the client.

>> That doesn't mean it can't be recoded for writing to the log file,
>> though. Perhaps it needs to be. It should be reasonably practical to
>> detect when the database and log encoding are the same and avoid the
>> transcoding performance penalty, not that it's big anyway.

> We have seen ... and rejected ... such proposals before.  The problem is
> that "transcode to some other encoding" is not a simple and guaranteed
> error-free operation.  As an example, if you choose to name some table
> using a character that doesn't exist in the log encoding, you have just
> ensured that no message about that table will ever get to the log.

Well, an arguably reasonable if still suboptimal approach is to replace any character that has no representation in the target encoding with a substitute ("?" or whatever). That way the rest of the log message is still emitted intact.

Currently, Pg may as well be emitting "!@#!#!#!@#$@#$" for these log records. It's garbage unless the user's editor/log viewer/whatever happens to use the encoding of that particular set of messages, which turns all the others into garbage instead. To interpret them, I had to keep switching my viewer's encoding back and forth.

It's not a big deal for languages that mostly stay within the 7-bit ASCII range most encodings share, but for Russian, Chinese, Japanese, Thai, the various Indian languages, and so on, it's pretty awful, as seen in Mikio's example log files.

> Nice way to hide your activities from the DBA ;-)

Emitting messages in the wrong encoding doesn't do the DBA any favours either. Automated log analysis and reporting will have a hard time with such logs, and the DBA will have to keep switching encodings in their editor/viewer to read or search them. Assuming they know how, and know they need to.

> Transcoding also
> eats memory, which might be in exceedingly short supply while trying
> to report an "out of memory" error; and IIRC there are some other
> failure scenarios to be concerned about.

Yep, that's certainly a problem. Pre-transcoding the messages on backend start isn't particularly desirable (wasted startup time and memory), and neither is pre-allocating extra memory for use on fatal exit paths.

OTOH, don't the current message translations cost at least some memory too?

I don't have a good answer for this issue, only rather less-than-good ideas, like: have the postmaster generate a file containing the various fatal messages, already in the right encodings/translations, with an offset table at the front, and mmap() it into each backend. Icky, but effective, and it doesn't waste precious shared memory or create new unsharable allocations in the backends that'll only ever get used when something breaks. Roughly what I'm picturing is sketched below.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
