[HACKERS] Re: BUG #6742: pg_dump doesn't convert encoding of DB object names to OS encoding

Alexander Law Wed, 25 Jul 2012 04:55:03 -0700

Hello,

I would like to fix this bug, but it looks like it would not be a one-line patch.
Looking at the pg_dump code, I see that the object names come through the following chain:

1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets the object name in the encoding chosen for the db connection/dump.

2. It invokes the write_msg function or the like:

write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.name);

3. vwrite_msg localizes the text message, but not the argument(s):
    vfprintf(stderr, _(fmt), ap);

Here gettext (_) internally translates fmt to the OS encoding (if it's different from UTF-8, the encoding of the localized strings).
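The chain above can be sketched in a few lines of C. This is only an illustration, not pg_dump code: fake_translate is a hypothetical stand-in for gettext's _(), and format_msg is a hypothetical function that mirrors the shape of vwrite_msg but writes into a buffer instead of stderr. It shows that only the format string goes through the translation step, while the bytes of the argument are copied into the message untouched.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for gettext's _(): only the format string is
 * looked up and replaced; nothing here ever sees the arguments. */
const char *fake_translate(const char *fmt)
{
    if (strcmp(fmt, "finding the columns and types of table \"%s\"\n") == 0)
        return "[translated] table \"%s\"\n";
    return fmt;
}

/* Hypothetical function with the same shape as vwrite_msg, writing to a
 * caller-supplied buffer so the result can be inspected. */
int format_msg(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;
    int     n;

    va_start(ap, fmt);
    /* The format is translated; the %s argument is copied byte for byte,
     * in whatever encoding the connection delivered it. */
    n = vsnprintf(buf, buflen, fake_translate(fmt), ap);
    va_end(ap);
    return n;
}
```

If the object name arrives as UTF-8 and the translated format is in another encoding, the resulting message mixes two encodings, which is the effect seen on the console.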


I can see only a few solutions to the problem:

1. To convert the object name at the back-end, i.e. to modify all the similar SELECTs as:
'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname, 'OS_ENCODING') AS locrelname, ...'
and then do
write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.local_name);
The downside of this approach is that it requires rewriting all the SELECTs for all the objects. And it doesn't help us write out any other text from the backend, such as a localized backend error.

2. To set up another connection to the backend with the OS encoding, and to get all the object names through it. It looks insane too. And we have the same problem with the localized backend errors coming over the "main" connection.

3. To make a convert_to_os_encoding(text, encoding) function for the frontend utilities. Unfortunately the frontend can't use the internal PostgreSQL conversion functions, and modifying them to be usable through libpq looks unfeasible.
So the only way to implement such a function is to use another encoding conversion framework (library).
And my question is: is it possible to include libiconv (add this dependency) in the frontend utilities code?

4. To force users to use the OS encoding as the database encoding. Or to not use non-ASCII characters in db object names and to disable NLS on Windows completely. It doesn't look like a solution at all.
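To make option 3 more concrete, here is a minimal sketch of what such a function could look like on top of the iconv API. It is only an assumption of how it might be done, not existing PostgreSQL code: the three-argument signature (source and target encoding passed explicitly) is my own choice for the example, and the helper assumes the target encoding is terminated by a single zero byte (so not UTF-16 or UTF-32).

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated string from from_enc to
 * to_enc. Returns a malloc'd string the caller must free, or NULL if the
 * encodings are unknown or the conversion fails. */
char *convert_to_os_encoding(const char *text,
                             const char *from_enc,
                             const char *to_enc)
{
    iconv_t cd;
    size_t  in_left, out_left, out_size, rc;
    char   *in_ptr, *out, *out_ptr;

    assert(text != NULL);

    cd = iconv_open(to_enc, from_enc);   /* note the order: target first */
    if (cd == (iconv_t) -1)
        return NULL;

    in_left = strlen(text);
    out_size = in_left * 4 + 8;          /* generous upper bound plus slack */
    out = malloc(out_size);
    if (out == NULL)
    {
        iconv_close(cd);
        return NULL;
    }

    in_ptr = (char *) text;              /* iconv does not modify the input */
    out_ptr = out;
    out_left = out_size - 1;             /* keep room for the terminator */

    rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t) -1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);  /* flush state */
    iconv_close(cd);

    if (rc == (size_t) -1)
    {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

A caller in write_msg could then pass the converted copy instead of tbinfo->dobj.name and fall back to the original string when NULL is returned. On POSIX systems the target encoding could probably be taken from nl_langinfo(CODESET); what to use on Windows is a separate question.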

BTW, it's not the only instance of the issue. For example, when I try to use vacuumdb, I get completely unreadable messages:

http://oi48.tinypic.com/1c8j9.jpg
(Blue marks what is in Russian or English; all the other text is gibberish.)

Best regards,
Alexander


18.07.2012 12:51, Alexander Law wrote:

Hello,
The dump file itself is correct. The issue is only with the non-ASCII object names in pg_dump messages.
The message text (which is non-ASCII too) is displayed consistently with the right encoding (i.e. with the OS encoding, thanks to libintl/gettext), but the encoding of db object names depends on the dump encoding, and thus they become unreadable when a different encoding is used.
The same can be reproduced on Linux (where the console encoding is UTF-8) when doing a dump with Windows-1251 or Latin1 (for Western European languages).
Thanks,
Alexander


    The following bug has been logged on the website:

    Bug reference:      6742
    Logged by:          Alexander LAW
    Email address:      exclusion(at)gmail(dot)com
    PostgreSQL version: 9.1.4
    Operating system:   Windows
    Description:

    When I try to dump a database with UTF-8 encoding on Windows, I get unreadable
    object names.
    Please look at the screenshot (http://oi50.tinypic.com/2lw6ipf.jpg). In the
    left window all the pg_dump messages are displayed correctly (except for the
    password prompt (bug #6510)), but the non-ASCII object name is gibberish. In
    the right window (where the dump is done with the Windows-1251 encoding, the
    OS encoding for the Russian locale) everything is right.

Did you check the dump file using an editor that can handle UTF-8?
The Windows console is not known for properly handling that encoding.

Thomas
