Hello,
I would like to fix this bug, but it looks like it will not be a one-line patch. Looking at the pg_dump code, I see that object names pass through the following chain:
1. pg_dump executes 'SELECT c.tableoid, c.oid, c.relname, ... ' and gets the object name in the encoding chosen for the db connection/dump.
2. It calls write_msg or a similar function:
write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.name);
3. vwrite_msg localizes the text message, but not the argument(s):
    vfprintf(stderr, _(fmt), ap);
Here gettext (_) internally converts the translated fmt to the OS encoding (if that differs from UTF-8, the encoding of the localized strings).
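To illustrate the chain above, here is a minimal sketch, not the actual pg_dump code. fake_gettext and format_msg are hypothetical stand-ins for _() and vwrite_msg; the "translation" is hard-coded in Windows-1251 to play the role of the OS encoding. The point is only that the format string is replaced while the argument bytes pass through untouched.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for gettext's _(): always returns a "translation"
 * already encoded in Windows-1251 (the Russian word for "table", then the
 * quoted %s). Real gettext looks the string up in a message catalog.
 */
const char *
fake_gettext(const char *fmt)
{
    (void) fmt;                 /* the lookup is skipped in this sketch */
    return "\xF2\xE0\xE1\xEB\xE8\xF6\xE0 \"%s\"\n";
}

/* Simplified analogue of vwrite_msg that formats into a caller's buffer. */
void
format_msg(char *buf, size_t buflen, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    /* Only the format is "localized"; the arguments are copied as-is. */
    vsnprintf(buf, buflen, fake_gettext(fmt), ap);
    va_end(ap);
}
```

Calling format_msg with an object name in UTF-8 (for example the bytes "\xD1\x82") leaves a buffer that mixes Windows-1251 message text with a UTF-8 name, which is the same kind of mismatch that shows up in the console.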

I can see only a few solutions to the problem:
1. To convert the object name at the back-end, i.e. to modify all the similar SELECTs like this:
'SELECT c.tableoid, c.oid, c.relname, convert_to(c.relname, 'OS_ENCODING') AS locrelname, ...'
and then do
write_msg(NULL, "finding the columns and types of table \"%s\"\n", tbinfo->dobj.local_name);
The downside of this approach is that it requires rewriting all the SELECTs for all the objects. And it doesn't help with any other text coming from the backend, such as localized backend errors.

2. To set up another connection to the backend with the OS encoding, and to get all the object names through it. It looks insane too. And we have the same problem with the localized backend errors coming in on the "main" connection.

3. To write a convert_to_os_encoding(text, encoding) function for the frontend utilities. Unfortunately the frontend can't use the internal PostgreSQL conversion functions, and modifying them to be usable through libpq looks infeasible. So the only way to implement such a function is to use another encoding conversion framework (library). And my question is: is it possible to add libiconv as a dependency of the frontend utilities code?

4. To force users to use the OS encoding as the database encoding. Or to not use non-ASCII characters in db object names and to disable NLS on Windows completely. It doesn't look like a solution at all.
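
For option 3, a rough sketch of what such a function could look like on top of the standard iconv API (iconv_open / iconv / iconv_close). The name convert_text and its signature are my assumptions for illustration, not existing PostgreSQL code, and the sketch assumes a byte-oriented target encoding such as Windows-1251 or UTF-8.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated string from from_enc to
 * to_enc. Returns a malloc'd string the caller must free, or NULL if the
 * encoding pair is unsupported or the text cannot be converted.
 */
char *
convert_text(const char *text, const char *from_enc, const char *to_enc)
{
    iconv_t     cd = iconv_open(to_enc, from_enc);
    size_t      inleft = strlen(text);
    size_t      outsize = inleft * 4 + 16;  /* generous upper bound */
    size_t      outleft = outsize - 1;      /* keep room for the NUL */
    size_t      rc;
    char       *out;
    char       *inptr = (char *) text;      /* iconv does not modify input */
    char       *outptr;

    if (cd == (iconv_t) -1)
        return NULL;

    out = malloc(outsize);
    if (out == NULL)
    {
        iconv_close(cd);
        return NULL;
    }
    outptr = out;

    rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (rc != (size_t) -1)
        rc = iconv(cd, NULL, NULL, &outptr, &outleft);  /* flush state */
    iconv_close(cd);

    if (rc == (size_t) -1)
    {
        free(out);
        return NULL;
    }
    *outptr = '\0';
    return out;
}
```

A message routine could then pass each object name through something like convert_text(name, dump_encoding, os_encoding) before formatting it, and fall back to the raw name when NULL comes back. How os_encoding would be determined (for example nl_langinfo(CODESET) on POSIX systems, or the console code page on Windows) is left open here.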

BTW, this is not the only instance of the issue. For example, when I try to use vacuumdb, I get completely unreadable messages:
http://oi48.tinypic.com/1c8j9.jpg
(blue marks what is in Russian or English, all the other text is gibberish).

Best regards,
Alexander


18.07.2012 12:51, Alexander Law wrote:
Hello,

The dump file itself is correct. The issue is only with the non-ASCII object names in pg_dump messages. The message text (which is non-ASCII too) is displayed consistently in the right encoding (i.e. the OS encoding, thanks to libintl/gettext), but the encoding of db object names depends on the dump encoding, and thus they become unreadable when a different encoding is used. The same can be reproduced on Linux (where the console encoding is UTF-8) when doing a dump with Windows-1251 or Latin1 (for Western European languages).

Thanks,
Alexander


    The following bug has been logged on the website:

    Bug reference:      6742
    Logged by:          Alexander LAW
    Email address:      exclusion(at)gmail(dot)com
    PostgreSQL version: 9.1.4
    Operating system:   Windows
    Description:

    When I try to dump a database with UTF-8 encoding on Windows, I get unreadable
    object names.
    Please look at the screenshot (http://oi50.tinypic.com/2lw6ipf.jpg). In the
    left window all the pg_dump messages are displayed correctly (except for the
    password prompt (bug #6510)), but the non-ASCII object name is gibberish. In
    the right window (where the dump is done with the Windows 1251 encoding (the OS
    encoding for the Russian locale)) everything is right.

Did you check the dump file using an editor that can handle UTF-8?
The Windows console is not known for properly handling that encoding.

Thomas




