Noah Misch <n...@leadboat.com> writes:
> Pushed. This broke 010_dump_connstr.pl on bowerbird, introducing 'invalid
> byte sequence for encoding "UTF8"' errors. That's because log_connections
> renders this 010_dump_connstr.pl solution insufficient:

Ugh.

> 4. If GetMessageEncoding()==PG_SQL_ASCII, make pgwin32_message_to_UTF16()
> return NULL. The caller will always send untranslated bytes to write() or
> ReportEventA(). This seems consistent with the SQL_ASCII concept and with
> pg_do_encoding_conversion()'s interpretation of SQL_ASCII.
>
> 5. When including a datname or rolname value in a message, hex-escape
> non-ASCII bytes. They are byte sequences, not text of known encoding.
> This preserves the most information, but it's overkill and ugly in the
> probably-common case of one encoding across all databases of a cluster.
>
> I'm inclined to do (1) in back branches and (4) in HEAD only. (If starting
> fresh today, I would store the encoding of each rolname and dbname or just use
> UTF8 for those particular fields.) Other preferences?

I agree that (4) is a fairly reasonable thing to do, and wouldn't mind
back-patching that.

Taking a wider view, this seems closely related to something I've been
thinking about in connection with the recent pg_stat_activity contretemps:
that mechanism is also shoving strings across database boundaries without
a lot of worry about encodings. Maybe we should try to develop a common
solution. One difference from the datname/rolname situation is that for
pg_stat_activity we can know the source encoding --- we aren't storing it
now, but we easily could.

If we're thinking of a future solution only, adding a "name encoding"
field to relevant shared catalogs makes sense perhaps. Alternatively,
requiring names in shared catalogs to be UTF8 might be a reasonable
answer too.

In all these cases, throwing an error when we can't translate a character
into the destination encoding is not very pleasant. For pg_stat_activity,
I was imagining that translating such characters to '?' might be the best
answer. I don't know if we can get away with that for the datname/rolname
case --- at the very least, it opens problems with apparent duplication
of names that should be unique. I don't much like your hex-encoding
answer, though; that has its own uniqueness-violation hazards, plus it's
ugly. I don't have a strong feeling about what's best.

			regards, tom lane
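
For illustration only, here is a minimal Python sketch of how both display schemes discussed above could make distinct names look identical. It is not PostgreSQL code: the helper names `question_mark` and `hex_escape` and the `\xNN` escape syntax are assumptions made up for this example, and the hex-escape collision shown is only one plausible reading of the "uniqueness-violation hazards" mentioned.

```python
def question_mark(name: bytes) -> str:
    """Hypothetical renderer: replace every non-ASCII byte with '?'."""
    return "".join(chr(b) if b < 0x80 else "?" for b in name)


def hex_escape(name: bytes) -> str:
    """Hypothetical renderer: show non-ASCII bytes as \\xNN.

    ASCII bytes, including a literal backslash, pass through unchanged.
    """
    return "".join(chr(b) if b < 0x80 else "\\x%02x" % b for b in name)


# Three distinct byte strings standing in for datname/rolname values.
name_a = b"caf\xe9"     # one non-ASCII byte (0xE9)
name_b = b"caf\xea"     # a different non-ASCII byte (0xEA)
name_c = b"caf\\xe9"    # pure ASCII: 'c', 'a', 'f', '\', 'x', 'e', '9'

# '?' replacement: name_a and name_b become indistinguishable.
print(question_mark(name_a), question_mark(name_b))

# Hex escaping keeps name_a and name_b apart, but name_a now looks
# like name_c, whose bytes happen to spell out the escape syntax.
print(hex_escape(name_a), hex_escape(name_b), hex_escape(name_c))
```

Escaping the backslash itself would avoid the second collision in this sketch, at the cost of altering the display of plain-ASCII names that contain a backslash.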