On Fri, Dec 06, 2024 at 04:20:42PM +0900, Sutou Kouhei wrote:
> (Do you think that this patch is still needed?)

This thread has fallen off my radar, my apologies about that.

Yes, I think that expanding these tests is a good thing.  Let's
take one step at a time.  I have a couple of comments.

+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;
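
For reference, the two bytes reported in that error can be reproduced
outside the server.  This is only a sketch in Python, assuming its
"utf-8" and "euc_jp" codecs treat this character the same way as the
backend encodings do; the helper name is made up for illustration.

```python
def utf8_bytes_as_euc_jp(ch):
    """Encode ch as UTF-8, then try to read those bytes back as EUC-JP.

    Hypothetical helper, not part of the patch or of PostgreSQL.
    """
    raw = ch.encode("utf-8")
    try:
        return raw, raw.decode("euc_jp")
    except UnicodeDecodeError:
        # The UTF-8 bytes do not form a valid EUC-JP sequence.
        return raw, None


raw, decoded = utf8_bytes_as_euc_jp("\u3042")  # U+3042 HIRAGANA LETTER A
print(" ".join(f"0x{b:02x}" for b in raw))     # 0xe3 0x81 0x82
print(decoded)                                 # None
```

0xe3 is a plausible EUC_JP lead byte but 0x81 is not a valid trailing
byte, which matches the "0xe3 0x81" pair in the message above.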

COPY uses client_encoding when the ENCODING option is not specified.
Perhaps more tests should be added with this value set through
SET client_encoding?

Another one would be valid conversions back and forth.  For example,
I recall that LATIN1 accepts any bytes and can apply a conversion to
UTF-8, so we could use it and expand the proposed tests a bit more?
Or something like that?
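
To illustrate the LATIN1 property, here is a small Python sketch.  It
assumes Python's "latin-1" codec is a fair stand-in for the server's
LATIN1, and it leaves out the NUL byte since PostgreSQL text values
cannot contain it.  The function name is hypothetical.

```python
def latin1_to_utf8(raw):
    """Read raw as LATIN1 (every byte maps to a code point), re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# Every byte value except 0x00, which PostgreSQL text cannot hold.
every_byte = bytes(range(1, 256))

converted = latin1_to_utf8(every_byte)
# Convert back: UTF-8 -> LATIN1 should give the original bytes again.
back = converted.decode("utf-8").encode("latin-1")
print(back == every_byte)  # True
```

So a file holding the UTF-8 bytes of U+3042 would load under LATIN1
without error, only as three unrelated one-byte characters.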

This is not going to be portable across the buildfarm.  The CI has
spotted two reasons (there may be others):
1) For Windows, as in the following regression.diffs:
 COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+ERROR:  character with byte sequence 0xe3 0x81 0x82 in encoding "UTF8" has no equivalent in encoding "WIN1252"
2) Second failure on Linux, with 32-bit builds:
 COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+ERROR:  conversion between UTF8 and SQL_ASCII is not supported
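
The first failure can be mimicked with a short Python sketch, assuming
Python's "cp1252" codec corresponds to WIN1252; the helper name is
invented for this example.  The SQL_ASCII case has no counterpart
here, as it is about a missing conversion rather than a missing
character.

```python
def has_equivalent(ch, encoding):
    """Return True if ch can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        # The target encoding has no code for this character.
        return False


hiragana_a = "\u3042"  # U+3042 HIRAGANA LETTER A
print(has_equivalent(hiragana_a, "utf-8"))   # True
print(has_equivalent(hiragana_a, "cp1252"))  # False
```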

Likely, this should be made conditional on the database encoding
being UTF8?  There are a couple of examples like that in the tree,
based on the following SQL trick:
SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
\if :skip_test
\quit
\endif

This requires an alternate expected output file for the non-UTF8 case.
--
Michael
