On Fri, Dec 06, 2024 at 04:20:42PM +0900, Sutou Kouhei wrote:
> (Do you think that this patch is still needed?)

This thread has fallen off my radar, my apologies about that.  Yes, I
think that expanding these tests is a good thing.  Let's take one step
at a time.  I have a couple of comments.

+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;

client_encoding is used by COPY when the ENCODING option is not
specified.  Perhaps more tests should be added with this value set
through a SET client_encoding?

Another addition would be valid conversions back and forth.  For
example, I recall that LATIN1 accepts any bytes and can apply a
conversion to UTF-8, so we could use it to expand the proposed tests a
bit more?  Or something like that?

This is not going to be portable across the buildfarm.  Two reasons
are spotted by the CI (there may be others):

1) For Windows, as in the following regression.diffs:
 COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+ERROR:  character with byte sequence 0xe3 0x81 0x82 in encoding "UTF8" has no equivalent in encoding "WIN1252"

2) Second failure on Linux, with 32-bit builds:
 COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+ERROR:  conversion between UTF8 and SQL_ASCII is not supported

Likely, this should be made conditional, based on the fact that the
database needs to be able to support UTF8?  There are a couple of
examples like that in the tree, based on the following SQL trick:

SELECT getdatabaseencoding() <> 'UTF8' AS skip_test \gset
\if :skip_test
\quit
\endif

This requires an alternate output for the non-UTF8 case.
--
Michael
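For reference, the byte-level behavior behind the errors quoted above can be reproduced outside the server. The following is only a rough sketch using Python's built-in codecs, which are assumed here to behave like PostgreSQL's conversion routines for these particular cases; the helper names `decodes_as` and `encodes_as` are made up for illustration and are not part of the patch or of PostgreSQL.

```python
# U+3042 HIRAGANA LETTER A, the character used in the proposed test.
hiragana_a = "\u3042"
utf8_bytes = hiragana_a.encode("utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def encodes_as(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The three bytes reported in the Windows failure.
print(utf8_bytes.hex(" "))  # e3 81 82

# Reading the UTF-8 file as EUC_JP: 0xe3 0x81 is not a valid sequence.
print(decodes_as(utf8_bytes, "euc_jp"))

# The character has no equivalent in WIN1252 (cp1252 in Python).
print(encodes_as(hiragana_a, "cp1252"))

# LATIN1 maps every byte value to a character, so any input decodes
# and can then be converted to UTF-8 and back without loss.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("utf-8").decode("utf-8").encode("latin-1")
print(round_trip == all_bytes)
```

The last check is the property that would make LATIN1 a candidate for a valid back-and-forth conversion test, assuming the server-side conversion is as permissive as the Python codec.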