On 03/17/2017 07:19 AM, Kyotaro HORIGUCHI wrote:
At Mon, 13 Mar 2017 21:07:39 +0200, Heikki Linnakangas <hlinn...@iki.fi> wrote in 
<d5b70078-9f57-0f63-3462-1e564a577...@iki.fi>
Hmm. A somewhat different approach might be more suitable for testing
across versions, though. We could modify the perl scripts slightly to
print out SQL statements that exercise every mapping. For every
supported conversion, the SQL script could:

1. create a database in the source encoding.
2. set client_encoding='<target encoding>'
3. SELECT a string that contains every character in the source
encoding.
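The three steps above could be emitted mechanically by a small generator (the actual proposal is to modify the existing Perl scripts under mb/Unicode; this is a minimal sketch in Python for illustration, and the database name pattern and sample string are hypothetical):

```python
# Sketch: emit an SQL script exercising one source -> target conversion.
# Assumptions: psql-style \c for reconnecting, and TEMPLATE template0 so
# that CREATE DATABASE accepts an encoding differing from the template's.

def conversion_test_sql(src_enc, dest_enc, sample_chars):
    """Return SQL statements that create a database in src_enc, switch
    the client encoding to dest_enc, and select the sample characters."""
    dbname = f"convtest_{src_enc.lower()}"
    return [
        f"CREATE DATABASE {dbname} TEMPLATE template0 ENCODING '{src_enc}';",
        f"\\c {dbname}",
        f"SET client_encoding = '{dest_enc}';",
        # The real script would select a string covering every character
        # of the source encoding, not just a short sample.
        f"SELECT '{sample_chars}';",
    ]

# Example: one LATIN1 -> UTF8 test script.
print("\n".join(conversion_test_sql("LATIN1", "UTF8", "abc")))
```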

There are many encodings that can be used as a client encoding but not
as a database encoding.

Good point.

I would like to use the convert() function. The test could be one large
PL/pgSQL function or a series of "SELECT convert(...)" statements. The
latter can be generated on the fly, without generating and storing the
whole script.

| -- Test for SJIS->UTF-8 conversion
| ...
| SELECT convert('\0000', 'SJIS', 'UTF-8'); -- results in error
| ...
| SELECT convert('\897e', 'SJIS', 'UTF-8');
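Generating those "SELECT convert(...)" statements on the fly might look like the following sketch (Python for illustration; the real generator would read the byte sequences from the mb/Unicode mapping files, which are faked here with a two-element list):

```python
# Sketch: yield one "SELECT convert(...)" per byte sequence. Sequences
# that are invalid in src_enc should make the server raise an error,
# which is itself part of the expected output being compared.

def convert_test_statements(src_enc, dest_enc, byte_seqs):
    for seq in byte_seqs:
        yield (
            f"SELECT convert('\\x{seq.hex()}'::bytea, "
            f"'{src_enc}', '{dest_enc}');"
        )

# Illustrative sequences only; a real run would cover every mapping.
for stmt in convert_test_statements("SJIS", "UTF8",
                                    [b"\x00\x00", b"\x89\x7e"]):
    print(stmt)
```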

Makes sense.

You could then run those SQL statements against old and new server
version, and verify that you get the same results.
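Verifying across versions then reduces to comparing the captured output of the two runs; a minimal sketch, assuming the script's output was saved to files with psql (the file names are hypothetical, and a plain `diff old.out new.out` would do just as well):

```python
import difflib

def results_match(old_path, new_path):
    """Return True when the conversion-test output from the old and new
    server versions is identical; otherwise print a unified diff."""
    with open(old_path) as f:
        old = f.readlines()
    with open(new_path) as f:
        new = f.readlines()
    if old == new:
        return True
    for line in difflib.unified_diff(old, new,
                                     fromfile=old_path, tofile=new_path):
        print(line, end="")
    return False
```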

Including the result files in the repository would make this easy, but
would bloat it unacceptably. Should we add a mb/Unicode/README.sanity_check
instead?

Yeah, a README with instructions on how to do this sounds good. There's no need to include the results in the repository; you can run the script against an older version whenever you need something to compare against.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
