Re: [HACKERS] Proposal - Support for National Characters functionality

Arulappan, Arul Shaji Tue, 30 Jul 2013 00:04:24 -0700

> -----Original Message-----
> From: Tatsuo Ishii [mailto:is...@postgresql.org]
> 
> 
> Also I don't understand why you need UTF-16 support as a database
encoding
> because UTF-8 and UTF-16 are logically equivalent, they are just
different
> represention (encoding) of Unicode. That means if we already support
UTF-8
> (I'm sure we already do), there's no particular reason we need to add
UTF-16
> support.
> 
> Maybe you just want to support UTF-16 as a client encoding?


Given below is a design draft for this functionality:

Core new functionality (new code):
1)Create and register independent NCHAR/NVARCHAR/NTEXT data types.

2)Provide support for the new GUC nchar_collation to provide the
database with information about the default collation that needs to be
used for the new data types.

3)Create encoding conversion subroutines to convert strings between the
database encoding and UTF8 (from national strings to regular strings and
back).
PostgreSQL already have all required support (used for conversion
between the database encoding and client_encoding), so amount of the new
code will be minimal there.

4)Because all symbols from non-UTF8 encodings could be represented as
UTF8 (but the reverse is not true) comparison between N* types and the
regular string types inside database will be performed in UTF8 form. To
achieve this feature the new IMPLICIT casts may need to be created:
NCHAR -> CHAR
NVARCHAR -> VARCHAR
NTEXT -> TEXT.

Casting in the reverse direction will be available too but only as
EXPLICIT.
However, these casts could fail if national strings could not be
represented in the used database encoding.

All these casts will use subroutines created in 3).

Casting/conversion between N* types will follow the same rules/mechanics
as used for casting/conversion between usual (CHAR(N)/VARCHAR(N)/TEXT)
string types.


5)Comparison between NATIONAL string values will be performed via
specialized UTF8 optimized functions (with respect of the
nchar_collation setting).


6)Client input/output of NATIONAL strings - NATIONAL strings will
respect the client_encoding setting, and their values will be
transparently converted to the requested client_encoding before
sending(receiving) to client (the same mechanics as used for usual
string types).
So no mixed encoding in client input/output will be supported/available.

7)Create set of the regression tests for these new data types.


Additional changes:
1)ECPG support for these new types
2) Support in the database drivers for the data types.

Rgds,
Arul Shaji




> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal - Support for National Characters functionality

Reply via email to