> -----Original Message-----
> From: Tatsuo Ishii [mailto:is...@postgresql.org]
>
> Also I don't understand why you need UTF-16 support as a database encoding
> because UTF-8 and UTF-16 are logically equivalent, they are just different
> representations (encodings) of Unicode. That means if we already support UTF-8
> (I'm sure we already do), there's no particular reason we need to add UTF-16
> support.
>
> Maybe you just want to support UTF-16 as a client encoding?
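The quoted point, that UTF-8 and UTF-16 are different encodings of the same Unicode code points, can be checked with a small standalone C sketch. This is not PostgreSQL code: the function names are hypothetical, and it assumes valid Unicode scalar values (and well-formed input when decoding). Encoding one code point both ways and decoding it back yields the identical value, so neither form can hold anything the other cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-8; returns bytes written (1..4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one Unicode scalar value as UTF-16 code units; returns 1 or 2. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t) cp;                     /* BMP: one code unit */
        return 1;
    }
    cp -= 0x10000;                                  /* surrogate pair */
    out[0] = (uint16_t) (0xD800 | (cp >> 10));
    out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
    return 2;
}

/* Decode one well-formed UTF-8 sequence of the given length. */
static uint32_t utf8_decode(const unsigned char *in, size_t len)
{
    switch (len) {
    case 1:
        return in[0];
    case 2:
        return ((uint32_t) (in[0] & 0x1F) << 6) | (in[1] & 0x3F);
    case 3:
        return ((uint32_t) (in[0] & 0x0F) << 12) |
               ((uint32_t) (in[1] & 0x3F) << 6) | (in[2] & 0x3F);
    default:
        return ((uint32_t) (in[0] & 0x07) << 18) |
               ((uint32_t) (in[1] & 0x3F) << 12) |
               ((uint32_t) (in[2] & 0x3F) << 6) | (in[3] & 0x3F);
    }
}

/* Decode one well-formed UTF-16 sequence (1 unit or a surrogate pair). */
static uint32_t utf16_decode(const uint16_t *in, size_t len)
{
    if (len == 1)
        return in[0];
    return 0x10000 + (((uint32_t) (in[0] - 0xD800) << 10) |
                      (uint32_t) (in[1] - 0xDC00));
}
```

For example, U+65E5 is E6 97 A5 in UTF-8 but the single unit 0x65E5 in UTF-16, and U+1F600 needs four UTF-8 bytes or the surrogate pair D83D DE00; both decode back to the same code point.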
Given below is a design draft for this functionality.

Core new functionality (new code):

1) Create and register independent NCHAR/NVARCHAR/NTEXT data types.
2) Add a new GUC, nchar_collation, that tells the database which default collation to use for the new data types.
3) Create encoding conversion subroutines that convert strings between the database encoding and UTF8 (from national strings to regular strings and back). PostgreSQL already has all the required support (it is used for conversion between the database encoding and client_encoding), so the amount of new code needed here will be minimal.
4) Because every character from a non-UTF8 encoding can be represented in UTF8 (but not vice versa), comparisons inside the database between N* types and the regular string types will be performed in UTF8 form. To achieve this, new IMPLICIT casts may need to be created:
     NCHAR -> CHAR
     NVARCHAR -> VARCHAR
     NTEXT -> TEXT
   Casts in the reverse direction will also be available, but only as EXPLICIT casts. However, these casts could fail if a national string cannot be represented in the database encoding. All of these casts will use the subroutines created in 3). Casting/conversion between N* types will follow the same rules and mechanics as casting/conversion between the usual string types (CHAR(N)/VARCHAR(N)/TEXT).
5) Comparisons between NATIONAL string values will be performed by specialized, UTF8-optimized functions that respect the nchar_collation setting.
6) Client input/output of NATIONAL strings: NATIONAL strings will respect the client_encoding setting, and their values will be transparently converted to/from the requested client_encoding when they are sent to or received from the client (the same mechanics as for the usual string types). Mixed encodings in client input/output will therefore not be supported.
7) Create a set of regression tests for the new data types.
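Items 3) and 4) rest on the fact that converting text from a non-UTF8 database encoding into UTF8 always succeeds, while the reverse can fail. The C sketch below illustrates this with LATIN1 (ISO-8859-1) as a stand-in database encoding, chosen only because its 256 byte values map directly onto U+0000..U+00FF. The function names are hypothetical; the proposal itself would reuse PostgreSQL's existing client_encoding conversion routines rather than code like this. Note that compare_in_utf8 compares in UTF8 form by byte order only, i.e. code-point order, not the collation-aware ordering described in 5).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* LATIN1 -> UTF-8. Never fails: every LATIN1 byte maps to U+0000..U+00FF.
 * `out` must have room for 2 * len bytes. Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[o++] = c;                                   /* ASCII is unchanged */
        } else {
            out[o++] = (unsigned char) (0xC0 | (c >> 6));   /* lead byte C2 or C3 */
            out[o++] = (unsigned char) (0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return o;
}

/* UTF-8 -> LATIN1. Returns the number of bytes written, or -1 if the input
 * contains a character above U+00FF (or is malformed), i.e. the conversion,
 * and any cast built on it, fails. `out` must have room for len bytes. */
static ptrdiff_t utf8_to_latin1(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i = 0, o = 0;

    while (i < len) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[o++] = c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len && (in[i + 1] & 0xC0) == 0x80) {
            /* C2/C3 two-byte sequences encode exactly U+0080..U+00FF */
            out[o++] = (unsigned char) (((c & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;   /* not representable in LATIN1 */
        }
    }
    return (ptrdiff_t) o;
}

/* Compare a LATIN1 value with a UTF-8 value "in UTF8 form": widen the LATIN1
 * side first, then compare bytes. UTF-8 byte order equals code-point order,
 * which is NOT a collation order. Returns -1, 0 or 1. */
static int compare_in_utf8(const unsigned char *latin1, size_t l_len,
                           const unsigned char *utf8, size_t u_len)
{
    unsigned char *tmp = malloc(2 * l_len + 1);   /* +1 avoids malloc(0) */

    if (tmp == NULL)
        abort();

    size_t t_len = latin1_to_utf8(latin1, l_len, tmp);
    size_t n = t_len < u_len ? t_len : u_len;
    int r = memcmp(tmp, utf8, n);

    free(tmp);
    if (r != 0)
        return r < 0 ? -1 : 1;
    return (t_len > u_len) - (t_len < u_len);   /* a shorter prefix sorts first */
}
```

Rejecting unrepresentable characters, rather than substituting them, matches the statement in 4) that such casts could fail instead of silently losing data.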
Additional changes:

1) ECPG support for the new types.
2) Support for the new data types in the database drivers.

Rgds,
Arul Shaji

> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese: http://www.sraoss.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers