>-----Original Message-----
>From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
>ow...@postgresql.org] On Behalf Of MauMau
>
>Hello,
>
>I think it would be nice for PostgreSQL to support national character types
>largely because it should ease migration from other DBMSs.
>
>[Reasons why we need NCHAR]
>--------------------------------------------------
>1. Invite users of other DBMSs to PostgreSQL. Oracle, SQL Server, MySQL, etc.
>all have NCHAR support. PostgreSQL is probably the only database out of major
>ones that does not support NCHAR.
>Sadly, I've read a report from some Japanese government agency that the number
>of MySQL users exceeded that of PostgreSQL here in Japan in 2010 or 2011. I
>wouldn't say that is due to NCHAR support, but it might be one reason. I want
>PostgreSQL to be more popular and regain those users.
>
>2. Enhance the "open" image of PostgreSQL by implementing more features of SQL
>standard. NCHAR may be a wrong and unnecessary feature of SQL standard now
>that we have Unicode support, but it is defined in the standard and widely
>implemented.
>
>3. I have heard that some potential customers didn't adopt PostgreSQL due to
>lack of NCHAR support. However, I don't know the exact reason why they need
>NCHAR.

The use case we have is for customer(s) who are modernizing their databases on
mainframes. These applications are typically written in COBOL, which does have
extensive support for National Characters. Supporting National Characters as
in-built data types in PostgreSQL is, not to exaggerate, an important criterion
in their decision to use PostgreSQL or not. (So is Embedded COBOL. But that is
a separate issue.)

>
>4. I guess some users really want to continue to use ShiftJIS or EUC_JP for
>database encoding, and use NCHAR for a limited set of columns to store
>international text in Unicode:
>- to avoid code conversion between the server and the client for performance
>- because ShiftJIS and EUC_JP require less amount of storage (2 bytes for most
>Kanji) than UTF-8 (3 bytes)
>This use case is described in chapter 6 of "Oracle Database Globalization
>Support Guide".
>--------------------------------------------------
>
>
>I think we need to do the following:
>
>[Minimum requirements]
>--------------------------------------------------
>1. Accept NCHAR/NVARCHAR as data type name and N'...' syntactically.
>This is already implemented. PostgreSQL treats NCHAR/NVARCHAR as synonyms for
>CHAR/VARCHAR, and ignores N prefix. But this is not documented.
>
>2. Declare support for national character support in the manual.
>1 is not sufficient because users don't want to depend on undocumented
>behavior. This is exactly what the TODO item "national character support"
>in PostgreSQL TODO wiki is about.
>
>3. Implement NCHAR/NVARCHAR as distinct data types, not as synonyms so that:
>- psql \d can display the user-specified data types.
>- pg_dump/pg_dumpall can output NCHAR/NVARCHAR columns as-is, not as
>CHAR/VARCHAR.
>- To implement additional features for NCHAR/NVARCHAR in the future, as
>described below.
>--------------------------------------------------
>

Agreed. This is our minimum requirement too.
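
As an aside, the storage figures quoted in item 4 above (2 bytes for most Kanji
in ShiftJIS/EUC_JP versus 3 bytes in UTF-8) are easy to check. Below is a
minimal sketch in Python. It assumes the standard-library codec names
"shift_jis", "euc_jp", "utf-8" and "utf-16-le"; the helper name encoded_sizes
is made up for illustration and is not part of PostgreSQL or any driver. It
only measures encoded byte lengths and says nothing about how a server would
actually store NCHAR data.

```python
# Hypothetical helper for illustration only: report how many bytes a string
# occupies in several encodings, using Python's built-in codecs.
ENCODINGS = ("shift_jis", "euc_jp", "utf-8", "utf-16-le")


def encoded_sizes(text, encodings=ENCODINGS):
    """Return {encoding: byte length}; None where the text cannot be encoded."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The character has no representation in this legacy encoding.
            sizes[name] = None
    return sizes


if __name__ == "__main__":
    # One common Kanji, a short Japanese word, and one supplementary-plane
    # character (U+20BB7), which needs a surrogate pair in UTF-16.
    for sample in ("漢", "日本語", "\U00020BB7"):
        print(repr(sample), encoded_sizes(sample))
```

The supplementary-plane sample is there because such characters take two
UTF-16 code units, so the same script also shows that UTF-16 is not strictly
fixed-width.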

Rgds,
Arul Shaji

>
>
>
>[Optional requirements]
>--------------------------------------------------
>1. Implement client driver support, such as:
>- NCHAR host variable type (e.g. "NCHAR var_name[12];") in ECPG, as specified
>in the SQL standard.
>- national character methods (e.g. setNString, getNString,
>setNCharacterStream) as specified in JDBC 4.0.
>I think at first we can treat these national-character-specific features as the
>same as CHAR/VARCHAR.
>
>2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always contain
>Unicode data.
>I think it is sufficient at first that NCHAR/NVARCHAR columns can only be used
>in UTF-8 databases and they store UTF-8 strings. This allows us to reuse the
>input/output/send/recv functions and other infrastructure of CHAR/VARCHAR.
>This is a reasonable compromise to avoid duplication and minimize the first
>implementation of NCHAR support.
>
>3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
>Fixed-width encoding may allow faster string manipulation as described in
>Oracle's manual. But I'm not sure about this, because UTF-16 is not a real
>fixed-width encoding due to supplementary characters.

This would definitely be a welcome addition.

>--------------------------------------------------
>
>
>I don't think it is good to implement NCHAR/NVARCHAR types as extensions like
>contrib/citext, because NCHAR/NVARCHAR are basic types and need client-side
>support. That is, client drivers need to be aware of the fixed NCHAR/NVARCHAR
>OID values.
>
>How do you think we should implement NCHAR support?
>
>Regards
>MauMau
>
>
>
>--
>Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
>changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-hackers