Hi Tom,

The issue is on Windows:

As you can see in the attached failure.out file (produced by running failure.sql), we get the error "ERROR: invalid byte sequence for encoding "UTF8": 0xe59aff". Note that the byte sequence we got back from the database is e5 9a ff, whereas the actual byte sequence for the wide character '功' is e5 8a 9f.

'功' ==> Unicode character
e5 8a 9f ==> original byte sequence for the given character
e5 9a ff ==> downcase_truncate_identifier() result, which is an invalid UTF8 representation, stored in the pg_catalog table

When the identifier is displayed on the client, we receive this invalid byte sequence, which throws the error.

UTF8 characters have a predefined valid range for each byte, which is checked in the pg_utf8_islegal() function. Here is the code snippet:

==
a = source[2];
if (a < 0x80 || a > 0xBF)
    return false;
==

Here source[2] = ff, which does not fall into the valid range, so the sequence is rejected as an illegal UTF8 character. The original third byte, 9f, does fall within the range.

Since we smash the identifier to lower case using the downcase_truncate_identifier() function, the solution is to make this function wide-char aware, like the LOWER() function.

I see some earlier discussion about downcase_truncate_identifier() and a wide-char aware function, but it seems to have been dropped somewhere. (http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)

This invalid byte sequence issue looks like a more serious problem, because it might lead to, e.g., pg_dump failures.

I have tested this on PG9.0 beta4 (one click installers). BTW, we have observed the same with earlier versions as well.

Attached is the .sql file and its output (run on PG9.0 beta4).

Any thoughts?
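To make the corruption above concrete, here is a minimal C sketch of what I believe is happening. It is not the PostgreSQL source. bytewise_lower() is a hypothetical stand-in for a locale-dependent tolower(), and it assumes a single-byte Windows code page such as 1252, in which 0x8A and 0x9F are uppercase letters whose lowercase forms are 0x9A and 0xFF. three_byte_utf8_ok() is a simplified range check in the spirit of the pg_utf8_islegal() snippet, not a full validator.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for tolower() under a single-byte Windows locale.
 * ASSUMPTION: code page 1252, where 0x8A and 0x9F are uppercase letters
 * that lowercase to 0x9A and 0xFF respectively.
 */
unsigned char
bytewise_lower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return (unsigned char) (ch + ('a' - 'A'));
    if (ch == 0x8A)
        return 0x9A;
    if (ch == 0x9F)
        return 0xFF;
    return ch;
}

/* Downcase a buffer one byte at a time, ignoring multibyte boundaries. */
void
downcase_bytes(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = bytewise_lower(buf[i]);
}

/*
 * Simplified check for a three-byte UTF-8 sequence: lead byte in
 * 0xE0..0xEF, both continuation bytes in 0x80..0xBF.  It omits the
 * overlong and surrogate checks that a complete validator performs.
 */
bool
three_byte_utf8_ok(const unsigned char *s)
{
    if (s[0] < 0xE0 || s[0] > 0xEF)
        return false;
    if (s[1] < 0x80 || s[1] > 0xBF)
        return false;
    if (s[2] < 0x80 || s[2] > 0xBF)
        return false;
    return true;
}
```

Feeding the bytes e5 8a 9f through downcase_bytes() gives e5 9a ff under these assumptions. The second byte is still a legal continuation byte, but the third one (ff) fails the range check, which matches the sequence reported in the error message.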
Thanks
--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Website: www.enterprisedb.com
SELECT version();
set client_encoding to EUC_CN;
SELECT name,setting FROM pg_settings WHERE name like 'lc%' OR name like '%encoding';
create table 加入 ( 用户名 text, 新功能 varchar);
insert into 加入 values('- 隐私政策 ',' 使用条款');
insert into 加入 values('计划政策',' 登录到');
select 新功能 from 加入;
select * from 加入;
drop table 加入 ;