Hello,
I am using PostgreSQL 7.1 on the Linux platform (Red Hat 7.1).
My database encoding is 'EUC_CN'.
The application accesses the database through the PostgreSQL JDBC 2.0 driver.
I defined a table like this:
create table test1 (
    id integer not null,
    memo varchar(128)
);
The memo field is for users to record comments and the like.
They enter Chinese text (GB2312 or GBK encoding) mixed with ASCII.
The problem happens when:
The input string is longer than 128 bytes, and
the 128th and 129th bytes together form one Chinese character (Chinese
characters occupy two bytes in GB2312 or GBK encoding).
The problem is:
The INSERT query runs without any error, but
the getString method returns a zero-length String for that field.
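In case it helps to reproduce, below is a rough sketch of the JDBC calls involved. The connection URL, user name, password, and the exact Chinese characters are illustrative only; the point is that 63 ASCII bytes followed by two-byte characters puts a character pair exactly on bytes 128 and 129.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class MemoRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust to your own host/database/user.
        Class.forName("org.postgresql.Driver");
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "password");

        // Build a string whose 128th and 129th bytes (in GB2312/GBK) belong to
        // one two-byte Chinese character: 63 ASCII bytes, then Chinese characters
        // occupying bytes 64-65, 66-67, ..., so the pair 128-129 straddles the
        // varchar(128) limit.
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 63; i++) sb.append('a');
        for (int i = 0; i < 40; i++) sb.append('\u4e2d');  // a common Chinese character, 2 bytes in GBK
        String memo = sb.toString();

        PreparedStatement ps =
                con.prepareStatement("INSERT INTO test1 (id, memo) VALUES (?, ?)");
        ps.setInt(1, 5);
        ps.setString(2, memo);
        ps.executeUpdate();  // completes without any error

        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT memo FROM test1 WHERE id = 5");
        if (rs.next()) {
            // Expected: the value truncated at a character boundary.
            // Observed: a zero-length string.
            System.out.println("memo length = " + rs.getString(1).length());
        }
        con.close();
    }
}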
A further complication:
When I pg_dump the database and try to restore it, the script
produced by pg_dump (with the -D flag, i.e. INSERT statements with attribute names)
cannot be restored. Checking the script, I found that the memo field of this record
is dumped without its closing single quote (because the 128th byte and the
single quote that follows it are actually interpreted as another, unrecognized Chinese
character), and that is why the restore fails.
Below is the dump for this record:
INSERT INTO
"test1" ("id","memo") VALUES
(5,'Ò¨°¡Á¡»¡ìͶËßµÄÊÇÒÀɽ¾ÓGHµ¥ÔªÕûÌåÇ彨¤Ö¡ÂÒªÊÇ5ÔÂ1ÈÕÖÁ3ÈÕÒ¨°µÚÒ»¾Ó¿ªÅÌ¡ã²ÅÅÁËһλÐÂÔ¡À¹¡è¶¥ÌæÁ˼¸Ì¨¬£¬ÍϵØÊ¡Àˮû¡Á¢Òâѹ¸ÉÖÂʹҵ֡ÂͶËßÒÑҪǨ®Ë');
I feel that multibyte encoding is not handled properly in this case.
I am looking forward to hearing from the dev team.
Finally, I think PostgreSQL is an excellent database, but the
name "PostgreSQL" seems very difficult to pronounce, which is probably one
obstacle preventing people from learning more about it.
Thanks to the dev team for your hard work; you have done
excellent work!
Best Regards,