Hello,
I am using PostgreSQL 7.1 on the Linux platform (Red Hat 7.1).
My database encoding is 'EUC_CN'.
The application accesses the database through the PostgreSQL JDBC 2.0 driver.
I defined a table like this:
create table test1 (
    id integer not null,
    memo varchar(128)
);
The memo field is for users to record comments and the like.
They enter Chinese text (GB2312 or GBK encoding) mixed with ASCII.
The problem happens when:
The input string is longer than 128 bytes, and
the 128th and 129th bytes together form one Chinese character (Chinese
characters occupy two bytes in GB2312 or GBK encoding).
The problem is:
The INSERT query runs without any error, but
the getString method returns a zero-length String for that field.
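In case it helps to reproduce, below is a rough sketch of the JDBC calls involved. The connection URL, user name, password, and the exact Chinese characters are illustrative only; the point is that 63 ASCII bytes followed by two-byte characters puts a character pair exactly on bytes 128 and 129.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class MemoRepro {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust to your own host/database/user.
        Class.forName("org.postgresql.Driver");
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/testdb", "user", "password");

        // Build a string whose 128th and 129th bytes (in GB2312/GBK) belong to
        // one two-byte Chinese character: 63 ASCII bytes, then Chinese characters
        // occupying bytes 64-65, 66-67, ..., so the pair 128-129 straddles the
        // varchar(128) limit.
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 63; i++) sb.append('a');
        for (int i = 0; i < 40; i++) sb.append('\u4e2d');  // a common Chinese character, 2 bytes in GBK
        String memo = sb.toString();

        PreparedStatement ps =
                con.prepareStatement("INSERT INTO test1 (id, memo) VALUES (?, ?)");
        ps.setInt(1, 5);
        ps.setString(2, memo);
        ps.executeUpdate();  // completes without any error

        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery("SELECT memo FROM test1 WHERE id = 5");
        if (rs.next()) {
            // Expected: the value truncated at a character boundary.
            // Observed: a zero-length string.
            System.out.println("memo length = " + rs.getString(1).length());
        }
        con.close();
    }
}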
A further complication:
When I pg_dump the database and try to restore it, the script
produced by pg_dump (with the -D flag, i.e. INSERT statements with attribute names)
cannot be restored. Checking the script, I found that the memo field of this record
is dumped without its closing single quote (because the 128th byte and the
single quote that follows it are actually interpreted as another, unrecognized Chinese
character), and that is why the restore fails.
Below is the dump for this record:
INSERT INTO
"test1" ("id","memo") VALUES
(5,'Ò¨°¡Á¡»¡ìͶËßµÄÊÇÒÀɽ¾ÓGHµ¥ÔªÕûÌåÇ彨¤Ö¡ÂÒªÊÇ5ÔÂ1ÈÕÖÁ3ÈÕÒ¨°µÚÒ»¾Ó¿ªÅÌ¡ã²ÅÅÁËһλÐÂÔ¡À¹¡è¶¥ÌæÁ˼¸Ì¨¬£¬ÍϵØÊ¡Àˮû¡Á¢Òâѹ¸ÉÖÂʹҵ֡ÂͶËßÒÑҪǨ®Ë');
I feel that multibyte encoding is not handled properly in this case.
I am looking forward to hearing from the dev team.
Finally, I think PostgreSQL is an excellent database, but the
name "PostgreSQL" seems very difficult to pronounce, which is probably one
obstacle preventing people from learning more about it.
Thanks to the dev team for your hard work; you have done
excellent work!
Best Regards,