[HACKERS] Chinese in Postgres

2013-08-16 Thread ciifrance...@tiscali.it
Hello all,
before writing this message, I wrote about this in other mailing lists without 
solving my problem.
Maybe some of you can help me.

I have problems with a DB in Postgres when I try to insert Chinese strings in 
UTF-8 format.
If I insert the data using a C++ program I get empty squares, in this format: 
��� (3 empty squares for each Chinese ideogram, as that is its length in UTF-8)
If the string contains Chinese mixed with ASCII, the ASCII is OK but the 
Chinese is broken:
漢語1-3漢語  --> ��1-3��

All the data is read from a binary file. It seems it's read correctly, but 
something happens when the query is executed.
(If the text is in a different language that uses only 2 bytes per letter, 
e.g. Hebrew, I see only 2 empty squares per character, but this is not good 
either...)
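
The byte counts mentioned above (3 bytes per Chinese ideogram, 2 per Hebrew letter, 1 per ASCII character) follow from the UTF-8 lead byte. A minimal sketch, assuming the input is a std::string holding UTF-8 bytes; the helper names utf8_sequence_length and utf8_char_lengths are hypothetical and not part of any code shown in this thread:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the byte length of the UTF-8 sequence introduced by lead byte `b`,
// or 0 if `b` cannot start a sequence (continuation or invalid byte).
inline std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: e.g. Hebrew
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: e.g. most CJK ideograms
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: supplementary planes
    return 0;
}

// Splits a UTF-8 string into per-character byte lengths.
// Returns an empty vector for empty or structurally malformed input
// (hypothetical helper; it does not reject overlong encodings).
inline std::vector<std::size_t> utf8_char_lengths(const std::string& s) {
    std::vector<std::size_t> lengths;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n =
            utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (n == 0 || i + n > s.size()) return {};
        for (std::size_t k = 1; k < n; ++k) {
            // Every byte after the lead byte must look like 10xxxxxx.
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return {};
        }
        lengths.push_back(n);
        i += n;
    }
    return lengths;
}
```

Running such a check on the string right before it is handed to the database layer would show whether the bytes are still well-formed UTF-8 at that point.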

Strange things:
1. If I insert the record by running a query from the command line (putty), the 
Chinese text is OK. The problem occurs only when I insert via the C++ program.
2. I checked the C++ functions involved by creating unit tests; if I run 
some other tests (on another virtual machine) the text is not damaged.
These strange things are confusing me, but maybe they will be useful 
information for somebody who has had the same problem.

The DB is set for UTF-8:

   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
 postgres  | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 MyDB      | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template1 | pgsql | UTF8     | en_US.UTF-8 | en_US.UTF-8 |

Previously I also tried with:

   Name   | Owner | Encoding | Collate | Ctype | Access privileges
----------+-------+----------+---------+-------+-------------------
 postgres | pgsql | UTF8     | C       | C     |
 MyDB     | pgsql | UTF8     | C       | C     |
...

But the problem was the same.
I know that you would like to see the code, but it's too long (anyway, if you 
want, I can try to write some lines of code, like the connection to the DB and 
so on). I don't know if Postgres creates some log when damaged data is 
inserted; that would be useful.

For now, in order to save your time, my question is: has any of you had the 
same problem?
(And how did you solve it?)

Thanks,
Francesco

Invita i tuoi amici e Tiscali ti premia! Il consiglio di un amico vale più di 
uno spot in TV. Per ogni nuovo abbonato 30 € di premio per te e per lui! Un 
amico al mese e parli e navighi sempre gratis: http://freelosophy.tiscali.it/ 



R: Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread ciifrance...@tiscali.it
Thanks for your answer.
Yes, the client is also UTF8:

MyDB=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)


Cheers
Francesco
----Original message----
From: ha...@2ndquadrant.com
Date: 16/08/2013 14.16
To: "ciifrance...@tiscali.it"
Cc: , , 
Subject: Re: [HACKERS] Chinese in Postgres

On 08/16/2013 01:25 PM, ciifrance...@tiscali.it wrote:
> Hello all,
> before writing this message, I wrote about this in other mailing lists
> without solving my problem.
> Maybe some of you can help me.
>
> I have problems with a DB in postgres, when i try to insert Chinese
> strings in UTF-8 format.
> If I insert the data using a C++ program I have empty squares, in this
> format: ��� (3 empty squares for each chinese ideogram as that is the
> length in UTF-8)
> If the string contains chinese mixed with ASCII, the ASCII is OK but
> the Chinese is broken:
> 漢語1-3漢語  --> ��1-3��

Can you check that your client encoding is also UTF8?

hannu=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 row)



Cheers


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


R: 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres

2013-08-16 Thread ciifrance...@tiscali.it
[I reply to both in one email]

Song:
That C++ program has a log file. 
In the log file the queries look like this:
UPDATE MY_table SET UTF8_field = '<8f><20><31><32><33><34><27><20><57><48><45><52><45><20><49><44><20><3d><20><31>


Starting from the first Chinese character, the rest of the query is in hex.
But this is not a problem, because the query is inserted fine (except for 
the Chinese character). And if I use a hex converter, I get the correct query:

UPDATE MY_table SET UTF8_field = '台 1234' WHERE ID = 1
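
The "hex converter" step can be sketched in a few lines of C++. This assumes the log writes each non-printable byte as a <xx> token with two hex digits, which is what the sample line above looks like; the function names decode_hex_log and hex_value are hypothetical:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Converts one hex digit to its value; the caller must pass a valid hex digit.
inline int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

// Replaces every "<xx>" token (xx = two hex digits) with the byte it denotes.
// Anything that does not match that pattern is copied through unchanged.
inline std::string decode_hex_log(const std::string& line) {
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (i + 3 < line.size() && line[i] == '<' && line[i + 3] == '>' &&
            std::isxdigit(static_cast<unsigned char>(line[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(line[i + 2]))) {
            const int byte = hex_value(line[i + 1]) * 16 + hex_value(line[i + 2]);
            out.push_back(static_cast<char>(byte));
            i += 4;  // skip the whole "<xx>" token
        } else {
            out.push_back(line[i]);
            ++i;
        }
    }
    return out;
}
```

Applied to the logged fragment, the tokens after <8f> decode to the ASCII text " 1234' WHERE ID = 1". The byte 0x8f itself is the middle byte of 台 in UTF-8 (E5 8F B0); the other two bytes do not appear in the log line as archived here.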


Hannu:
the length in the database counts each of the empty squares:

 UTF8_field | length
------------+--------
 ��� 1234   |      8
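
For comparison, PostgreSQL's length() on a text value counts characters, so a correctly stored '台 1234' would report 6, not 8. The value 8 matches the number of bytes in the UTF-8 string, which suggests that each byte arrived as a separate character. A minimal sketch of that arithmetic, assuming well-formed UTF-8 input; the helper name utf8_code_points is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumes the input is well-formed UTF-8.
inline std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;  // count only bytes that start a character
    }
    return n;
}
```

For the bytes E5 8F B0 followed by " 1234", s.size() is 8 while utf8_code_points(s) is 6.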

cheers
Francesco

----Original message----
From: mark3...@yahoo.cn
Date: 16/08/2013 14.52
To: "ciifrance...@tiscali.it", "ha...@2ndquadrant.com"
Cc: "pgsql-hackers@postgresql.org", "pgsql-zh-gene...@postgresql.org", "pgsql-ru-gene...@postgresql.org"
Subject: 回复: [pgsql-zh-general] R: Re: [HACKERS] Chinese in Postgres

Maybe something in your C++ program (such as a charset or configuration 
setting) is causing this strange behaviour.

mark







[HACKERS] R: [pgsql-zh-general] (solved - 谢谢) Chinese in Postgres

2013-08-21 Thread ciifrance...@tiscali.it

Eureka.
The problem is solved.

Bambo:
> I guess you forgot to convert your string to UTF-8 before insert.

No, too many conversions :)

The fact is that in the source code the query was cast to UnicodeString 
because of previous requirements on the project:

//tmp is a char* and contains the query
uniQueryStr = UnicodeString(tmp); //useless

executeMyQuery(uniQueryStr); //this is wrong!!
executeMyQuery(tmp); //that's right :). TODO never use UnicodeString anymore...
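
The thread does not say which UnicodeString class was involved or how it converted the char*. One plausible mechanism, shown here purely as an assumption, is that every byte of the UTF-8 query was treated as a character of its own (as in a single-byte charset such as ISO-8859-1) and then encoded to UTF-8 again. The sketch below simulates that; widen_bytes_to_utf8 and count_code_points are hypothetical names:

```cpp
#include <cstddef>
#include <string>

// Re-encodes a byte string as UTF-8 while treating every input byte as its
// own code point (U+0000..U+00FF), i.e. as if the input were ISO-8859-1.
// This is an assumed failure mode, not the documented behaviour of any
// particular UnicodeString class.
inline std::string widen_bytes_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));  // ASCII passes through
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

Under that assumption, '台 1234' (6 characters, 8 bytes) turns into 8 characters, three of them non-printable or unrelated symbols, while the ASCII part is unchanged. Passing the original char* straight through, as in the corrected call, avoids the extra conversion.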

I had already tried removing that cast, but when the target was compiled on 
the Linux server (that's why I used putty) the executable was not overwritten, 
for some weird reason.
So the executable did not match the source code; I had always compiled code 
with Visual Studio or Eclipse, so I wasn't aware of such a possibility.
To make it work, I just removed all the old files and did a long, full, fresh 
build.

Sorry for making you lose time.

XieXie / 谢谢,
Francesco

