Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-04 Thread Andreas Kalsch
I know what you are talking about, but I am not sure how many websites really check the incoming encoding. Usually you trust that the client will send data in the same encoding the server has sent. (This is what I mean by my simplified chain.) It's some extra work to do converting

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-04 Thread Alban Hertroys
On 4 Aug 2009, at 15:02, Andreas Kalsch wrote:
> Alban, what I do to simplify the data chain: HTTP encoding > PHP string encoding > client connection > server - all is UTF8. Plus invalid byte check in PHP (or server).

You're missing my point. You start dealing with the encoding of the data

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-04 Thread Andreas Kalsch
Alban, what I do to simplify the data chain: HTTP encoding > PHP string encoding > client connection > server - all is UTF8. Plus invalid byte check in PHP (or server). What I have tested inside Postgres is passing a 3-byte UTF8 character to this function, and I got an error. This is a
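The "invalid byte check" mentioned in this message can be sketched as follows. This is only an illustration in Python, not the PHP code used in the thread; the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        # Strict decoding rejects truncated sequences and invalid bytes.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

For example, the three bytes 0xe2 0x99 0x86 from the subject line pass this check, because they are well-formed UTF-8; the conversion error arises only later, when the character is mapped to LATIN2.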

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-04 Thread Alban Hertroys
On 4 Aug 2009, at 24:57, Andreas Kalsch wrote: I think the real problem is: Where do you lose the original encoding the users input their data with? If you specify that encoding on the connection and send it to a database that can handle UTF-8 then you shouldn't be getting any conversion pr

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Alvaro Herrera
Andreas Kalsch wrote:
> My question again: Is there a native Postgres solution to simplify
> characters consistently? It means to completely remove all
> diacriticals from Unicode characters.

There's a to_ascii() function but it supports a subset of charsets, and IIRC UTF8 is not one of them. Pa
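As a rough illustration of what "removing diacriticals" means, here is a minimal client-side sketch in Python. It is not the native Postgres solution asked about in the thread, and `strip_diacritics` is a hypothetical helper name. It relies on Unicode canonical decomposition (NFD), so it only handles letters that decompose into a base letter plus combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after NFD decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as
    # U+030C COMBINING CARON; everything else is kept unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Letters with no canonical decomposition (for example "ł") and symbols such as U+2646, the character behind 0xe29986, pass through unchanged, so a further step would still be needed before converting to LATIN2.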

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Andreas Kalsch
Alban Hertroys wrote:
> On 3 Aug 2009, at 20:32, Andreas Kalsch wrote:
>> Problem: Users will enter _any_ characters in my application and an error really doesn't help in this case.
> I think the real problem is: Where do you lose the original encoding the users input their data with? If you spe

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Alban Hertroys
On 3 Aug 2009, at 20:32, Andreas Kalsch wrote:
> Problem: Users will enter _any_ characters in my application and an error really doesn't help in this case.

I think the real problem is: Where do you lose the original encoding the users input their data with? If you specify that encoding on t

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Andreas Kalsch
So there is definitely no way to do this natively? That would be better, because this is an easy task that should be part of the main distribution. Which is more performant - has anyone made a benchmark? 1) Perl: http://markmail.org/message/2jpp7p26ohreqnsh?q=plperlu+iconv+postgresql&page=1&refer=2

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Alvaro Herrera
Andreas Kalsch wrote:
> The function "convert_to(string text, dest_encoding name)" will
> throw an error and so break my application when not supported
> characters are included in the unicode string.
> So what can I do
> - to filter characters out which have no counterpart in the latin codesets
>

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Pavel Stehule
2009/8/3 Andreas Kalsch:
> The function "convert_to(string text, dest_encoding name)" will throw an
> error and so break my application when not supported characters are included
> in the unicode string.
> So what can I do
> - to filter characters out which have no counterpart in the latin codeset

[GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-03 Thread Andreas Kalsch
The function "convert_to(string text, dest_encoding name)" will throw an error and so break my application when unsupported characters are included in the Unicode string. So what can I do - to filter out characters which have no counterpart in the Latin codesets - or to simply ignore wrong cha

Re: [GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-02 Thread Sam Mason
On Sun, Aug 02, 2009 at 08:45:52PM +0200, Andreas Kalsch wrote:
> Problem: Users will enter _any_ characters in my application and an
> error really doesn't help in this case.

Then why don't you stop converting to LATIN2?

> What I am searching for is a function to undiacritic special letters to

[GENERAL] character 0xe29986 of encoding "UTF8" has no equivalent in "LATIN2"

2009-08-02 Thread Andreas Kalsch
The function "convert_to(string text, dest_encoding name)" will throw an error and so break my application when unsupported characters are included in the Unicode string. So what can I do - to filter out characters which have no counterpart in the Latin codesets - or to simply ignore wrong cha
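The behaviour asked for here, dropping characters that have no LATIN2 counterpart instead of raising an error, can be sketched outside the database as follows. This is a hedged Python illustration, not a Postgres feature; `to_latin2_lossy` is a hypothetical helper name, and Python's `iso8859_2` codec is assumed to correspond to what Postgres calls LATIN2.

```python
def to_latin2_lossy(text: str) -> bytes:
    """Encode to ISO 8859-2, silently dropping unmappable characters.

    Hypothetical helper: errors="ignore" skips characters such as
    U+2646 (UTF-8 bytes 0xe29986) instead of raising UnicodeEncodeError.
    """
    return text.encode("iso8859_2", errors="ignore")
```

Passing `errors="replace"` instead would substitute "?" for each unmappable character, which keeps the loss visible. Either way the conversion is lossy, which is why replies in the thread suggest avoiding the LATIN2 conversion altogether.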