Hi Jean-Michel,

* Jean-Michel POURE <[EMAIL PROTECTED]> [011028 18:23]:
> >psql uses your input literally - so is your console/xterm in
> >UNICODE/UTF8?
> Client: \encoding returns 'UNICODE'.
> Server: \list show databases. All databases are UNICODE (except
> TEMPLATE0 and TEMPLATE1 which are ASCII of course). I use a Mandrake
> 8.1 distribution and think my console is UNICODE.

I don't know the details of the Mandrake distribution, but I would
rather expect the default terminal to be ISO-8859-15 or ISO-8859-1
encoded (I myself use Debian sid, customised to be mixed
ISO-8859-15/UTF-8 :) ). In that case, it's likely to cause problems.

First, check your current locale (before running psql) by typing
"locale charmap" in your terminal:

Unicode:
  asterix:~$ locale charmap
  UTF-8

Latin-9 (fr_FR@euro):
  asterix:~$ locale charmap
  ISO-8859-15

Second, even if you really have a Unicode terminal, you may run into
other problems. Psql uses readline, and readline is not yet UTF-8
enabled by default. There are patches for that, but I don't know why
they don't integrate the support into the code... Whatever the reason,
it means that, for example, Backspace won't work over characters of
more than one byte, and that includes everything which is not ASCII.
So if, while typing in psql, you try to edit text over the "é", it's
likely to mangle your input to psql (without this necessarily being
visible in your terminal), giving anything from a bad command line to
psql waiting for more input... When you've finished typing your line,
check that the psql prompt is displaying an "=" sign:

  tests=#

A different character in that position means psql is still waiting
for the rest of the statement.

Third, depending on how your data is entered vs. queried, there may be
some differences. For example, if you use an application which
converts UTF-8 data to normalisation form D (NFD) before submitting it
to PostgreSQL, then the "é" will be stored as "e" + "combining acute
accent". When you then run your query, you have to submit it in the
same form, as "é" (typed directly from the keyboard) and "e" +
"combining acute accent" are two different things (I plan to add
support in PostgreSQL for this kind of stuff for 7.3, if I manage to
go a bit faster on my other projects...).
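
As a minimal sketch of that last point, outside PostgreSQL entirely,
the following Python snippet builds both forms of "accepté" with the
standard unicodedata module and compares them. The variable names are
only illustrative, and the assumption is simply that the application
side stores NFD while the keyboard produces the precomposed character.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9), as typed on a keyboard.
precomposed = "accept\u00e9"

# The same text in normalisation form D: "e" followed by U+0301
# (combining acute accent).
decomposed = unicodedata.normalize("NFD", precomposed)

# The two strings render identically but are different code point sequences.
same_raw = precomposed == decomposed

# They also differ byte for byte once encoded as UTF-8.
utf8_precomposed = precomposed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

# Normalising both sides to the same form (here NFC) makes them compare equal.
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print("raw comparison equal:       ", same_raw)
print("after NFC normalisation:    ", same_nfc)
print("UTF-8 length (precomposed): ", len(utf8_precomposed))
print("UTF-8 length (decomposed):  ", len(utf8_decomposed))
```

The raw comparison fails while the normalised one succeeds, so a
pattern typed in one form will not match data stored in the other
unless one side is normalised first.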

Anyway, I have tried a query like yours, using a UTF-8 xterm, with
UNICODE encoding for both psql and the database:

My table:

tests=# insert into matable values ('un texte accentué', 12);
INSERT 70197 1
tests=# insert into matable values ('ça accentue le problème', 14);
INSERT 70198 1
tests=# select * from matable;
         montext         | valeur
-------------------------+--------
 un texte accentué       |     12
 ça accentue le problème |     14
(2 rows)

[note that the "é", "ç" and "è" are not combining forms here...]

tests=# select * from matable where montext ilike '%accentué%';
      montext      | valeur
-------------------+--------
 un texte accentué |     12
(1 row)

It works fine for me.

> >> As for me, I typed INSERT INTO source_content VALUES ('Permis de
> >> conduire accepté') in Psql.
> >As I said - psql does not do any conversion.
> The faulty query is: INSERT INTO test (source_content) VALUES
> ('Permis de conduire accepté');
>
> I just can't believe that Psql is not UTF-8 compatible. It seems
> unreal as Psql is PostgreSQL #1 helper application. Should I use
> PostgreSQL MULE encoding to have automatic trans coding. What are
> the guidelines, I am completely lost.

Psql is UTF-8 compatible. However, terminal support for UTF-8 may be a
little shaky for now (no dead keys, no compose key); that will be
fixed in XFree86 4.2. Readline support for UTF-8 is deficient (as is
that of bash, where readline comes from), and I don't know when *that*
will be fixed. I know http://www.li18nux.org/ has some patches, but I
haven't tried them yet.

> >> Psql does not insert the data and I have to kill it manually. Can
> >> you reproduce this?
> >No. If it hangs this is serious problem. Or did you simply forgot
> >final ';' ? It btw does not seem valid sql to me, considering you
> >previously provided table structure.
> Is it possible that my database is corrupted? I have used pg_dump
> several times to dump data from production server to development
> servers and conversely. Does pg_dump produce UTF8 output?
> What are the guidelines when using UTF-8: forget psql and pg_dump?

One thing you really have to be careful about is the locale your
terminal is running in (cf. above with "locale charmap"). A lot of
tools are sensitive to that, as soon as they set the locale, and the
terminal itself is sensitive to it too: if you run an xterm, a
gnome-terminal or another terminal, make sure it is itself started
with the correct locale, rather than the locale being set by a
.bashrc or .profile AFTER the terminal is launched. One way to be sure
is to launch an xterm from the command line in another xterm ;)

> >In the end: are the strings/queries you give to psql/pg_exec UTF-8
> >- this is now main thing, as you have _configured_ everything
> >correctly.
> Everything is configured correctly server-side (PostgreSQL, Psql).
>
> Thank you very much for your support Marko,
> Best regards,
> Jean-Michel

It's possible to work with psql and UTF-8, I'm using it :) But support
for UTF-8 is not complete yet, and it's not seamless. Also, UTF-8
support in PostgreSQL itself is not yet complete (normalisation forms,
collation, regexes...), but it'll come :)

Patrice.

-- 
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/