[GENERAL] Unicode + LC_COLLATE

John Sidney-Woollett Thu, 22 Apr 2004 04:34:56 -0700

Priem, Alexander said:
> I recreated my entire database (luckily I keep scripts for
> table/index/view creation) and initdb-ed it using --lc-collate=C
> --encoding=UNICODE. In my psqlODBC DSN settings I added
> "set client_encoding='LATIN9';" to the Connect Settings and that solved
> all my problems regarding the special characters.


Does anyone know what the effect of --lc-collate=C --encoding=UNICODE will
be on sorts (and indexes?) when a multibyte Unicode character is
encountered?
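
For anyone wanting to experiment with this question, here is a minimal sketch in Python (not PostgreSQL itself). It assumes that a C collation boils down to a byte-by-byte comparison of the stored UTF-8 data; the sample words and the helper name c_locale_sort are made up for illustration. Since UTF-8 byte order matches Unicode code point order, the result is stable and well defined, but it is not a linguistic ordering: uppercase sorts before lowercase, and accented letters sort after all unaccented ASCII letters.

```python
# Sketch only: emulate a byte-wise ("C"-style) comparison of UTF-8 strings.
# The word list and helper name are hypothetical, chosen for illustration.
words = [
    "zebra",
    "Zebra",
    "apple",
    "\u00e9clair",   # lowercase e-acute + "clair"
    "\u00c9clair",   # uppercase E-acute + "clair"
    "яблоко",        # Cyrillic
    "中文",           # Chinese
    "한국어",          # Korean
]


def c_locale_sort(strings):
    """Sort strings by their raw UTF-8 bytes, as a memcmp-style compare would."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


bytewise = c_locale_sort(words)

# Python compares str values by code point, so this is code point order.
by_code_point = sorted(words)

print(bytewise)
# Byte-wise UTF-8 order and code point order agree.
print(bytewise == by_code_point)
```

Under that assumption, multibyte characters do not break sorting; they simply sort by code point rather than by any language's dictionary rules.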

Is --lc-collate=C --encoding=UNICODE even a valid combination? And if it
is, what unexpected nasties could it cause?

Is it also true that if LC_COLLATE != 'C', indexes cannot be used for
LIKE comparisons (and is this also true for en_US.iso885915)?
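
To illustrate why collation matters for this, here is a rough Python sketch (not the PostgreSQL planner). It assumes that a prefix match such as LIKE 'abc%' can use an ordered index only if it can be rewritten as a range lookup, which holds when ordering is byte-wise. The function prefix_range and the sample rows are hypothetical. With locale-aware ordering the strings sharing a prefix are not guaranteed to sit in one such byte range, which, as far as I understand, is why the 7.4 documentation describes separate pattern operator classes (for example text_pattern_ops) for non-C locales.

```python
import bisect

# Sketch only: show that under byte-wise ordering a prefix match is
# equivalent to a range scan. Names and data here are hypothetical.


def prefix_range(prefix: bytes):
    """Return (low, high) so that low <= s < high covers every s starting with prefix.

    high is None when no finite upper bound exists (prefix is empty or all 0xFF).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    # Increment the last byte to get the smallest string beyond the prefix range.
    upper = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, upper


# A stand-in for an index: values kept sorted by their raw UTF-8 bytes.
rows = sorted(
    s.encode("utf-8")
    for s in ["abc", "abd", "abcd", "ab", "abc\u00e9", "abz", "b"]
)

low, high = prefix_range(b"abc")
start = bisect.bisect_left(rows, low)
end = bisect.bisect_left(rows, high) if high is not None else len(rows)

range_hits = rows[start:end]                              # range scan on the "index"
filter_hits = [r for r in rows if r.startswith(b"abc")]   # full scan with a filter

print(range_hits == filter_hits)
```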

Our database is UNICODE with LC_COLLATE=en_US.iso885915. Does anyone know
what the effect is of someone storing a Cyrillic, Chinese or Korean
character? (We are using JDBC with a webapp, so all the Unicode concerns
are handled transparently, apparently.) When the data is extracted from the
DB, will it render correctly in the browser provided we send all responses
encoded in UTF-8?

Although http://www.postgresql.org/docs/7.4/interactive/charset.html
describes the Postgres-specific implementation and how to configure for a
given locale, the subtle nuances of combining an encoding with
LC_COLLATE, and the tradeoffs involved, are not entirely clear (to me at
least). For example, are the performance penalties of using UNICODE over
ASCII significant?

Maybe it's just my inexperience but this topic seems to cause lots of
questions. A good/simple technote would be really useful... I'd do one but
I really don't know my ass from my elbow around this topic (and probably
many others too!).

Thanks for any answers/feedback/more info.

John Sidney-Woollett
