Re: [GENERAL] UTF-8 question.

Dan Sugalski Thu, 16 Sep 2004 18:15:49 -0700

At 8:39 PM -0400 9/16/04, Richard Connamacher wrote:

I'm new to PostgreSQL, and from the looks of it, it's a great database,
and I'll be using more of it in the future.

I had a quick question if anyone could clear this up. The documentation
for PostgreSQL (version 7.1, the version this server is using) says that
it supports multibyte character encodings like Unicode (which implies
UTF-16 encoding).

Don't confuse Unicode, the 'character set' and rules for characters, in which text is represented by a sequence of abstract integers (code points, which fit in 32 bits), with UTF-8, UTF-16, and UTF-32, which are ways to encode those abstract integers into a stream of bytes someplace.
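
A minimal Python sketch of that distinction (not part of the original exchange; the sample character is an arbitrary choice for illustration). The code point is a single abstract integer, and each UTF form turns it into a different run of bytes:

```python
# One abstract code point, three byte-level encodings of it.
ch = "\u20ac"  # EURO SIGN, chosen only as an illustration

code_point = ord(ch)  # the abstract integer, independent of any encoding

encoded = {
    "utf-8": ch.encode("utf-8"),
    "utf-16-be": ch.encode("utf-16-be"),  # explicit byte order, so no BOM is added
    "utf-32-be": ch.encode("utf-32-be"),
}

for name, data in encoded.items():
    print(f"U+{code_point:04X} as {name}: {data.hex()} ({len(data)} bytes)")
    # Decoding with the matching codec recovers the same code point.
    assert ord(data.decode(name)) == code_point
```

The bytes differ per encoding, but decoding any of them gives back the same integer.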

Later on, the same page says that Unicode is
represented using UTF-8 encoding. UTF-8 is the 8-bit version of Unicode.
The multibyte version of Unicode is UTF-16.

So, which is it? If I create a database using Unicode as the encoding,
will the encoding be UTF-8 (singlebyte) or UTF-16 (multibyte)?

Erm... UTF-8 *is* a multibyte encoding. Up to 6 bytes per code point in the original definition, if things get really degenerate (RFC 3629 caps it at 4). (And, last I checked, that means you can have up to 70 bytes for really degenerate characters, but my memory might be off; it could be 80.)
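
To make the variable width concrete, here is a rough Python sketch of a hand-rolled UTF-8 encoder. It is an illustration only, assuming the current limit of four bytes per code point rather than the older six-byte forms; `utf8_encode` is a hypothetical helper name, and the sample code points are arbitrary.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (1 to 4 bytes, assuming RFC 3629 limits)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:  # 7 bits: a single byte, identical to ASCII
        return bytes([cp])
    if cp < 0x800:  # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


samples = (0x41, 0xE9, 0x20AC, 0x1F600)  # arbitrary examples, one per width
for cp in samples:
    data = utf8_encode(cp)
    # Cross-check against Python's built-in codec.
    assert data == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {len(data)} byte(s): {data.hex()}")
```

Only code points below U+0080 take a single byte; everything else is multibyte.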

UTF-8, UTF-16, and UTF-32 will all encode Unicode characters just fine.
--
                                Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk

