Mark Dilger wrote:
>> What I suggest (and what Oracle implements, and isn't CHR() and ASCII()
>> partly for Oracle compatibility?) is that CHR() and ASCII()
>> convert between a character (in database encoding) and
>> that database encoding in numeric form.
>
> Looking at Oracle documentation, it appears that you get different
> behavior from CHR(X [USING NCHAR_CS]) depending on whether you call it
> with the argument USING NCHAR_CS. Oracle 9i and higher have an
> additional function called NCHR(X) which is supposed to be the same as
> CHR(X USING NCHAR_CS).
>
> On http://www.oraclehome.co.uk/chr-function.htm it says that "To use
> UTF8, you specify using nchar_cs in the argument list". Does this mean
> that CHR(X) behaves as Tom Lane wants, and NCHR(X) behaves as Albe
> Laurenz wants? Vice versa?

That web page is at best misleading, if not downright wrong.

It's just that an Oracle database has two character sets, a "database
character set" and a "national character set", the latter always being a
Unicode encoding (the name "national character set" is somewhat
misleading). This baroque concept dates from the days when nobody had a
Unicode database, but people still wanted to store characters not
supported by the "database character set"; in that case you could define
a column to be in the "national character set".

CHR(n) and CHR(n USING NCHAR_CS) = NCHR(n) are the same function, except
that the first uses the "database character set" and the latter two the
"national character set".
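To illustrate the difference (an untested sketch; I'm assuming a
"database character set" of AL32UTF8 and the default "national character
set" of AL16UTF16, where 'Ä' is encoded as 0xC3 0x84 in UTF-8 and as
0x00C4 in UTF-16):

  -- CHR(n) interprets n as the numeric value of the character's
  -- bytes in the database character set:
  SELECT CHR(50052) FROM dual;               -- 0xC384, UTF-8 for 'Ä'

  -- NCHR(n) and CHR(n USING NCHAR_CS) interpret n in the national
  -- character set instead:
  SELECT NCHR(196) FROM dual;                -- 0x00C4, UTF-16 for 'Ä'
  SELECT CHR(196 USING NCHAR_CS) FROM dual;  -- the same thing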
Nowadays this Oracle concept of a "national character set" is nearly
obsolete; one normally uses a Unicode "database character set".

Oracle has two things to say about CHR():

"For single-byte character sets, if n > 256, then Oracle Database
returns the binary equivalent of n mod 256.
For multibyte character sets, n must resolve to one entire code point.
Invalid code points are not validated, and the result of specifying
invalid code points is indeterminate."

It seems that Oracle means "encoding" when it says "code point" :^)

We should of course reject invalid arguments!
I don't know if I like this modulus thing for single-byte encodings or
not (it would mean, for example, that CHR(321) silently returns the same
character as CHR(65), namely 'A').

"Use of the CHR function (either with or without the optional USING
NCHAR_CS clause) results in code that is not portable between ASCII-
and EBCDIC-based machine architectures."

There's one thing that strikes me as weird in your implementation:

> pgsql=# select chr(0);
> ERROR: character 0x00 of encoding "SQL_ASCII" has no equivalent in "UTF8"

0x00 is a valid Unicode code point and also a valid UTF-8 character!

To me (maybe only to me) CHR() and ASCII() have always had the look and
feel of "type casts" between "char" and integer, with all the lack of
portability this might imply.

Yours,
Laurenz Albe
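PS: To make the "type cast" analogy concrete, here is the behaviour I
would expect in a UTF8 database (a sketch of the semantics I am arguing
for, not of what the patch currently does):

  SELECT ascii('Ä');  -- 50052, the numeric value of the UTF-8 bytes 0xC3 0x84
  SELECT chr(50052);  -- 'Ä' again, so that chr(ascii(c)) = c holds
  SELECT chr(0);      -- no error: 0x00 is a perfectly valid UTF-8 character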