Mark Dilger wrote:
>>> In particular, in UTF8 land I'd have expected the argument of chr()
>>> to be interpreted as a Unicode code point, not as actual UTF8 bytes
>>> with a randomly-chosen endianness.
>>>
>>> Not sure what to do in other multibyte encodings.
>>
>> "Not sure what to do in other multibyte encodings" was pretty much my
>> rationale for this particular behavior. I standardized on network byte
>> order because there are only two endiannesses to choose from, and the
>> other seems to be a more surprising choice.
>
> Since chr() is defined in oracle_compat.c, I decided to look
> at what Oracle might do. See
> http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96540/functions18a.htm
>
> It looks to me like they are doing the same thing that I did,
> though I don't have Oracle installed anywhere to verify that.
> Is there a difference?

This is Oracle 10.2.0.3.0 ("latest and greatest") with UTF-8 encoding
(actually, Oracle chooses to call this encoding AL32UTF8):

SQL> SELECT ASCII('€') AS DEC,
  2         TO_CHAR(ASCII('€'), 'XXXXXX') AS HEX
  3  FROM DUAL;

       DEC HEX
---------- ------
  14844588 E282AC

SQL> SELECT CHR(14844588) AS EURO FROM DUAL;

EURO
----
€

I don't see how endianness enters into this at all - isn't that just
the question of how a byte is stored physically?

According to RFC 2279, the Euro, Unicode code point
0x20AC = 0010 0000 1010 1100, will be encoded to
1110 0010 1000 0010 1010 1100 = 0xE282AC.

IMHO this is the only good and intuitive way for CHR() and ASCII().

Yours,
Laurenz Albe
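
To make the byte-oriented mapping concrete, here is a minimal C sketch
(not taken from oracle_compat.c, and only handling code points up to
U+FFFF) that encodes a code point as UTF-8 per RFC 2279 and packs the
bytes, most significant first, into the integer that the Oracle session
above reports for the Euro sign:

#include <stdio.h>

/* Encode 'cp' (<= U+FFFF) into 'buf'; returns number of bytes written. */
static int
utf8_encode(unsigned int cp, unsigned char *buf)
{
	if (cp < 0x80)
	{
		buf[0] = (unsigned char) cp;
		return 1;
	}
	else if (cp < 0x800)
	{
		buf[0] = 0xC0 | (cp >> 6);
		buf[1] = 0x80 | (cp & 0x3F);
		return 2;
	}
	else
	{
		buf[0] = 0xE0 | (cp >> 12);
		buf[1] = 0x80 | ((cp >> 6) & 0x3F);
		buf[2] = 0x80 | (cp & 0x3F);
		return 3;
	}
}

int
main(void)
{
	unsigned char buf[4];
	unsigned int packed = 0;
	int		len = utf8_encode(0x20AC, buf);		/* U+20AC, the Euro sign */
	int		i;

	/* Fold the UTF-8 bytes into one integer, most significant byte first. */
	for (i = 0; i < len; i++)
		packed = (packed << 8) | buf[i];

	/* Prints: 0xE282AC = 14844588 */
	printf("0x%X = %u\n", packed, packed);
	return 0;
}

Compiled and run, this prints 0xE282AC = 14844588, matching ASCII('€')
in the Oracle output; the most-significant-byte-first packing is what
the earlier message calls network byte order.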