Steven Schlansker <ste...@trumpet.io> writes:
> On Aug 19, 2010, at 2:35 PM, Tom Lane wrote:
>> I was able to reproduce this on my own Mac.  Some tracing shows that the
>> problem is that isspace(0x85) returns true when in locale en_US.utf-8.
>> This causes array_in to drop the final byte of the array element string,
>> thinking that it's insignificant whitespace.
> The 0x85 seems to be the second byte of a multibyte UTF-8
> sequence.

Check.

> I'm not at all experienced with character encodings so I could
> be totally off base, but isn't it wrong to ever call isspace(0x85),
> whatever the result may be, given that the actual character is 0xCF85?
> (U+03C5, GREEK SMALL LETTER UPSILON)

We generally assume that in server-safe encodings, the ctype.h functions
will behave sanely on any single-byte value.  You can argue the wisdom
of that, but deciding to change that policy would be a rather massive
code change; I'm not excited about going that direction.

>> I believe that you must
>> not have produced the data file data.copy on a Mac, or at least not in
>> that locale setting, because array_out should have double-quoted the
>> array element given that behavior of isspace().

> Correct, it was produced on a Linux machine.  That said, the charset
> there was also UTF-8.

Right ... but you had an isspace function that meets our expectations.

> I actually can't reproduce that behavior here:

You need a setlocale() call, else the program acts as though it's in C
locale regardless of environment.  My test case looks like this:

$ cat isspace.c
#include <stdio.h>
#include <ctype.h>
#include <locale.h>

int main()
{
  int c;

  setlocale(LC_ALL, "");

  for (c = 1; c < 256; c++)
  {
    if (isspace(c))
      printf("%3o is space\n", c);
  }

  return 0;
}
$ gcc -O -Wall isspace.c
$ LANG=C ./a.out
 11 is space
 12 is space
 13 is space
 14 is space
 15 is space
 40 is space
$ LANG=en_US.utf-8 ./a.out
 11 is space
 12 is space
 13 is space
 14 is space
 15 is space
 40 is space
205 is space
240 is space
$ 

			regards, tom lane
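
To illustrate the failure mode described above, here is a rough sketch of
how a byte-wise trailing-whitespace trim can eat the last byte of a UTF-8
character.  This is not the actual array_in code; trim_trailing,
mac_like_isspace and c_locale_isspace are hypothetical names made up for
this example.  The two predicates hard-code the byte values reported by
the test program above (octal 205 and 240 being the extra ones seen in
en_US.utf-8 on the Mac), so the result does not depend on the locale of
the machine it runs on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for isspace() as observed on the Mac in
 * en_US.utf-8: the usual six characters plus octal 205 and 240. */
int mac_like_isspace(int c)
{
  return (c >= 011 && c <= 015) || c == 040 || c == 0205 || c == 0240;
}

/* Hypothetical stand-in for isspace() in the C locale. */
int c_locale_isspace(int c)
{
  return (c >= 011 && c <= 015) || c == 040;
}

/* Trim trailing "whitespace" one byte at a time and return the new
 * length.  Assumption: this only mimics the general shape of a
 * byte-wise trim; it is not copied from PostgreSQL. */
size_t trim_trailing(const unsigned char *s, size_t len, int (*is_space)(int))
{
  assert(s != NULL && is_space != NULL);

  while (len > 0 && is_space(s[len - 1]))
    len--;

  return len;
}
```

Given the two bytes 0xCF 0x85 (U+03C5), trim_trailing returns 1 with
mac_like_isspace, leaving a truncated and therefore invalid UTF-8
sequence, and 2 with c_locale_isspace, leaving the character intact.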