On Jul 19, 2011, at 00:17, Joey Adams wrote:
> I suppose a simple solution would be to convert all escapes and
> outright ban escapes of characters not in the database encoding.
+1. Making JSON work like TEXT when it comes to encoding issues makes this
all much simpler conceptually. It also avoids all kinds of weird issues if
you extract textual values from a JSON document server-side.

If we really need more flexibility than that, we should look at ways to
allow different columns to have different encodings. Doing that just for
JSON seems wrong - especially because it doesn't really reduce the
complexity of the problem, as your example shows. The essential problem
here is, AFAICS, that there's really no sane way to compare strings in two
different encodings, unless both encode only a subset of Unicode.

> This would have the nice property that all strings can be unescaped
> server-side. Problem is, what if a browser or other program produces,
> say, \u00A0 (NO-BREAK SPACE), and tries to insert it into a database
> where the encoding lacks this character?

They'll get an error - just as if they had tried to store that same
character in a TEXT column.

> On the other hand, converting all JSON to UTF-8 would be simpler to
> implement. It would probably be more intuitive, too, given that the
> JSON RFC says, "JSON text SHALL be encoded in Unicode."

Yet, the only reason I'm aware of for some people not to use UTF-8 as the
server encoding is that it's pretty inefficient storage-wise for some
scripts (AFAIR some Japanese scripts are an example, but I don't remember
the details). By making JSON store UTF-8 on-disk always, the JSON type
becomes less appealing to those people.

best regards,
Florian Pflug
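
PS: To make the TEXT comparison concrete, a minimal sketch of the behaviour
in question, assuming a LATIN1 server encoding and a UTF8 client. The docs
table, its json column, and the second error are hypothetical (they show
what the proposed rules would do, not existing behaviour), and the exact
error wording varies between server versions:

  -- Server encoding LATIN1, client_encoding UTF8.
  CREATE TABLE t (v text);

  -- U+20AC (EURO SIGN) has no LATIN1 equivalent, so storing it in a
  -- TEXT column already fails today (wording approximate):
  INSERT INTO t VALUES ('€');
  -- ERROR: character with byte sequence 0xe2 0x82 0xac in encoding
  --        "UTF8" has no equivalent in encoding "LATIN1"

  -- Under the proposed rules, the same character hidden behind a JSON
  -- escape would be rejected as well (hypothetical; neither the docs
  -- table, the json column, nor this error message exists today):
  -- INSERT INTO docs VALUES ('{"price": "\u20ac"}');
  -- ERROR: Unicode escape U+20AC has no equivalent in encoding "LATIN1"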