Excerpts from Joey Adams's message of Tue Jul 19 21:03:15 -0400 2011:
> On Mon, Jul 18, 2011 at 7:36 PM, Florian Pflug <f...@phlo.org> wrote:
> > On Jul19, 2011, at 00:17 , Joey Adams wrote:
> >> I suppose a simple solution would be to convert all escapes and
> >> outright ban escapes of characters not in the database encoding.
> >
> > +1. Making JSON work like TEXT when it comes to encoding issues
> > makes this all much simpler conceptually. It also avoids all kinds
> > of weird issues if you extract textual values from a JSON document
> > server-side.
>
> Thanks for the input. I'm leaning in this direction too. However, it
> will be a tad tricky to implement the conversions efficiently, since
> the wchar API doesn't provide a fast path for individual codepoint
> conversion (that I'm aware of), and pg_do_encoding_conversion doesn't
> look like a good thing to call lots of times.
>
> My plan is to scan for escapes of non-ASCII characters, convert them
> to UTF-8, and put them in a comma-delimited string like this:
>
>     a,b,c,d,
>
> then, convert the resulting string to the server encoding (which may
> fail, indicating that some codepoint(s) are not present in the
> database encoding). After that, read the string and plop the
> characters where they go.
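If I'm reading the plan right, it would look roughly like the sketch
below.  Completely untested, the function name is made up, and I'm
assuming the \uXXXX escapes have already been decoded into pg_wchar
code points by the JSON lexer:

    #include "postgres.h"
    #include "lib/stringinfo.h"
    #include "mb/pg_wchar.h"

    /*
     * Sketch of the batching idea: encode every escaped code point as
     * UTF-8 into one comma-separated buffer, convert the whole buffer to
     * the database encoding in a single call, and let the caller split
     * the result on ',' to recover the i-th character.
     */
    static char *
    convert_escaped_codepoints(const pg_wchar *cps, int ncps)
    {
        StringInfoData buf;
        int         i;

        initStringInfo(&buf);

        for (i = 0; i < ncps; i++)
        {
            unsigned char utf8buf[6];

            unicode_to_utf8(cps[i], utf8buf);
            appendBinaryStringInfo(&buf, (char *) utf8buf,
                                   pg_utf_mblen(utf8buf));
            appendStringInfoChar(&buf, ',');
        }

        /*
         * One conversion for the whole batch; this errors out if any
         * code point has no equivalent in the database encoding.  The
         * ',' separator is ASCII, so it survives the conversion.
         */
        return (char *) pg_do_encoding_conversion((unsigned char *) buf.data,
                                                  buf.len,
                                                  PG_UTF8,
                                                  GetDatabaseEncoding());
    }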
Ugh.
> It's "clever", but I can't think of a better way to do it with the existing
> API.

Would it work to have a separate entry point into mbutils.c that lets
you cache the conversion proc caller-side?  I think the main problem
is determining the byte length of each source character beforehand.

-- 
Álvaro Herrera <alvhe...@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
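PS: to be a bit more concrete, what I have in mind is something along
the lines of the sketch below.  It's completely untested and nothing
like it exists in mbutils.c today; all the names (CachedConversion,
cached_conversion_init, cached_convert_char) are invented.

    #include "postgres.h"
    #include "catalog/namespace.h"
    #include "fmgr.h"
    #include "mb/pg_wchar.h"

    /* Caller-side cache for one encoding conversion function. */
    typedef struct CachedConversion
    {
        int         src_encoding;
        int         dest_encoding;
        FmgrInfo    flinfo;     /* looked up once, reused per character */
    } CachedConversion;

    static void
    cached_conversion_init(CachedConversion *cc,
                           int src_encoding, int dest_encoding)
    {
        Oid         proc;

        cc->src_encoding = src_encoding;
        cc->dest_encoding = dest_encoding;

        proc = FindDefaultConversionProc(src_encoding, dest_encoding);
        if (!OidIsValid(proc))
            elog(ERROR, "no default conversion from %s to %s",
                 pg_encoding_to_char(src_encoding),
                 pg_encoding_to_char(dest_encoding));

        fmgr_info(proc, &cc->flinfo);
    }

    /*
     * Convert one character of src_len bytes.  dest must have room for
     * src_len * MAX_CONVERSION_GROWTH + 1 bytes; the conversion proc
     * null-terminates its output.
     */
    static void
    cached_convert_char(CachedConversion *cc,
                        const unsigned char *src, int src_len,
                        unsigned char *dest)
    {
        FunctionCall5(&cc->flinfo,
                      Int32GetDatum(cc->src_encoding),
                      Int32GetDatum(cc->dest_encoding),
                      CStringGetDatum(src),
                      CStringGetDatum(dest),
                      Int32GetDatum(src_len));
    }

The lookup and fmgr_info() would happen once per parse, so each escaped
character costs only a FunctionCall5().  The caller would still need the
byte length of each source character, presumably from pg_encoding_mblen()
(or pg_utf_mblen() when the source is UTF-8 built from a \u escape).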