On Tue, Feb 25, 2020 at 1:49 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> I wrote:
> > [ unicode-escapes-with-other-server-encodings-2.patch ]
>
> I see this patch got sideswiped by the recent refactoring of JSON
> lexing. Here's an attempt at fixing it up. Since the frontend
> code isn't going to have access to encoding conversion facilities,
> this creates a difference between frontend and backend handling
> of JSON Unicode escapes, which is mildly annoying but probably
> isn't going to bother anyone in the real world. Outside of
> jsonapi.c, there are no changes from v2.
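About the frontend limitation: without the server's conversion tables, about all a frontend build can do with a non-ASCII code point is encode it as UTF-8, which is pure arithmetic. A throwaway standalone sketch of just that step, to make the point concrete (the names here are mine, not taken from the patch):

#include <stdio.h>

/*
 * Encode a Unicode code point (U+0001..U+10FFFF) as UTF-8.
 * Returns the number of bytes written to buf (1..4), or 0 if the
 * code point is out of range.  buf must have room for 4 bytes.
 * (Surrogate code points are assumed to have been rejected by the caller.)
 */
static int
codepoint_to_utf8(unsigned int c, unsigned char *buf)
{
	if (c == 0 || c > 0x10FFFF)
		return 0;
	if (c <= 0x7F)
	{
		buf[0] = (unsigned char) c;
		return 1;
	}
	if (c <= 0x7FF)
	{
		buf[0] = (unsigned char) (0xC0 | (c >> 6));
		buf[1] = (unsigned char) (0x80 | (c & 0x3F));
		return 2;
	}
	if (c <= 0xFFFF)
	{
		buf[0] = (unsigned char) (0xE0 | (c >> 12));
		buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		buf[2] = (unsigned char) (0x80 | (c & 0x3F));
		return 3;
	}
	buf[0] = (unsigned char) (0xF0 | (c >> 18));
	buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
	buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
	buf[3] = (unsigned char) (0x80 | (c & 0x3F));
	return 4;
}

int
main(void)
{
	unsigned char buf[4];
	int			n = codepoint_to_utf8(0xC548, buf);	/* a Hangul syllable */

	for (int i = 0; i < n; i++)
		printf("%02X ", buf[i]);	/* prints: EC 95 88 */
	printf("\n");
	return 0;
}

Anything beyond that would need the conversion machinery, so the frontend/backend difference you describe seems hard to avoid.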
With v3, I successfully converted escapes using a database with EUC-KR
encoding, from strings, json, and jsonpath expressions.

Then I ran a raw parsing microbenchmark with ASCII unicode escapes in UTF-8
to verify there is no significant regression. I also tried the same with
EUC-KR, even though that's not really apples-to-apples since it doesn't
work on HEAD. It seems to give the same numbers. (median of 3, done 3
times with a postmaster restart in between)

master, UTF-8 ascii       1.390s  1.405s  1.406s
v3,     UTF-8 ascii       1.396s  1.388s  1.390s
v3,     EUC-KR non-ascii  1.382s  1.401s  1.394s

Not this patch's job perhaps, but now that check_unicode_value() only
depends on its input, maybe it could be moved into pg_wchar.h alongside
the other static inline helper functions? That test is duplicated in
addunicode() and pg_unicode_to_server(). Maybe:

static inline bool
codepoint_is_valid(pg_wchar c)
{
	return (c > 0 && c <= 0x10FFFF);
}

Maybe Chapman has a use case in mind he can test with? Barring that, the
patch seems ready for commit.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services