On Tue, Feb 25, 2020 at 1:49 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> I wrote:
> > [ unicode-escapes-with-other-server-encodings-2.patch ]
>
> I see this patch got sideswiped by the recent refactoring of JSON
> lexing. Here's an attempt at fixing it up. Since the frontend
> code isn't going to have access to encoding conversion facilities,
> this creates a difference between frontend and backend handling
> of JSON Unicode escapes, which is mildly annoying but probably
> isn't going to bother anyone in the real world. Outside of
> jsonapi.c, there are no changes from v2.
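About the frontend limitation: without the server's conversion tables, about all a frontend build can do with a non-ASCII code point is encode it as UTF-8, which is pure arithmetic. A throwaway standalone sketch of just that step, to make the point concrete (the names here are mine, not taken from the patch):

#include <stdio.h>

/*
 * Encode a Unicode code point (U+0001..U+10FFFF) as UTF-8.
 * Returns the number of bytes written to buf (1..4), or 0 if the
 * code point is out of range.  buf must have room for 4 bytes.
 * (Surrogate code points are assumed to have been rejected by the caller.)
 */
static int
codepoint_to_utf8(unsigned int c, unsigned char *buf)
{
	if (c == 0 || c > 0x10FFFF)
		return 0;
	if (c <= 0x7F)
	{
		buf[0] = (unsigned char) c;
		return 1;
	}
	if (c <= 0x7FF)
	{
		buf[0] = (unsigned char) (0xC0 | (c >> 6));
		buf[1] = (unsigned char) (0x80 | (c & 0x3F));
		return 2;
	}
	if (c <= 0xFFFF)
	{
		buf[0] = (unsigned char) (0xE0 | (c >> 12));
		buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
		buf[2] = (unsigned char) (0x80 | (c & 0x3F));
		return 3;
	}
	buf[0] = (unsigned char) (0xF0 | (c >> 18));
	buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
	buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
	buf[3] = (unsigned char) (0x80 | (c & 0x3F));
	return 4;
}

int
main(void)
{
	unsigned char buf[4];
	int			n = codepoint_to_utf8(0xC548, buf);	/* a Hangul syllable */

	for (int i = 0; i < n; i++)
		printf("%02X ", buf[i]);	/* prints: EC 95 88 */
	printf("\n");
	return 0;
}

Anything beyond that would need the conversion machinery, so the frontend/backend difference you describe seems hard to avoid.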
With v3, I successfully converted escapes using a database with EUC-KR
encoding, from strings, json, and jsonpath expressions.

Then I ran a raw parsing microbenchmark with ASCII unicode escapes in UTF-8
to verify there is no significant regression. I also tried the same with
EUC-KR, even though that's not really apples-to-apples since it doesn't
work on HEAD. It seems to give the same numbers. (median of 3, done 3
times with a postmaster restart in between)

master, UTF-8 ascii       1.390s  1.405s  1.406s
v3,     UTF-8 ascii       1.396s  1.388s  1.390s
v3,     EUC-KR non-ascii  1.382s  1.401s  1.394s

Not this patch's job perhaps, but now that check_unicode_value() only
depends on its input, maybe it could be moved into pg_wchar.h alongside
the other static inline helper functions? That test is duplicated in
addunicode() and pg_unicode_to_server(). Maybe:

static inline bool
codepoint_is_valid(pg_wchar c)
{
	return (c > 0 && c <= 0x10FFFF);
}

Maybe Chapman has a use case in mind he can test with? Barring that, the
patch seems ready for commit.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services