On 04/07/2017 05:30 AM, Michael Paquier wrote:
On Fri, Apr 7, 2017 at 2:47 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
On 04/06/2017 08:42 PM, Heikki Linnakangas wrote:
There is for example this portion in the new tables:
+static const Codepoint prohibited_output_chars[] =
+{
+   0xD800, 0xF8FF,             /* C.3, C.5 */

   ----- Start Table C.5 -----
   D800-DFFF; [SURROGATE CODES]
   ----- End Table C.5 -----
This indicates a range of values. Wouldn't it be better to split this
table in two, one for the range of codepoints and another one with the
single entries?

I considered that, but there are relatively few singular codepoints in
the tables, so it wouldn't save much space. In this patch, singular
codepoints are represented by a range like "0x3000, 0x3000".
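
For illustration, here is a minimal sketch of how such a flat table of (first, last) pairs could be searched. It is not necessarily the lookup code in the patch: the Codepoint typedef, the example_ranges excerpt and the in_ranges() helper are all assumptions made up for this example. It only shows that a single codepoint stored as "0x3000, 0x3000" needs no special casing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed typedef; the patch may define Codepoint differently. */
typedef uint32_t Codepoint;

/*
 * Illustrative excerpt only, not the full table from the patch.
 * Entries are (first, last) pairs, sorted and non-overlapping.
 */
static const Codepoint example_ranges[] =
{
	0x3000, 0x3000,				/* single codepoint as a one-element range */
	0xD800, 0xF8FF				/* merged range, as in the quoted hunk */
};

/*
 * Hypothetical helper: binary search over the pairs. 'nvalues' is the
 * number of array elements, i.e. twice the number of ranges.
 */
static bool
in_ranges(Codepoint cp, const Codepoint *ranges, size_t nvalues)
{
	size_t		lo = 0;
	size_t		hi = nvalues / 2;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < ranges[2 * mid])
			hi = mid;			/* cp lies below this range */
		else if (cp > ranges[2 * mid + 1])
			lo = mid + 1;		/* cp lies above this range */
		else
			return true;		/* first <= cp <= last */
	}
	return false;
}
```

With this layout, in_ranges(0x3000, example_ranges, 4) and in_ranges(0xD800, example_ranges, 4) go through exactly the same code path, so a second table for single entries would only save the duplicated value per entry.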

I am really wondering if this should not reflect the real ranges
reported by the RFC. I understand that you have grouped things to save
a couple of bytes, but keeping the RFC's ranges would protect against any
updates of the codepoints within those ranges (unlikely to happen, I agree).

It just means that there will be some more work required to apply the changes to the current lists. I constructed the lists manually to begin with, copy-pasting the lists from the RFC, and moving and merging entries by hand. I wouldn't mind doing that by hand again, if the lists change. But as you said, it seems unlikely that they would change any time soon.

You may want to add a .gitignore in src/common/unicode for norm_test
and norm_test_table.h.

Added, and pushed, with some more comment fixes.

Many thanks, Michael!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
