On 10/02/2019 12:29, Legale Legage wrote:
> This conception can be used for the utf-16 encoding, but table size
> would be 65536 bytes against 256 byte for the utf-8 table.

Rather than two 64 KiB lookup tables (one per byte order) whose entries are almost all identical, would it be reasonable to use a bit mask to check for the range we care about, i.e. the high surrogates 0xD800 to 0xDBFF?

I may have this slightly wrong, but something like:

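/* Both masks assume the code unit was loaded on a little-endian host,
   so a big-endian unit arrives byte-swapped. The inner parentheses are
   needed because == binds tighter than & in C. */
#define UTF16_LE_CODE_UNIT_IS_HIGH_SURROGATE(code_unit) (((code_unit) & 0xFC00) == 0xD800)
#define UTF16_BE_CODE_UNIT_IS_HIGH_SURROGATE(code_unit) (((code_unit) & 0x00FC) == 0x00D8)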

m = UTF16_LE_CODE_UNIT_IS_HIGH_SURROGATE(*(uint16_t *)p) ? 4 : 2;
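
For what it's worth, here is a minimal sketch of the same check written so that it does not depend on the host byte order or on p being 2-byte aligned. It assembles the code unit from individual bytes and then applies a single mask. The function names are hypothetical, not anything that exists in php-src, and it assumes well-formed input (every high surrogate is followed by a low surrogate).

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble one UTF-16 code unit from two bytes, whatever the host is. */
static uint16_t utf16le_read(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t utf16be_read(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* High surrogates are 0xD800..0xDBFF: the top six bits are 110110. */
static int utf16_is_high_surrogate(uint16_t code_unit)
{
    return (code_unit & 0xFC00) == 0xD800;
}

/* Byte length of the character starting at p (2, or 4 for a pair). */
static size_t utf16le_char_len(const unsigned char *p)
{
    return utf16_is_high_surrogate(utf16le_read(p)) ? 4 : 2;
}

static size_t utf16be_char_len(const unsigned char *p)
{
    return utf16_is_high_surrogate(utf16be_read(p)) ? 4 : 2;
}
```

With this shape, only the two read helpers differ between the byte orders, and no table is needed at all.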

Regards,

--
Rowan Collins
[IMSoP]


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
