> On Jun 25, 2026, at 05:57, Jeff Davis <[email protected]> wrote:
>
> On Wed, 2026-06-24 at 16:44 +0800, Chao Li wrote:
>> There is a compile warning against pg_wchar.h in 0004:
>
> Fixed. I also used a loop in utf8decode() which is slightly smaller,
> which is good if we intend it to be inlined by a lot of callers.
>
> Regards,
> Jeff Davis
>
> <v5-0001-unicode_case.c-defend-against-invalid-UTF8.patch><v5-0002-pg_unicode_fast-fix-final-sigma-logic.patch><v5-0003-unicode_case.c-change-API-to-signal-UTF8-decoding.patch><v5-0004-Validating-iterator-friendly-UTF8-encoder-decoder.patch><v5-0005-unicode_case.c-use-new-utf8encode-utf8decode-APIs.patch>
I just reviewed v5-0001 and got one concern.
In initcap_wbnext(), the new check only verifies that the input has enough
bytes:
```
if (wbstate->offset + ulen > wbstate->len)
```
What about an invalid continuation byte, for example "\xCE "? In this case,
pg_utf_mblen() sees \xCE, so ulen will be 2. Since there is still one more
byte, the length check won't catch the invalid continuation byte \x20, and the
code will proceed to utf8_to_unicode().
Looking at utf8_to_unicode():
```
static inline char32_t
utf8_to_unicode(const unsigned char *c)
{
if ((*c & 0x80) == 0)
return (char32_t) c[0];
else if ((*c & 0xe0) == 0xc0)
return (char32_t) (((c[0] & 0x1f) << 6) |
(c[1] & 0x3f));
else if ((*c & 0xf0) == 0xe0)
return (char32_t) (((c[0] & 0x0f) << 12) |
((c[1] & 0x3f) << 6) |
(c[2] & 0x3f));
else if ((*c & 0xf8) == 0xf0)
return (char32_t) (((c[0] & 0x07) << 18) |
((c[1] & 0x3f) << 12) |
((c[2] & 0x3f) << 6) |
(c[3] & 0x3f));
else
return PG_INVALID_CODEPOINT;
}
```
For "\xCE ", it will take this branch:
```
else if ((*c & 0xe0) == 0xc0)
return (char32_t) (((c[0] & 0x1f) << 6) |
(c[1] & 0x3f));
```
This uses the second byte, \x20, without validating. So it looks like the patch
prevents reading past the end of the string, but it may not fully defend
against invalid UTF-8 sequences.
Am I missing anything?
(I will continue to review 0002 tomorrow.)
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/