Em seg., 3 de nov. de 2025 às 07:45, Bryan Green <[email protected]> escreveu:
> On 11/2/2025 6:34 PM, Ranier Vilela wrote: > > > > > > Em dom., 2 de nov. de 2025 às 13:22, Bryan Green <[email protected] > > <mailto:[email protected]>> escreveu: > > > > On 11/2/2025 10:09 AM, Bryan Green wrote: > > > On 11/2/2025 8:59 AM, Ranier Vilela wrote: > > >> Hi. > > >> > > >> Per Coverity. > > >> > > >> Coverity raised the follow report: > > >> > > >> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW) > > >> 37. overflow_const: Expression pattern_len - 1ULL, where > > pattern_len is > > >> known to be equal to 0, underflows the type of pattern_len - > > 1ULL, which > > >> is type unsigned long long. > > >> > > >> This is because the function *pg_mbstrlen* can return zero. > > >> And the comment is clear, *just in case there are MB chars*. > > >> > > >> patch attached. > > >> > > >> best regards, > > >> Ranier Vilela > > > > > > Thanks for finding and patching this. I think you're absolutely > > > right that there's a latent bug here, and your fix is appropriate. > > > However, I wanted to share some thoughts on why this has probably > > > never been reported in the wild, and why we should apply the patch > > > anyway. > > > > > > The crash requires a specific and rather unlikely combination of > > > circumstances: > > > > > > 1. You need a locale where lconv->thousands_sep is present but > > > empty. Most locales either provide a real separator (",", ".", > > > or " ") or return NULL (in which case NUM_prepare_locale's > > > fallback logic provides a default). The comments mention > > > "broken glibc pt_BR" as an example, but even that's been fixed > > > in most glibc versions for years now. > > > > > > 2. You need a format string containing 'G', which requests > > > insertion of the locale's thousands separator. > > > > > > Here's the thing: if your locale has no thousands separator, why > > > would you write a format string asking for one? Consider: > > > > > > -- In a locale with thousands_sep = "," > > > SELECT to_char(1234567.89, '9G999G999.99'); > > > -- Result: "1,234,567.89" > > > > > > -- In a locale with empty thousands_sep > > > SELECT to_char(1234567.89, '9G999G999.99'); > > > -- Result: "1234567.89" (the G's insert nothing) > > > > > > If you're working in a locale that doesn't have a thousands > > > separator, you're most likely culturally aware of that fact, > > > and you simply wouldn't include 'G' in your format strings. > > > It would be like writing code to format something your locale > > > doesn't have a concept of. > > > > > > So I think the bug has survived this long because: > > > a) Very few locales have truly empty thousands_sep strings > > > b) Users of those locales naturally avoid the 'G' format code > > > c) Even accidental use would require the right fill-mode > > > settings > > > > > > That said, in my opinion we should definitely apply your patch, > > > for several reasons: > > > > > > First, it's undefined behavior. Backing up the pointer via > > > "ptr += -1" when ptr is at the start of allocated memory is a > > > serious bug, regardless of how unlikely the trigger condition is. > > > We don't want that lurking in the codebase. > > > > > > Second, even if this hasn't been triggered in 20+ years, it's > > > exactly the kind of thing that AFL or similar tools might find. > > > Better to fix it before we get a CVE filing. > > > > > > Third, it's a trivial fix. The performance impact is nil, and it > > > makes the code more obviously correct. There's really no > > > downside. > > > > > > I'd suggest one minor adjustment to your patch: add a comment > > > explaining why we're checking for empty pattern, since it's > > > non-obvious: > > > > > > /* > > > * Guard against empty separator (could happen with some > > > * locales) > > > */ > > > if (pattern_len > 0) > > > { > > > ... > > > } > > > > > > That'll help future maintainers understand this isn't just > > > defensive programming paranoia. > > > > > > Good catch on finding this. We'll have to see if others agree... > > > > > > Bryan Green > > I did a bit more analysis to double check myself and realized I > missed > > an important check: > > > > NUM_prepare_locale() has a safety check: > > > > if (lconv->thousands_sep && *lconv->thousands_sep) > > Np->L_thousands_sep = lconv->thousands_sep; > > else if (strcmp(Np->decimal, ",") != 0) > > Np->L_thousands_sep = ","; > > else > > Np->L_thousands_sep = "."; > > > > The "*lconv->thousands_sep" test specifically prevents empty > > strings from being used, so Np->L_thousands_sep can never be empty in > > production. > > > > The question is that the *pattern* is not empty. > > But that can not contain MB characters. > > In this case, not contain any MB chars, pg_mbstrlen > > will return zero and overflow can happen. > > > > best regards, > > Ranier Vilela > > You're right, and I was wrong. I got focused on whether the string could > be empty at the byte level and completely missed your actual point. > > pg_mbstrlen() returns character count, not byte count. So even though > NUM_prepare_locale() ensures L_thousands_sep isn't an empty string, if > those bytes aren't valid in the database encoding, pg_mbstrlen() will > return 0. Then we do: > > Np->inout_p += pattern_len - 1; > > which is undefined behavior when pattern_len is 0. > > I tried to reproduce this with various locale configurations and couldn't > trigger it. Tested Portuguese_Brazil, German_Germany, different encoding > combinations - everything worked fine. Modern locales use ASCII-safe > separators like "," or "." that work everywhere. You'd need a genuinely > broken locale to get bytes that are invalid UTF-8. > Yeah. Or a broken locale maliciously done. > > Still, undefined behavior is undefined behavior. The fix is simple, > makes the code more robust, quietens Coverity. I think we > should apply it on principle even if nobody's likely to hit it in > practice. If someone's got locale data that broken, they've got bigger > problems anyway. > > I'd call this a code hardening fix rather than an urgent bug. Not > something that needs immediate backpatching, but reasonable to include > in normal maintenance releases? > I agree that backpatching is not necessary. Thank you for your attention and details. best regards, Ranier Vilela
