Re: Integer overflows in memchr

2024-07-11 Thread Eric Blake
On Wed, Jun 26, 2024 at 05:33:55PM GMT, Paul Eggert wrote:
> On 6/26/24 07:57, Bruno Haible wrote:
> > Po Lu wrote:
> > > I believe that the semantics of the POSIX specification of this GNU
> > > function omit the implied guarantee that strnlen will never examine
> > > bytes beyond the first null byte
> > 
> > There is no such guarantee, not even implied.
> 
> There seems to be some confusion here. Here's what POSIX.1-2024 says:
> 
> "The strnlen() function shall compute the smaller of the number of bytes in
> the array to which s points, not including any terminating NUL character, or
> the value of the maxlen argument. The strnlen() function shall never examine
> more than maxlen bytes of the array pointed to by s."

Revisiting this again after today's Austin Group meeting.  As
mentioned before, the POSIX 2024 wording is known to be buggy
(https://austingroupbugs.net/view.php?id=1834) and we want to
coordinate with the C committee since they are considering
standardizing strnlen().  But based on today's discussion, it was
pointed out that...

> 
> This means it's OK to call strnlen ("", SIZE_MAX), because the second arg of
> strnlen can be larger than the number of bytes in the byte array that the
> first arg points to, so long as that array contains a null byte. strnlen is
> unusual in this sense.
> 
> In contrast, it's not OK to call memchr ("", 0, SIZE_MAX).

...this claim appears to be unsupported.  Both C23 and POSIX 2024
state that memchr() "...shall behave as if it reads the characters
sequentially and stops as soon as a matching character is found",
meaning that this call should ALWAYS return the address of "" and not
NULL.  The Austin Group is hoping that the next draft of N3252 will
add a similar guarantee of linear access to strnlen(), so that
strnlen("", SIZE_MAX) is well-defined and stops early as you desire.
Under the current POSIX 2024 wording, access beyond the end of the
string but within the size limit is not forbidden, which risks a crash
if the size is large enough to cross memory boundaries and the
implementation does not behave as if it reads linearly.
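
To illustrate the as-if rule quoted above, here is a minimal sketch in C
of what strictly sequential access looks like.  The names memchr_seq and
strnlen_seq are hypothetical reference helpers invented for this
illustration; they are not taken from any libc or from gnulib, and a real
implementation may read wider units as long as it behaves as if it read
byte by byte.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX, for callers */

/* Hypothetical reference helper: search the first N bytes of S for C,
   reading one byte at a time and stopping at the first match.  No byte
   after the match is ever examined.  */
void *
memchr_seq (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  unsigned char ch = (unsigned char) c;

  for (; n > 0; n--, p++)
    if (*p == ch)
      return (void *) p;
  return NULL;
}

/* Hypothetical reference helper: length of S, capped at MAXLEN, reading
   one byte at a time and stopping at the first null byte.  */
size_t
strnlen_seq (const char *s, size_t maxlen)
{
  size_t len = 0;

  while (len < maxlen && s[len] != '\0')
    len++;
  return len;
}
```

With these helpers, memchr_seq ("", 0, SIZE_MAX) returns the address of
the empty string and strnlen_seq ("", SIZE_MAX) returns 0, because both
loops stop at the first byte and never depend on the size argument being
a valid bound.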


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: lib/fnmatch.c: mistakes `char32_t` rather than `wchar_t` on Windows

2024-07-11 Thread Bruno Haible
YX Hao wrote:
> > `c32*.c` files are newly included, but `uc_is_*` functions are not
> The 2nd error also has gone with a newer gnulib commit 92cdf62b,
> which is merged from the head of wget, using a hack of:
> ```
> sed -i "s/!_GL_SMALL_WCHAR_T/defined _GL_SMALL_WCHAR_T/" lib/fnmatch.c
> ```

This workaround of yours disables the support of Unicode characters outside
the BMP (used by Chinese, Emoji, and many other scripts) in fnmatch.

That support is only present through char32_t. That's actually the point of
using char32_t. [1]
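
As a rough illustration of why a 16-bit wchar_t is not enough, the sketch
below counts how many 16-bit units a code point needs in UTF-16.  The
function utf16_units is a hypothetical helper made up for this example,
and uint_least32_t stands in for char32_t so that the snippet does not
depend on <uchar.h>.

```c
#include <stdint.h>

/* Hypothetical helper: number of 16-bit units needed to represent the
   Unicode code point CP in UTF-16.  Code points above U+FFFF lie outside
   the BMP and need a surrogate pair, so they do not fit in a single
   16-bit wchar_t.  */
int
utf16_units (uint_least32_t cp)
{
  return cp > 0xFFFF ? 2 : 1;
}
```

For example, utf16_units (0x1F600) is 2, while a single 32-bit unit can
hold the same code point directly.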

Bruno

[1] https://www.gnu.org/software/gnulib/manual/html_node/Characters.html






Re: We can not run gnulib-tool in the MinGW.

2024-07-11 Thread Jeffrey Walton
On Sat, Jul 6, 2024 at 12:40 PM Bruno Haible wrote:
>
> Paul Eggert wrote:
> > > are we OK to drop the ability to run gnulib-tool on Solaris 10?
> >
> > I'm OK with it. When I deal with Solaris 10 (hey, our department's admin
> > server is still Solaris 10 sparc!) I run gnulib-tool elsewhere and copy
> > the results to Solaris 10. Does anybody do otherwise?
>
> Likewise for me.
>
> > Solaris 10 doesn't have Python 3
>
> Good point. So once we drop the shell-based implementation, Solaris 10
> support will be already gone.
>
> So, let's drop Solaris 10 as platform for running gnulib-tool. Thanks.
>
> It's probably not even worth documenting in section "Target Platforms".

One more datapoint, if needed... GCC 10 dropped support for Solaris
10. See .

Jeff



Re: Integer overflows in memchr

2024-07-11 Thread Paul Eggert

On 7/11/24 17:50, Eric Blake wrote:

> > In contrast, it's not OK to call memchr ("", 0, SIZE_MAX).
>
> ...this claim appears to be unsupported.  Both C23 and POSIX 2024
> state that memchr() "...shall behave as if it reads the characters
> sequentially and stops as soon as a matching character is found",


Thanks for correcting me. Evidently I was relying on old memory, as that 
phrase was added in C11.


In penance, I'll mention that there should be similar wording for 
strncmp. E.g., strncmp ("a", "b", SIZE_MAX) should be valid, although 
the C and POSIX standards don't make this crystal clear.
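
A minimal sketch in C of the sequential behavior that such wording would
describe for strncmp.  The name strncmp_seq is a hypothetical reference
helper for illustration only; it is not how any particular libc
implements strncmp.

```c
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX, for callers */

/* Hypothetical reference helper: compare at most N bytes of A and B,
   reading one byte of each at a time and stopping at the first
   difference or at a null byte, whichever comes first.  */
int
strncmp_seq (const char *a, const char *b, size_t n)
{
  for (; n > 0; n--, a++, b++)
    {
      unsigned char ca = (unsigned char) *a;
      unsigned char cb = (unsigned char) *b;

      if (ca != cb)
        return ca < cb ? -1 : 1;
      if (ca == '\0')
        return 0;
    }
  return 0;
}
```

Here strncmp_seq ("a", "b", SIZE_MAX) returns a negative value after
examining only the first byte of each string, so the oversized bound is
harmless.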