Re: bug in strcasecmp and strncasecmp

Corinna Vinschen via Cygwin Mon, 17 Feb 2025 02:02:17 -0800

Hi Bruno,

On Feb 16 17:18, Bruno Haible via Cygwin wrote:
> Per POSIX [1], the functions strcasecmp and strncasecmp should
> "use the current locale to determine the case of the characters.".
> 
> [1] https://pubs.opengroup.org/onlinepubs/9799919799/functions/strcasecmp.html
> 
> This is not what Cygwin does: In the fr_FR.ISO8859-1 locale, the
> characters 0xE9 and 0xC9 are the same modulo case, but strcasecmp
> and strncasecmp consider these characters to be different.


Thanks for your report.

This is a longstanding problem in newlib.  All four strcasecmp functions
call tolower on a char without casting it to unsigned char.  So on
targets where char is signed, tolower is called with negative values
for characters outside the ASCII range, which is undefined behaviour.

Adding a cast fixes that and I just pushed a matching patch.
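A minimal sketch of what the cast changes, assuming a plain bytewise
implementation along the lines described above.  my_strcasecmp and
my_strncasecmp are hypothetical names; this is not the actual newlib
code or the pushed patch:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical bytewise strcasecmp.  Reading the strings through
   unsigned char pointers means every byte reaches tolower as a value
   in 0..255.  With a plain (signed) char, bytes >= 0x80 would be
   passed as negative ints, which tolower does not accept (only EOF
   and unsigned char values are valid arguments). */
int my_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;
    int c1, c2;

    do {
        c1 = tolower(*p1++);
        c2 = tolower(*p2++);
    } while (c1 == c2 && c1 != '\0');   /* stop on mismatch or end */

    return c1 - c2;
}

/* Same idea, limited to at most n bytes. */
int my_strncasecmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;

    for (; n > 0; n--, p1++, p2++) {
        int c1 = tolower(*p1);
        int c2 = tolower(*p2);
        if (c1 != c2 || c1 == '\0')
            return c1 - c2;
    }
    return 0;
}
```

In the default "C" locale only A-Z have lowercase mappings, so
my_strcasecmp("\xE9", "\xC9") is nonzero there.  After a successful
setlocale(LC_ALL, "fr_FR.ISO8859-1"), tolower(0xC9) should yield 0xE9
and the same comparison should return 0.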

I'm just not sure if that's sufficient in light of POSIX.1-2024.
The quoted wording seems to indicate that strcasecmp and friends are
now expected to work on multibyte codesets like UTF-8.

I checked the glibc sources and they still do the bytewise tolower twist
as well, though...
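For comparison, a multibyte-aware variant might look roughly like the
following.  This is a hypothetical sketch, not what newlib or glibc do:
my_mbcasecmp is a made-up name, and falling back to a plain strcmp of
the remaining bytes on an invalid or incomplete sequence is an assumed
policy.  It decodes both strings with mbrtowc in the current locale and
folds each character with towlower:

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical multibyte-aware case-insensitive comparison.
   Uses simple one-to-one case mapping via towlower, so one-to-many
   mappings (e.g. German sharp s vs "SS") are not handled. */
int my_mbcasecmp(const char *s1, const char *s2)
{
    mbstate_t st1, st2;
    size_t left1 = strlen(s1) + 1;   /* bytes left, incl. the NUL */
    size_t left2 = strlen(s2) + 1;

    memset(&st1, 0, sizeof st1);
    memset(&st2, 0, sizeof st2);

    for (;;) {
        wchar_t w1, w2;
        size_t n1 = mbrtowc(&w1, s1, left1, &st1);
        size_t n2 = mbrtowc(&w2, s2, left2, &st2);

        /* (size_t)-1 = invalid, (size_t)-2 = incomplete sequence:
           give up on case folding and compare the remaining bytes. */
        if (n1 >= (size_t) -2 || n2 >= (size_t) -2)
            return strcmp(s1, s2);

        wint_t c1 = towlower((wint_t) w1);
        wint_t c2 = towlower((wint_t) w2);
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (w1 == L'\0')             /* both strings ended together */
            return 0;

        s1 += n1; left1 -= n1;       /* advance past one character */
        s2 += n2; left2 -= n2;
    }
}
```

On platforms with a 16-bit wchar_t, such as Cygwin, characters outside
the BMP do not fit in a single wchar_t and would need extra care that
this sketch does not attempt.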


Thanks,
Corinna

