Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Bruno Haible via Gnulib discussion list
Paul Eggert wrote:
> > if (!mbi_avail (iter))
> > -abort ();
> > +/* We can get here due to incomplete multibyte characters. */
> > +return false;
> > mbi_advance (iter);
>
> If the string ends in an incomplete s

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Paul Eggert
On 2025-01-04 09:37, Bruno Haible wrote: No, that unfortunate property ... is a problem with the BIG5-HKSCS encoding, not with the GB18030 encoding. (Recall that GB18030 is more-or-less a Unicode transformation format like UTF-8, UTF-16, UTF-32.) See https://www.gnu.org/software/gnulib/manual/ht

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Paul Eggert
On 2025-01-04 09:25, Bruno Haible wrote: if (!mbi_avail (iter)) -abort (); +/* We can get here due to incomplete multibyte characters. */ +return false; mbi_advance (iter); If the string ends in an incomplete seque

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Bruno Haible via Gnulib discussion list
Paul Eggert wrote:
> Come to think of it, isn't there a different problem with
> mbs_startswith? As I recall, mbiter supports GB18030, which has the
> unfortunate property that an indivisible sequence of encoding bytes
> stands for two Unicode characters
No, that unfortunate property (that part

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Bruno Haible via Gnulib discussion list
Paul Eggert wrote:
> >> * What happens when strings contain encoding errors? It's not clear from
> >> the spec. I hope behavior isn't simply undefined.
> >
> > When the str_* functions are used, the byte-wise encoding will matter.
>
> I thought that str_* functions didn't care about locale, which

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Paul Eggert
On 2025-01-03 23:39, Paul Eggert wrote: Don't we have problems with mbs_startswith, though? If the prefix ends in an incomplete multibyte character (an encoding error), the current code can match that to part of a multibyte character in the string. This doesn't match what you'd get if you ran m
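
The concern above can be illustrated with a minimal sketch. It assumes a purely byte-wise prefix test and UTF-8 input; `bytewise_startswith` is a hypothetical name used only for this illustration and is not the gnulib implementation of mbs_startswith.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper for illustration only: compare the first
   strlen (prefix) bytes of STRING with PREFIX, ignoring character
   boundaries entirely.  */
bool
bytewise_startswith (const char *string, const char *prefix)
{
  return strncmp (string, prefix, strlen (prefix)) == 0;
}
```

With this sketch, `bytewise_startswith ("\xC3\xA9", "\xC3")` returns true: the prefix is the lone lead byte 0xC3, which is an incomplete UTF-8 sequence, yet it matches the first byte of the two-byte character U+00E9 in the string. A character-by-character comparison would not treat that as a match of whole characters.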

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Bruno Haible via Gnulib discussion list
Paul Eggert wrote:
> > Should we break the "tradition" here to use 'int' for a Boolean value,
> > and actually use 'bool' for the first time in ?
>
> I would prefer it, if it's a boolean function.
Done:
2025-01-04 Bruno Haible
*_startswith, *_endswith: Change return type to 'bool'.

Re: new modules mbs_startswith, mbs_endswith

2025-01-04 Thread Bruno Haible via Gnulib discussion list
Hi Simon,
> Would adding mem_startswith (arbitrary char* buffers) and/or
> c_startswith (arbitrary NUL-terminated 7-bit strings) make sense?
c_startswith and str_startswith are the same. mem_startswith and mem_endswith would make sense, if some package needs them. We already have them in the for
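
For readers wondering what a prefix test on arbitrary buffers might look like, here is a rough sketch. The name `mem_startswith_sketch` and its four-argument signature are assumptions made for this illustration; the thread does not show an actual prototype.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return true if the PREFIXLEN bytes at PREFIX
   equal the first PREFIXLEN bytes of the BUFLEN-byte buffer at BUF.
   Embedded NUL bytes are compared like any other byte.  */
bool
mem_startswith_sketch (const void *buf, size_t buflen,
                       const void *prefix, size_t prefixlen)
{
  return buflen >= prefixlen && memcmp (buf, prefix, prefixlen) == 0;
}
```

Unlike a NUL-terminated variant, this takes explicit lengths, so a prefix such as the three bytes `a`, NUL, `b` can be tested against a buffer that contains NUL bytes.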

Re: new modules mbs_startswith, mbs_endswith

2025-01-03 Thread Paul Eggert
On 2025-01-03 13:09, Bruno Haible wrote: Should we break the "tradition" here to use 'int' for a Boolean value, and actually use 'bool' for the first time in ? I would prefer it, if it's a boolean function. I don't know what to do about it. Generate an 'info' file with a page width of 100 in

Re: new modules mbs_startswith, mbs_endswith

2025-01-03 Thread Simon Josefsson via Gnulib discussion list
Bruno Haible via Gnulib discussion list writes:
>> * What happens when strings contain encoding errors? It's not clear from
>> the spec. I hope behavior isn't simply undefined.
>
> When the str_* functions are used, the byte-wise encoding will matter.
> When the mbiter primitives are used, recal

Re: new modules mbs_startswith, mbs_endswith

2025-01-03 Thread Collin Funk
Hi Bruno,
Bruno Haible via Gnulib discussion list writes:
> I'm not opposed to using 'bool'. But when I saw that no function in
> gnulib's , , or , so far uses 'bool',
> it made me hesitate.
>
> Should we break the "tradition" here to use 'int' for a Boolean value,
> and actually use 'bool' for

Re: new modules mbs_startswith, mbs_endswith

2025-01-03 Thread Bruno Haible via Gnulib discussion list
Paul Eggert wrote:
> * These functions return int 1 or 0. Why not bool? Are you thinking of
> extending them later? If not, bool seems like the way to go.
I'm not opposed to using 'bool'. But when I saw that no function in gnulib's , , or , so far uses 'bool', it made me hesitate. Should we brea

Re: new modules mbs_startswith, mbs_endswith

2025-01-03 Thread Paul Eggert
On 2025-01-03 05:09, Bruno Haible via Gnulib discussion list wrote: When we offer startswith() and endswith() functions for plain unibyte C strings, we need to do the same with multibyte strings as well. Some comments: * The comments in string.in.h should be imperative sentences. E.g., say "R

new modules mbs_startswith, mbs_endswith

2025-01-03 Thread Bruno Haible via Gnulib discussion list
When we offer startswith() and endswith() functions for plain unibyte C strings, we need to do the same with multibyte strings as well. Done as follows: 2025-01-03 Bruno Haible doc: Mention the new modules. * doc/strings.texi (Comparison of string APIs): Add rows for startswit