Re: new modules mbs_startswith, mbs_endswith

Bruno Haible via Gnulib discussion list Fri, 03 Jan 2025 13:09:51 -0800

Paul Eggert wrote:
> * These functions return int 1 or 0. Why not bool? Are you thinking of 
> extending them later? If not, bool seems like the way to go.


I'm not opposed to using 'bool'. But when I saw that, so far, no
function in gnulib's <string.h>, <stdlib.h>, or <unistd.h> uses 'bool',
it made me hesitate.

Should we break with the "tradition" of using 'int' for a Boolean value,
and actually use 'bool' for the first time in <string.h>?

> * With these long names, the table in "Comparison of string APIs" no 
> longer lines up.

Yes, in the 'info'-formatted documentation, three @multitable tables
are partially ugly:
  16.1.5 Comparison of string APIs
  17.12.1 Ordinary container data types
  17.15.2.6 Declarations in <unictype.h>

I don't know what to do about it. Generate an 'info' file with a
page width of 100 instead of 80? Or split each table into 2 tables,
thus making it harder for the reader to get a synopsis?

> * What happens when strings contain encoding errors? It's not clear from 
> the spec. I hope behavior isn't simply undefined.

When the str_* functions are used, the byte-wise encoding will matter.
When the mbiter primitives are used, recall that they cope with
encoding errors (via the 'bool cur.wc_valid'); thus I expect that
encoding errors in the range of the suffix will match if it's the
same encoding error in both argument strings.
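
To illustrate the expectation above, here is a minimal sketch of a
character-wise comparison in which an encoding error matches only the
same encoding error. It is not the gnulib implementation: it uses plain
mbrtowc() instead of the mbiter primitives, it shows the prefix case
because that is shorter than the suffix case, and it ignores stateful
encodings. The names 'struct mb_char', mb_next, mb_char_equal and
sketch_mbs_startswith are hypothetical; only the 'wc_valid' flag mirrors
the field mentioned above.

```c
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for an mbiter-style "current character".  */
struct mb_char
{
  const char *ptr;   /* first byte of the character */
  size_t bytes;      /* number of bytes it occupies */
  bool wc_valid;     /* false if the bytes are an encoding error */
  wchar_t wc;        /* the decoded character, if wc_valid */
};

/* Decode the character at P, where P < END.  An invalid or incomplete
   sequence is returned as a one-byte character with wc_valid == false
   (an assumption of this sketch).  */
static struct mb_char
mb_next (const char *p, const char *end)
{
  struct mb_char c = { p, 1, false, 0 };
  mbstate_t state;
  memset (&state, 0, sizeof state);
  size_t n = mbrtowc (&c.wc, p, (size_t) (end - p), &state);
  if (n == (size_t) -1 || n == (size_t) -2)
    return c;
  c.bytes = (n == 0 ? 1 : n);
  c.wc_valid = true;
  return c;
}

/* Two valid characters match if they decode to the same wide character.
   Two encoding errors match only if their bytes are identical.  */
static bool
mb_char_equal (struct mb_char a, struct mb_char b)
{
  if (a.wc_valid && b.wc_valid)
    return a.wc == b.wc;
  if (!a.wc_valid && !b.wc_valid)
    return a.bytes == b.bytes && memcmp (a.ptr, b.ptr, a.bytes) == 0;
  return false;
}

/* Character-wise prefix test in the current locale's encoding.  */
static bool
sketch_mbs_startswith (const char *s, const char *prefix)
{
  const char *s_end = s + strlen (s);
  const char *p_end = prefix + strlen (prefix);
  while (prefix < p_end)
    {
      if (s == s_end)
        return false;
      struct mb_char a = mb_next (s, s_end);
      struct mb_char b = mb_next (prefix, p_end);
      if (!mb_char_equal (a, b))
        return false;
      s += a.bytes;
      prefix += b.bytes;
    }
  return true;
}
```

With this shape, a stray invalid byte in the prefix matches only the
identical invalid byte at the same position in the string.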

> * Can't mbs_endswith be optimized to be just str_endswith in a 
> single-byte or UTF-8 locale?

Finding out about the encoding of the locale is by itself not cheap
(I remember this from profiling wcwidth some time ago). But there must
be a break-even point: for long strings, testing the encoding of the
locale once and doing the optimized byte-wise code should be a win.
I hope to find time to benchmark this and find the break-even point.
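
For illustration, the two pieces being weighed against each other could
look roughly like this. It is only a sketch, not gnulib code: the
function names are hypothetical, the locale test uses POSIX
nl_langinfo (CODESET) and assumes the codeset is spelled "UTF-8", and
no break-even length is encoded because it has not been measured.

```c
#include <langinfo.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Byte-wise suffix test: the cheap path.  */
static bool
bytewise_endswith (const char *s, const char *suffix)
{
  size_t len = strlen (s);
  size_t n = strlen (suffix);
  return len >= n && memcmp (s + len - n, suffix, n) == 0;
}

/* The comparatively expensive locale test.  Returns true if the current
   locale is single-byte or (by assumption on the codeset name) UTF-8,
   the two cases named in the question above.  */
static bool
locale_allows_bytewise (void)
{
  if (MB_CUR_MAX == 1)
    return true;
  return strcmp (nl_langinfo (CODESET), "UTF-8") == 0;
}
```

A caller would invoke locale_allows_bytewise () once and, if it returns
true, use bytewise_endswith instead of the character-wise loop; whether
that pays off for short strings is exactly what the benchmark would
have to show.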

Bruno
