On 2025-01-03 13:09, Bruno Haible wrote:
Should we break the "tradition" here to use 'int' for a Boolean value,
and actually use 'bool' for the first time in <string.h>?

I would prefer it, if it's a boolean function.


I don't know what to do about it. Generate an 'info' file with a
page width of 100 instead of 80? Splitting each table into 2 tables,
thus making it harder for the reader to get a synopsis?

Maybe hyphenation? No solution is good here.



* What happens when strings contain encoding errors? It's not clear from
the spec. I hope behavior isn't simply undefined.

When the str_* functions are used, the byte-wise encoding will matter.

I thought that str_* functions didn't care about locale, which means the character encoding does not matter for them.


When the mbiter primitives are used, recall that they cope with
encoding errors (via the 'bool cur.wc_valid'); thus I expect that
encoding errors in the range of the suffix will match if it's the
same encoding error in both argument strings.

Don't we have problems with mbs_startswith, though? If the prefix ends in an incomplete multibyte character (an encoding error), the current code can match those bytes against the leading bytes of a complete multibyte character in the string. That differs from what you'd get if you ran mbiter on both prefix and string and compared each component you found.
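
To make the concern concrete, here is a minimal sketch, assuming UTF-8 input. It is not the gnulib code: bytes_startswith, unit_len and units_startswith are hypothetical names, and unit_len is only a simplified stand-in for what mbiter does (no overlong or surrogate checks; every malformed or incomplete sequence counts as a one-byte error unit).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: plain byte-wise prefix test.  */
static bool
bytes_startswith (const char *string, const char *prefix)
{
  return strncmp (string, prefix, strlen (prefix)) == 0;
}

/* Hypothetical helper: length in bytes of the next unit of the non-empty
   string S.  A structurally complete UTF-8 sequence yields its length;
   anything else (assumption) is treated as a one-byte encoding error.  */
static size_t
unit_len (const char *s)
{
  unsigned char c = (unsigned char) s[0];
  size_t n = (c < 0x80 ? 1
              : (c & 0xE0) == 0xC0 ? 2
              : (c & 0xF0) == 0xE0 ? 3
              : (c & 0xF8) == 0xF0 ? 4
              : 0);
  if (n == 0)
    return 1;                   /* stray continuation or invalid lead byte */
  for (size_t i = 1; i < n; i++)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      return 1;                 /* incomplete sequence (also stops at NUL) */
  return n;
}

/* Hypothetical helper: prefix test that compares unit by unit, the way
   one would when iterating over both strings in parallel.  */
static bool
units_startswith (const char *string, const char *prefix)
{
  while (*prefix != '\0')
    {
      if (*string == '\0')
        return false;
      size_t pn = unit_len (prefix);
      size_t sn = unit_len (string);
      if (pn != sn || memcmp (prefix, string, pn) != 0)
        return false;
      prefix += pn;
      string += sn;
    }
  return true;
}
```

With the string "\xC3\xA9t" (an "é" followed by "t") and the prefix "\xC3" (an incomplete character), bytes_startswith reports a match while units_startswith does not, because the prefix's one-byte error unit is compared against the string's two-byte character.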

(This brings up a different point, which is that mbs_startswith.c and mbs_endswith.c would need to be ported into the mbcel world if they or their dependencies are used in that world. This isn't urgent of course.)


* Can't mbs_endswith be optimized to be just str_endswith in a
single-byte or UTF-8 locale?

Finding out about the encoding of the locale is by itself not cheap

Yes, that's been a problem for some time. Too bad it's so hard to fix.
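
For reference, the fast path I had in mind looks roughly like the sketch below. The names bytes_endswith and locale_permits_bytewise_suffix_test are hypothetical, not gnulib API, and the locale check uses only MB_CUR_MAX and POSIX nl_langinfo (CODESET). That check is precisely the part that isn't cheap, and the sketch assumes the strings are well-formed in the locale's encoding.

```c
#include <langinfo.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: plain byte-wise suffix test.  */
bool
bytes_endswith (const char *string, const char *suffix)
{
  size_t len = strlen (string);
  size_t slen = strlen (suffix);
  return len >= slen && memcmp (string + (len - slen), suffix, slen) == 0;
}

/* Hypothetical helper: true if the current locale is single-byte or its
   codeset is named "UTF-8".  In well-formed UTF-8, lead bytes and
   continuation bytes are disjoint, so a well-formed suffix can only
   match at a character boundary.  */
bool
locale_permits_bytewise_suffix_test (void)
{
  if (MB_CUR_MAX == 1)
    return true;                /* single-byte encoding */
  return strcmp (nl_langinfo (CODESET), "UTF-8") == 0;
}
```

A caller would use bytes_endswith when the predicate returns true and fall back to the multibyte-aware comparison otherwise. The predicate only reflects the user's locale if the program has called setlocale beforehand.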
