Re: new modules mbs_startswith, mbs_endswith

Simon Josefsson via Gnulib discussion list Fri, 03 Jan 2025 16:42:24 -0800

Bruno Haible via Gnulib discussion list <bug-gnulib@gnu.org> writes:

>> * What happens when strings contain encoding errors? It's not clear from 
>> the spec. I hope behavior isn't simply undefined.
>
> When the str_* functions are used, the byte-wise encoding will matter.
> When the mbiter primitives are used, recall that they cope with
> encoding errors (via the 'bool cur.wc_valid'); thus I expect that
> encoding errors in the range of the suffix will match if it's the
> same encoding error in both argument strings.
>
>> * Can't mbs_endswith be optimized to be just str_endswith in a 
>> single-byte or UTF-8 locale?
>
> Finding out about the encoding of the locale is by itself not cheap
> (I remember from profiling wcwidth, some time ago). But there must be
> a break-even point: for long strings, testing the encoding of the
> locale once and doing the optimized byte-wise code should be a win.
> I hope to find time to benchmark this and find the break-even point.


Would adding mem_startswith (arbitrary char* buffers) and/or
c_startswith (arbitrary NUL-terminated 7-bit strings) make sense?

Unless I'm missing something, the comparison idiom is useful for memory
buffers too, where you don't want locale to influence behaviour.
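For illustration only, here is a minimal sketch of what such locale-independent memory-buffer variants might look like. The names `mem_startswith` and `mem_endswith` and their (buffer, length) signatures are hypothetical assumptions, not an existing Gnulib API. They compare raw bytes with `memcmp`, so embedded NULs are allowed and the locale plays no role:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: true if the LEN-byte buffer BUF begins with the
   PREFIXLEN-byte buffer PREFIX.  Byte-wise, locale-independent;
   BUF may contain NUL bytes.  */
static bool
mem_startswith (const void *buf, size_t len,
                const void *prefix, size_t prefixlen)
{
  return len >= prefixlen && memcmp (buf, prefix, prefixlen) == 0;
}

/* Hypothetical: true if the LEN-byte buffer BUF ends with the
   SUFFIXLEN-byte buffer SUFFIX.  */
static bool
mem_endswith (const void *buf, size_t len,
              const void *suffix, size_t suffixlen)
{
  return len >= suffixlen
         && memcmp ((const char *) buf + (len - suffixlen),
                    suffix, suffixlen) == 0;
}
```

An empty prefix or suffix matches any buffer, which mirrors how str_startswith behaves with an empty prefix. A c_startswith for 7-bit NUL-terminated strings could follow the same byte-wise pattern using strlen to obtain the lengths.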

/Simon
