Bruno Haible via Gnulib discussion list <bug-gnulib@gnu.org> writes:

>> * What happens when strings contain encoding errors? It's not clear from
>> the spec. I hope behavior isn't simply undefined.
>
> When the str_* functions are used, the byte-wise encoding will matter.
> When the mbiter primitives are used, recall that they cope with
> encoding errors (via the 'bool cur.wc_valid'); thus I expect that
> encoding errors in the range of the suffix will match if it's the
> same encoding error in both argument strings.
>
>> * Can't mbs_endswith be optimized to be just str_endswith in a
>> single-byte or UTF-8 locale?
>
> Finding out about the encoding of the locale is by itself not cheap
> (I remember from profiling wcwidth, some time ago). But there must be
> a break-even point: for long strings, testing the encoding of the
> locale once and doing the optimized byte-wise code should be a win.
> I hope to find time to benchmark this and find the break-even point.
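To make the byte-wise case concrete, here is a minimal sketch of a suffix test that only looks at bytes. It is not gnulib's code; the name bytewise_endswith is hypothetical. Because it never decodes characters, an invalid byte sequence at the end of the string matches whenever the suffix contains exactly the same bytes, and the locale plays no role.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not gnulib's implementation: a purely byte-wise
   suffix test.  Bytes that form invalid multibyte sequences are compared
   like any other byte, so the current locale is never consulted.  */
static bool
bytewise_endswith (const char *string, const char *suffix)
{
  size_t len = strlen (string);
  size_t suffix_len = strlen (suffix);

  /* A suffix longer than the string can never match.  */
  return suffix_len <= len
         && memcmp (string + (len - suffix_len), suffix, suffix_len) == 0;
}
```

For example, bytewise_endswith ("abc\xff", "\xff") is true even though 0xFF is not valid UTF-8, which is exactly the "byte-wise encoding will matter" behaviour.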
Would adding mem_startswith (for arbitrary char * buffers with an explicit length) and/or c_startswith (for arbitrary NUL-terminated 7-bit strings) make sense? Unless I'm missing something, the prefix-comparison idiom is useful for memory buffers too, where you don't want the locale to influence behaviour.

/Simon
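A rough sketch of what the two proposed helpers could look like, assuming the signatures below. Neither function exists in gnulib today; both names and parameter lists are hypothetical illustrations of the proposal. mem_startswith works on a buffer with an explicit length, so embedded NUL bytes are fine; c_startswith compares NUL-terminated strings byte by byte and never consults the locale.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (proposed, not an existing gnulib API): true if the
   first PREFIX_LEN bytes of BUF equal PREFIX.  Operates on raw bytes,
   so embedded NULs and non-ASCII data are handled uniformly.  */
static bool
mem_startswith (const void *buf, size_t buf_len,
                const void *prefix, size_t prefix_len)
{
  return prefix_len <= buf_len && memcmp (buf, prefix, prefix_len) == 0;
}

/* Hypothetical (proposed, not an existing gnulib API): prefix test on
   NUL-terminated strings, intended for 7-bit input.  strncmp compares
   bytes, so the result does not depend on the current locale.  If
   STRING is shorter than PREFIX, its terminating NUL mismatches.  */
static bool
c_startswith (const char *string, const char *prefix)
{
  return strncmp (string, prefix, strlen (prefix)) == 0;
}
```

The explicit length in mem_startswith is the main design point: it lets the same idiom apply to binary buffers, where strlen-based functions would stop at the first NUL.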