On 2023-07-13 08:14, Bruno Haible wrote:

> By reading the source code of FreeBSD, NetBSD, OpenBSD, macOS, Solaris,
> and so on, I can easily determine
>    - which parts of the mbstate_t mbsinit() tests,
>    - which parts of the mbstate_t the various functions use.
> But in order to understand what interdependencies there are, between
> the various mbstate_t fields, and what are the assumed invariants,
> I would need to carefully read each of the mentioned files (one per
> OS and per locale type).

Yes, and I did that for mbcel: I looked at the source code for every coding system used by mbrtoc32 on NetBSD, OpenBSD, FreeBSD, Darwin, and DragonFly. The analysis was not as hard as one might think, because mbrtoc32 quickly decides whether the state is initial, and mbrtoc32 is all that matters for mbcel.

I doubt whether other primitives like mbrlen would differ, though I did not check this. Also, it's possible I made a mistake in analyzing mbrtoc32, though I hope that's unlikely.


> And this would not be future-proof

Yes, but that's true for other optimizations we're making. It's true for the assumption that the first byte of a multibyte character cannot be ASCII, for example.
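To make the ASCII assumption concrete, here is a minimal sketch of the kind of fast path it permits. This is not mbcel's actual code: the names decode_one, count_chars, and struct decoded are hypothetical, and the sketch assumes a locale in which every byte below 0x80 is a complete character with the same code point. Only bytes outside that range are passed to mbrtoc32, each time with a freshly zeroed mbstate_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical result type for this sketch; not mbcel's definition. */
struct decoded {
    char32_t ch;  /* decoded character, or the raw byte on error */
    size_t len;   /* bytes consumed, always at least 1 */
    int err;      /* nonzero if the bytes did not decode cleanly */
};

/* Decode one character from [p, lim).  Assumption: an ASCII byte is
   never the first byte of a longer multibyte character, so it can be
   returned without calling mbrtoc32 at all. */
static struct decoded decode_one(const char *p, const char *lim)
{
    assert(p < lim);
    unsigned char c = (unsigned char) *p;

    if (c < 0x80)  /* fast path: ASCII byte */
        return (struct decoded){ .ch = c, .len = 1, .err = 0 };

    /* Slow path: start from the initial conversion state every time. */
    mbstate_t st;
    memset(&st, 0, sizeof st);
    char32_t ch;
    size_t n = mbrtoc32(&ch, p, (size_t) (lim - p), &st);

    /* (size_t) -1, -2 and -3 signal an encoding error, an incomplete
       character and a stored character; 0 is not expected for a
       nonzero lead byte.  Treat all of these as a one-byte error. */
    if (n == 0 || n >= (size_t) -3)
        return (struct decoded){ .ch = c, .len = 1, .err = 1 };

    return (struct decoded){ .ch = ch, .len = n, .err = 0 };
}

/* Count the characters in the n bytes starting at s; an undecodable
   byte counts as one character. */
static size_t count_chars(const char *s, size_t n)
{
    const char *lim = s + n;
    size_t count = 0;
    for (const char *p = s; p < lim; p += decode_one(p, lim).len)
        count++;
    return count;
}
```

With pure ASCII input, such as count_chars("hello", 5), the loop never reaches mbrtoc32, which is where the speedup comes from. The price is the one already mentioned: the result is only correct for as long as the assumption holds for every locale in use.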

This part of the BSDish code appears to be reasonably stable, and I doubt whether they'll add new coding systems any time soon. So it should be safe enough.

