On Mon, May 30, 2022 at 4:56 AM Laslo Hunhold <d...@frign.de> wrote:
> having dove deep into UTF-8 and Unicode, I can at least say that
> libutf8proc has an unsafe UTF-8-decoder, as it doesn't catch overlong
> encodings. There are also multiple other pitfalls.

Thanks for the reviews. That's really helpful to know. As mentioned, I
haven't tried them myself.

> I can shamelessly recommend you my UTF-8-codec[0] that's part of my
> libgrapheme[1]-library, which also allows you to directly count
> grapheme clusters (i.e. visible character units made up of one or more
> codepoints).

That looks really useful. I noticed the break testing in libgrapheme. Is
it possible to use this as a replacement for libunistring?

I've been doing some work on Tuxmath. I updated the code, which they had
copied from an old version of gettext, to use the libunistring package
that was supposed to replace it. It would be nice to offer an alternative
to libunistring as another compilation option. They mostly use the code
to figure out where to put line breaks when dealing with
internationalized text.

Thanks again.
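
P.S. For anyone reading along who hasn't met the term: an "overlong" encoding is a UTF-8 sequence that spends more bytes than its code point needs, e.g. 0xC0 0xAF for "/" (U+002F). Below is a minimal Python sketch of the check a strict decoder has to make. The function name is_overlong is made up for illustration and comes from neither libutf8proc nor libgrapheme, and it only looks at one sequence at a time (it does not check surrogates or the U+10FFFF limit).

```python
# Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
MIN_CODE_POINT = (0x0, 0x80, 0x800, 0x10000)


def is_overlong(seq: bytes) -> bool:
    """Return True if seq is one UTF-8 sequence that uses more bytes
    than its code point requires. Hypothetical helper for illustration."""
    if not seq:
        raise ValueError("empty input")
    lead = seq[0]
    # The lead byte determines the sequence length and the first payload bits.
    if lead < 0x80:
        length, cp = 1, lead
    elif lead & 0xE0 == 0xC0:
        length, cp = 2, lead & 0x1F
    elif lead & 0xF0 == 0xE0:
        length, cp = 3, lead & 0x0F
    elif lead & 0xF8 == 0xF0:
        length, cp = 4, lead & 0x07
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    # Each continuation byte must look like 10xxxxxx and adds 6 payload bits.
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    # Overlong: the decoded value would have fit in a shorter sequence.
    return cp < MIN_CODE_POINT[length - 1]
```

With this, is_overlong(b"\xc0\xaf") is true while is_overlong(b"/") is false. A decoder that skips the final comparison would quietly turn 0xC0 0xAF into "/", which is how overlong forms slip past filters that only look for the one-byte form.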