Hi,

Since 2000 the need for elementary functions on Unicode strings has been
apparent and increasing:
  - some utility functions exist in GNOME's glib,
  - clisp, gettext, emacs, python, ... need programmatic access to the
    character name tables,
  - gettext's linebreak module relies on several utility functions for
    Unicode strings,
  - any program printing line + column numbers of characters in a file
    needs to consider the width of each character, e.g. libiconv does this,
  - clisp would like to have Unicode regular expressions that work even
    when the locale is in ISO-8859-1 encoding,
  ...

Since 2001 I've been working on a library covering such topics. But two
issues kept me from releasing this library:

  - It should be more lightweight than IBM's ICU library. It should contain
    many functions, and support all 3 kinds of in-memory representation
    (UTF-8, UTF-16 and UTF-32), but without installing a multi-megabyte
    library. Someone wanting 2 or 3 Unicode string functions does not want
    to link with a megabyte-sized library.

  - The basic character type, ucs4_t, is an alias of uint32_t. But one
    could not assume <stdint.h>.

Gnulib solves both issues
  1. by providing an infrastructure for a source-code library,
  2. by providing a package-independent <stdint.h>.

These data types are actually suitable for gnulib, since they are basic and
project-independent. I'll therefore add a set of modules for Unicode text
handling.

The choice of the in-memory representation (UTF-8, UTF-16 or UTF-32) is up
to the application; libunistring supports all three equally.

The modules are organized in the following directories:

  unistr      elementary string functions
  uniconv     conversion from/to legacy encodings
  unistdio    formatted output to strings
  uniname     character names
  uniwidth    string width when using nonproportional fonts
  unilbrk     line breaking algorithm
  unictype    character classification and properties
  --
  unicase     case folding
  unicomp     composition and decomposition
  uniregex    regular expressions
  unibidi     bidirectional reordering (use FriBidi in the meantime)

The last four (those below the separator) are planned, not yet implemented.

Copyright is FSF and the license is LGPL, as usual.

Bruno