A locale mapping is basically a lookup table (with complications for things like ß). A single-byte lookup table will be 256 entries, each holding one or more (because of combining characters) Unicode codepoints representing the mapping from the locale character set to the underlying common character set (Unicode). (There may also be a reverse lookup table for mapping Unicode codepoints to locale codepoints.)
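As a rough illustration of the table idea: this is only a sketch, using Python's built-in latin-1 codec as a stand-in for a locale's character map. The names `build_decode_table` and `build_encode_table` are made up for this example and are not part of any locale API.

```python
def build_decode_table(encoding="latin-1"):
    """Return a 256-entry list mapping each byte value to a Unicode string.

    Entries are strings rather than single code points, so a table could
    hold more than one code point per byte. Bytes the encoding leaves
    undefined map to None.
    """
    table = []
    for byte in range(256):
        try:
            table.append(bytes([byte]).decode(encoding))
        except UnicodeDecodeError:
            table.append(None)
    return table


def build_encode_table(decode_table):
    """Reverse lookup: Unicode string -> byte value in the locale charset."""
    return {text: byte
            for byte, text in enumerate(decode_table)
            if text is not None}


decode_table = build_decode_table()
encode_table = build_encode_table(decode_table)

# Locale bytes -> common representation (Unicode) ...
raw = b"stra\xdfe"
text = "".join(decode_table[b] for b in raw)

# ... and back again. The per-character lookup is enough here only because
# every latin-1 entry is a single code point.
back = bytes(encode_table[ch] for ch in text)
```

Note that this covers only the character set mapping. Case mapping is a separate layer: Python's `"ß".upper()`, for example, yields the two-character string `"SS"`.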
Without this, every program would have to deal directly with every possible character set. With it, code can use Unicode internally and let the locale system map it to what gets displayed, or in the other direction, from what was read to the common representation.

(Complications include things like: depending on encoding/locale details, German lowercase ß will uppercase to either SS or ẞ. And that's one of the simpler ones; for some locales, things can get *really* weird. Not to mention fun stuff like Arabic having 4 representations of every character: initial, medial, final, standalone.)

Locale handling is seriously *nasty*. Unicode is also pretty nasty... but it mostly manages the superset of individual locale nastinesses in about as logical a way as possible, given that locales are fundamentally illogical: very few of them were designed; most grew organically and without regard for rules or logic. (Esperanto locales being an exception... but even Esperanto has developed some organic extensions with actual usage. It's how humans work.)

On Wed, Feb 21, 2018 at 7:08 AM, Eivind Nicolay Evensen <eivi...@terraplane.org> wrote:

> On Wed, Feb 21, 2018 at 01:03:01AM -0500, Brandon Allbery wrote:
> > On Tue, Feb 20, 2018 at 6:08 PM, Eivind Nicolay Evensen <
> > eivi...@terraplane.org> wrote:
> >
> > > However, since it was mentioned in a note starting with
> > > "Add support for unicode collation" I most likely didn't even read it
> > > since I'll never touch unicode.
> > >
> >
> > If you ever use anything other than LANG=C, you *are* touching Unicode.
>
> Well, I don't see multibyte characters with 8859-1, and
> multibyte is what I don't tolerate. I didn't even know
> that unicode could be single-byte character only sets.
>
> --
> Eivind

-- 
brandon s allbery kf8nh                               sine nomine associates
allber...@gmail.com                                  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net