On Tue, Dec 03, 2024 at 10:18:46PM +0100, Gioele Barabucci wrote:
> Normalization is always lossy, at least in principle.
>
> Applications that employ normalization accept that tradeoff in order to gain
> something valuable: in this case the ability to have a Ohm sign codepoint as
> part of your username is traded for the ability to compare usernames across
> different OSes and applications.
I don't know exactly what's in the standard, but my gut feeling says that I
would probably store _exactly_ what was received, and normalize both sides
before duplicate checking, sorting and comparing. If we normalized things
away in storage, why would we have homographs in the first place? Why would I
replace a Cyrillic a with a Latin a, destroying the idea of a "script"?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Email address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
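A minimal Python sketch of the "store exactly what was received, normalize only when comparing" approach. It assumes NFKC as the normalization form (the post does not name one), and `UserRegistry` and `norm_key` are hypothetical names for illustration. Under NFKC the OHM SIGN (U+2126) gets the same key as GREEK CAPITAL LETTER OMEGA (U+03A9), while a Cyrillic а (U+0430) stays distinct from a Latin a, because Unicode normalization does not fold characters across scripts.

```python
import unicodedata

# Assumption: NFKC is the chosen form; the original post does not specify one.
NORM_FORM = "NFKC"


def norm_key(name: str) -> str:
    """Hypothetical helper: comparison key only, never written back to storage."""
    return unicodedata.normalize(NORM_FORM, name)


class UserRegistry:
    """Hypothetical registry that keeps names exactly as received and
    normalizes only for duplicate checks, comparison and sorting."""

    def __init__(self) -> None:
        # normalized key -> raw name exactly as it was received
        self._by_key: dict[str, str] = {}

    def add(self, raw_name: str) -> bool:
        key = norm_key(raw_name)
        if key in self._by_key:
            return False  # duplicate once both sides are normalized
        self._by_key[key] = raw_name  # the raw input is stored untouched
        return True

    def same_user(self, a: str, b: str) -> bool:
        # Normalize both sides at comparison time.
        return norm_key(a) == norm_key(b)

    def sorted_names(self) -> list[str]:
        # Order by normalized key (plain code-point order, not locale
        # collation), but hand back the names as stored.
        return [self._by_key[k] for k in sorted(self._by_key)]


if __name__ == "__main__":
    reg = UserRegistry()
    print(reg.add("\u2126mega"))  # True: OHM SIGN, stored as received
    print(reg.add("\u03a9mega"))  # False: same NFKC key as the Ohm-sign name
    print(reg.add("\u0430lice"))  # True: Cyrillic a is not folded into Latin a
    print(reg.add("alice"))       # True
```

Keeping the raw string means the normalization choice can be changed later by rebuilding the keys, without the original input having been destroyed.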