On Tue, Dec 03, 2024 at 10:18:46PM +0100, Gioele Barabucci wrote:
> Normalization is always lossy, at least in principle.
> 
> Applications that employ normalization accept that tradeoff in order to gain
> something valuable: in this case the ability to have an Ohm sign codepoint as
> part of your username is traded for the ability to compare usernames across
> different OSes and applications.

I don't know exactly what the standard says, but my gut feeling is
that I would probably store _exactly_ what was received, but normalize
both sides before duplicate checking, sorting, or comparing.
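
A minimal sketch of that idea in Python, assuming NFC is the chosen
normalization form (the thread does not settle on one). The names
UserStore and norm_key are made up for illustration; only
unicodedata.normalize comes from the standard library. The store keeps
each name byte-for-byte as received and uses the normalized form only
as a lookup and sort key:

```python
import unicodedata


def norm_key(name: str) -> str:
    """Comparison key only; assumes NFC is the agreed normalization form."""
    return unicodedata.normalize("NFC", name)


class UserStore:
    """Hypothetical store: keeps names as received, compares normalized."""

    def __init__(self) -> None:
        # normalized key -> name exactly as it was received
        self._by_key: dict[str, str] = {}

    def add(self, name: str) -> bool:
        """Store the name unchanged; reject it if a normalized duplicate exists."""
        key = norm_key(name)
        if key in self._by_key:
            return False
        self._by_key[key] = name
        return True

    def stored_names(self) -> list[str]:
        """Return the stored originals, sorted by their normalized keys."""
        return sorted(self._by_key.values(), key=norm_key)


store = UserStore()
ohm = "\u2126mega"    # starts with U+2126 OHM SIGN
omega = "\u03a9mega"  # starts with U+03A9 GREEK CAPITAL LETTER OMEGA

first = store.add(ohm)     # accepted, stored with the Ohm sign intact
second = store.add(omega)  # rejected as a duplicate after normalization
```

The Ohm sign survives in storage, yet the second spelling is still
caught as a duplicate, because both sides are normalized at comparison
time only.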

If we normalized things away in storage, why would we have homographs in
the first place? Why would I replace a Cyrillic a with a Latin a,
destroying the idea of a "script"?
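
As a quick check of the homograph point, here is a small sketch using
Python's unicodedata module. It only covers the four standard Unicode
normalization forms; other mappings (such as confusable detection) are
a separate matter and not shown here. The variable names are just for
illustration:

```python
import unicodedata

latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

# None of the four normalization forms folds one script into the other.
forms = ("NFC", "NFD", "NFKC", "NFKD")
still_distinct = all(
    unicodedata.normalize(form, latin_a) != unicodedata.normalize(form, cyrillic_a)
    for form in forms
)
```

So normalization by itself leaves the two letters distinct, and the
script information stays intact.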

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
